Mackerel blog #mackerelio

The Official Blog of Mackerel

Announcement concerning specification adjustments to the average value calculation method of metric monitoring

Mackerel Sub Producer id:Songmu here. As stated in the title, this is an announcement concerning adjustments made to the specifications of the average value calculation method for metric monitoring.

Overview

The average value calculation method for metric monitoring will be uniformly changed to the average value of "the number of points".

This will affect host metric monitoring as well as response time monitoring of URL external monitoring. The average values for these metrics are currently being calculated for "N minutes". Service metrics will not be affected.

Potentially affected users

As these adjustments are minor, there generally won’t be any big effects.

There is a possibility that users using host metric monitoring or external response time monitoring, furthermore, those with an average value monitor set for 2 minutes or more may be affected.

In particular, the likelihood of impact is higher for metrics posted at intervals other than 1 minute. For example, there are some AWS integration metrics posted at 5 minute intervals.

Date of change

February 8th 2018 (Thursday)

A concrete comparison of the current and new specifications

The average value calculation target for the following metric monitoring will change as shown below.

Current status New specifications
Host metric monitoring N minutes N points
Response time monitoring N minutes N points
Service metric monitoring N points N points

Differences between "N minutes" and "N points"

The average value for metrics targeted in “N minutes” is calculated on the premise that data be posted every minute. In some cases, this behavior is not intuitive, such as the following.

  • When data is posted at an interval other than 1 minute
    • For example, at an interval of 5 minutes, etc.
  • When metric data is temporary missing
    • For example, in plugins that performs diff calculation for the counter value, metrics are not calculated when the counter decreases

In these cases, when the average value monitor is set for 2 minutes or more, the average value is not calculated for the 1 minute of the value not being measured (null).

In other words, you can’t realistically configure an average value monitor of 2 minutes or more for 5 min interval host metrics and monitoring doesn’t work when data is temporarily missing and the average values of before and after can’t be calculated.

The following example specifically illustrates how the average of N minutes and the average of N points are calculated differently when there is missing data.

Time Raw metrics 3 minute average 3 point average
15:00 10 - -
15:01 11 - -
15:02 12 11 11
15:03 13 12 12
15:04 null null 12
15:05 14 null 13
15:06 15 null 14
15:07 16 15 15

This specification adjustment will unify the average value calculation in N points.

Additional information

With this adjustment, average value monitoring now works with metrics of arbitrary intervals. However, as with current service metrics, only metric points within the last 24 hours are eligible. In other words, if you are posting metrics on a daily basis, monitoring the average value of 2 points or more will not work well.

In cases where the posting interval is not constant, data is not weighted according to the interval and the average value is simply calculated.

Additionally, the shortest metric interval remains unchanged at 1 minute. In other words, it is not currently possible to save multiple metric points between one minute intervals. Even if multiple metrics are posted with a precision of less than 1 minute, they are rounded up to 1 minute and overwritten with the most recently posted value.

As we continue striving to improve our services, we appreciate your understanding and cooperation.