Mackerel blog #mackerelio

The Official Blog of Mackerel

Timed mute can now be set per monitoring rule etc.

Hello! Mackerel team CRE, Inoue (id:a-know) here.

Last week, we ran a booth at the Developers Summit 2018.

We received a ton of visitors and handed out a lot of the new flyers that were specially made for this event. Helping run the booth, I personally got to meet many people face to face and it was a unique experience. Thank you all very much.

Now on to this week’s updates.

Timed mute can now be set per monitoring rule

Even before this update, it was possible to mute notifications for all monitoring rules for a specified period of time. Now it’s also possible to do this per monitoring rule.

We think that this can be useful for times when you already know that a problem has occurred or when you want to temporarily suppress notifications for issues that are currently being dealt with.

You can now check the attempt log in alert details when max attempts is configured for check monitoring

We announced an equivalent feature update for external monitoring on February 2nd. With this update, this feature can now be used with check monitoring alerts.

This can be particularly useful for monitoring conditions where alerts can occur for multiple reasons. Give it a try!

Specifications for the average value calculation method of metric monitoring have been adjusted & notes can now be configured in check monitoring etc.

Mackerel team CRE Inoue (id:a-know) here. Hello!

"Mackerel Meetup #11", an official Mackerel event, was held this Monday the 5th. The following is the report that I wrote the day after the event. (Japanese only)

mackerel.io

Some participants have also written blog posts about the event. (Japanese only)

blog.lorentzca.me

At the event, we received a lot of helpful comments, impressions, and requests from many different people. Some of them are already moving forward quickly within the team. So keep your eye on Mackerel 2018!

Now on to this week’s update information.

Specifications for the average value calculation method of metric monitoring have been adjusted

The other day, an advance notice was published on this blog announcing the specification adjustment for the average value calculation method of metric monitoring. These adjustments were implemented with this week’s release.

mackerel.io

With this release, average value monitoring is now done using a value calculated over the number of metric postings. Even metrics posted at intervals of 5 minutes, which were previously skipped, are now subject to correct average value monitoring.

Notes can now be configured in check monitoring settings

[plugin.checks.ssh]
command = "ruby /path/to/check-ssh.rb"
memo = "This check monitor is ..."

As shown above, it is now possible to configure notes in the check monitoring settings of mackerel-agent. The string specified here is displayed in alert notifications, the alert details screen, and the host details page. Using these notes, you can tell the person receiving the alert what the monitor means, what to do first, and which metrics to check. To use this feature, mackerel-agent needs to be updated to the latest version, v0.52.0.

mackerel-agent now supports Amazon Linux 2!

mackerel-agent can now be installed on Amazon Linux 2 LTS Candidate (2017.12). Sorry to keep you all waiting!

Updates for Mackerel related OSS

Updates for various Mackerel related OSS were made this week and the details follow below.

mackerel-agent-plugins v0.44.0

  • [aws-elasticsearch] -metric-key-prefix option added
  • mackerel-plugin-aws-s3-requests was newly added
    • The number of requests, latency, etc. can now be monitored for S3 buckets with CloudWatch request metrics enabled.

mackerel-check-plugins v0.17.0

  • [ntpoffset] Check processing system was improved
  • [check-tcp] -W option added
    • The error level when an unexpected error occurs can now be fixed at Warning.

Thank you to everyone who contributed Pull Requests!

【2/15 〜 2/16】Mackerel at the 2018 Developers Summit!

Developers Summit 2018, a festival for IT engineers and developers, will be held on Thursday and Friday of next week (February 15th-16th). The Mackerel team’s very own Development Director Kasuya (id:daiksy) and I, Inoue (id:a-know), will be giving presentations at the event. Kasuya will present on the first day (Thursday 2/15) and I will present on the second (Friday, 2/16)!

リモートワークは難しい - それでもぼくらは歯をくいしばってやっていく - Developers Summit 2018 (Japanese only)

「自分」をまるごと活かす!私が“CRE”というキャリアを選んだ理由 - Developers Summit 2018 (Japanese only)

The team will also be running a booth to introduce Mackerel on both days of the event. We are preparing a little giveaway at the booth, so be sure to come and visit us at the festival!

Graphs can now be shared with people who don’t have accounts etc.

Hello! Mackerel Team Director id:daiksy here.

Here is this week’s update information.

Graphs can now be shared with people who don’t have accounts

Mackerel has a graph sharing feature that allows graphs to be viewed outside of Mackerel's Web UI in various formats, such as an image or iframe. However, doing so required a Mackerel login account and a login session in the browser.

With this update, we’ve now released a graph sharing feature that allows anyone who knows the URL to view the graph.

Please be cautious when sharing as the range of disclosure is now wider than ever. Additionally, graphs shared with this feature are assigned new URLs and can be viewed in the shared graphs list.

https://mackerel.io/my?tab=sharedGraphs

You can invalidate the URL of a shared graph and stop viewer access by deleting the graph from this list.

You can now check the attempt logs in the alert details of external monitoring when max attempts is configured

Up until now, even if the maximum number of attempts was configured, only the log from the moment the alert was opened was displayed in the alert details of external monitoring. Because response time deterioration and status code errors could be mixed together, it was sometimes difficult to understand why the alert was opened. We therefore decided to display all attempt logs, making it easier to understand why the alert occurred.

In the screenshot above, you can see that the past 3 minutes of average value monitoring, from 17:44 to 17:46, resulted in an alert condition HIT with 3 consecutive occurrences and shows that the alert opened at 17:46.
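The idea of an alert opening only after several consecutive hits can be sketched in Python. This is a minimal illustration, not Mackerel's implementation: the function name, the `(time, hit)` log format, and the 17:43 entry are hypothetical, and only the 17:44 to 17:46 sequence comes from the description above.

```python
def alert_opened_at(attempts, max_attempts):
    """Return the time at which `max_attempts` consecutive hits is first reached.

    `attempts` is a list of (time_label, hit) pairs in chronological order.
    Returns None if the streak is never reached.
    """
    streak = 0
    for time_label, hit in attempts:
        # A non-hit resets the streak; a hit extends it.
        streak = streak + 1 if hit else 0
        if streak >= max_attempts:
            return time_label
    return None


# Hypothetical attempt log: one OK attempt followed by three consecutive hits.
attempt_log = [("17:43", False), ("17:44", True), ("17:45", True), ("17:46", True)]
print(alert_opened_at(attempt_log, max_attempts=3))
```

With the full attempt log visible, each entry that contributed to the streak can be inspected rather than only the last one.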

We are planning on applying similar improvements for check monitoring as well.

Announcing specification adjustments to the average value calculation method of metric monitoring

Next week we will be adjusting specifications of the average value calculation method for metric monitoring.

For more detailed information, please check out the following announcement.

mackerel.io

Announcement concerning specification adjustments to the average value calculation method of metric monitoring

Mackerel Sub Producer id:Songmu here. As stated in the title, this is an announcement concerning adjustments made to the specifications of the average value calculation method for metric monitoring.

Overview

The average value calculation method for metric monitoring will be unified so that the average is always taken over "N points" (a number of data points).

This will affect host metric monitoring as well as response time monitoring of URL external monitoring. The average values for these metrics are currently being calculated for "N minutes". Service metrics will not be affected.

Potentially affected users

As these adjustments are minor, there generally won’t be any big effects.

Users who use host metric monitoring or external response time monitoring and who have an average value monitor set for 2 minutes or more may be affected.

In particular, the likelihood of impact is higher for metrics posted at intervals other than 1 minute. For example, there are some AWS integration metrics posted at 5 minute intervals.

Date of change

February 8th 2018 (Thursday)

A concrete comparison of the current and new specifications

The average value calculation target for the following metric monitoring will change as shown below.

| Monitoring type | Current status | New specifications |
|---|---|---|
| Host metric monitoring | N minutes | N points |
| Response time monitoring | N minutes | N points |
| Service metric monitoring | N points | N points |

Differences between "N minutes" and "N points"

The average value for metrics targeted in “N minutes” is calculated on the premise that data is posted every minute. In some cases, such as the following, this behavior is not intuitive.

  • When data is posted at an interval other than 1 minute
    • For example, at an interval of 5 minutes, etc.
  • When metric data is temporarily missing
    • For example, in plugins that perform diff calculation for a counter value, metrics are not calculated when the counter decreases

In these cases, when the average value monitor is set for 2 minutes or more, the average value is not calculated for any window that contains a minute with no measured value (null).

In other words, you can’t realistically configure an average value monitor of 2 minutes or more for host metrics posted at 5 minute intervals. And when data is temporarily missing, monitoring doesn’t work because the averages around the gap can’t be calculated.

The following example specifically illustrates how the average of N minutes and the average of N points are calculated differently when there is missing data.

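| Time | Raw metrics | 3 minute average | 3 point average |
|---|---|---|---|
| 15:00 | 10 | - | - |
| 15:01 | 11 | - | - |
| 15:02 | 12 | 11 | 11 |
| 15:03 | 13 | 12 | 12 |
| 15:04 | null | null | 12 |
| 15:05 | 14 | null | 13 |
| 15:06 | 15 | null | 14 |
| 15:07 | 16 | 15 | 15 |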
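The table can be reproduced with a short Python sketch. It is only an illustration of the two calculations as described in this announcement, not Mackerel's actual code; the function names are hypothetical, and `None` stands in for both "-" (not enough data yet) and "null" (missing data).

```python
def n_minute_average(values, n):
    """Average over the last n one-minute slots; None if any slot is missing."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - n + 1): i + 1]
        if len(window) < n or any(v is None for v in window):
            result.append(None)
        else:
            result.append(sum(window) / n)
    return result


def n_point_average(values, n):
    """Average over the last n posted points, skipping missing slots."""
    result = []
    points = []
    for v in values:
        if v is not None:
            points.append(v)
        recent = points[-n:]
        result.append(sum(recent) / n if len(recent) == n else None)
    return result


# Raw metrics from 15:00 to 15:07, with the 15:04 value missing.
raw = [10, 11, 12, 13, None, 14, 15, 16]
print(n_minute_average(raw, 3))
print(n_point_average(raw, 3))
```

A single missing value blanks out three consecutive results of the N-minute calculation, while the N-point calculation keeps producing a value every minute.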

This specification adjustment will unify the average value calculation to N points.

Additional information

With this adjustment, average value monitoring now works with metrics of arbitrary intervals. However, as with current service metrics, only metric points within the last 24 hours are eligible. In other words, if you are posting metrics on a daily basis, monitoring the average value of 2 points or more will not work well.
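The 24-hour limit can be illustrated with a small Python sketch. The function name, the treatment of the exact 24-hour boundary, and returning `None` when there are too few eligible points are all assumptions made for the example; the announcement only states that points older than 24 hours are not eligible.

```python
from datetime import datetime, timedelta


def n_point_average_recent(points, n, now, max_age=timedelta(hours=24)):
    """Average the last n points that are no older than max_age.

    `points` is a chronologically sorted list of (timestamp, value) pairs.
    Returns None when fewer than n eligible points exist (an assumption).
    """
    eligible = [value for ts, value in points if now - ts <= max_age]
    if len(eligible) < n:
        return None
    return sum(eligible[-n:]) / n


now = datetime(2018, 2, 8, 12, 0)
# Posted once a day: the older point falls outside the 24-hour range.
daily = [(now - timedelta(days=1, minutes=5), 10.0), (now - timedelta(minutes=5), 20.0)]
# Posted every 5 minutes: both points are eligible.
five_minute = [(now - timedelta(minutes=10), 10.0), (now - timedelta(minutes=5), 20.0)]

print(n_point_average_recent(daily, 2, now))
print(n_point_average_recent(five_minute, 2, now))
```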

In cases where the posting interval is not constant, data is not weighted according to the interval; the average is simply the plain mean of the points.

Additionally, the shortest metric interval remains unchanged at 1 minute. In other words, it is not currently possible to save multiple metric points within a single one-minute interval. Even if multiple metrics are posted with a precision of less than 1 minute, they are rounded up to 1 minute and overwritten with the most recently posted value.
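The overwrite behavior can be pictured with a minimal Python sketch. It assumes each point is keyed by the index of the one-minute slot its timestamp falls in; how the stored timestamp is labeled within that minute is not specified here, and the function name is hypothetical.

```python
def store_points(posts):
    """Keep one value per one-minute slot; a later post overwrites an earlier one.

    `posts` is a list of (epoch_seconds, value) pairs in posting order.
    """
    slots = {}
    for epoch_seconds, value in posts:
        minute_slot = epoch_seconds // 60  # assumed slot boundary for this sketch
        slots[minute_slot] = value
    return slots


# Three posts inside the same minute, then one post in the next minute.
posts = [(0, 1.0), (30, 2.0), (59, 3.0), (60, 4.0)]
print(store_points(posts))
```

Only the last value posted within each minute survives, so posting more often than once per minute does not increase resolution.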

As we continue striving to improve our services, we appreciate your understanding and cooperation.

On 2/27 (Tuesday) the system will temporarily shut down for database maintenance

Mackerel Product Owner id:Songmu here. As stated in the title, system maintenance is scheduled to be carried out this month.

Because this maintenance requires the system to be shut down temporarily for a relatively long period of time, we understand that it will be an inconvenience for all of our users, and we apologize. Nevertheless, this maintenance is indispensable for continuing to provide better service in the future. We appreciate your understanding and cooperation.

Scheduled date and time

Tuesday, February 27th, 2018 from 2:30 p.m. - 5:30 p.m. (JST)

Implementation content

Database maintenance

Impact on the day of maintenance

  • The maintenance completion time stated above is an estimate for the longest-case scenario. The actual maintenance period will end once the work has been completed.
  • After maintenance has begun, the entire Mackerel system will shut down for a short period of time
    • Web access to Mackerel, data posting by the agent, API access (including the CLI tool), alert notifications, etc. will be unavailable
  • As soon as the maintenance work is completed and operation confirmation is obtained, the system suspension will end and an announcement will be made
  • As for mackerel-agent metric posting, data will be buffered by mackerel-agent during the maintenance period and resent after maintenance has been completed
    • If resent correctly, graphs during the maintenance period will also be displayed

Announcements on the day of maintenance

Announcements will be made from the Mackerel status page (http://status.mackerel.io) as well as from this blog (https://mackerel.io/blog/).

Additionally, we’ll also be using our official Twitter account (https://twitter.com/mackerelio_jp).

For inquiries related to this matter

Please send all inquiries regarding this matter to support@mackerel.io.

Thank you again for your understanding and cooperation. And thank you for choosing Mackerel.

The metric retention period and the period to check metrics of 1 min granularity have been extended etc.

Mackerel users, thanks for waiting! The following large updates were made regarding metric retention with Mackerel.

  • The Standard plan’s metric retention period has been extended to【460 days】 (previously【400 days】).
  • The period to check metrics of 1 min granularity is now【460 days】 (previously【25 hours】)
    • Due to feature implementation timing, checking 1 min granularity metrics is only available for data after December 1st 2017.

I am very happy to be able to bring this update news to all of you. Now it’s even easier to check server load seasonality and apply capacity planning. Definitely give it a try!

Other updates follow below.

You can now set an email address to receive Organization related mail

As you can see in the image above, it’s now possible to add an email address to receive mail related to "Payment" and "Support Team contact".

Up until now, “payment related email” was sent to users with owner authority and “emails exchanged with the Mackerel support team” were sent to the user who contacted the support team. Now, in addition to having mail sent to each user, you can have mail sent to an address that you specify.

This can be configured in the Organization’s settings.

Updates for Mackerel related OSS

Updates for various Mackerel related OSS have been made. The details follow below.

mackerel-agent v0.51.0

  • [Windows] An issue where the memory pagefile value was mistakenly multiplied by 1024 has been fixed.

mackerel-agent-plugins v0.43.0

  • Plugin passwords can now be passed with environment variables. The target plugins and environment variable names are as follows.
    • mackerel-plugin-postgres
      • PGPASSWORD
    • mackerel-plugin-openldap
      • OPENLDAP_REPL_MASTER_PASSWORD
      • OPENLDAP_REPL_LOCAL_PASSWORD
      • OPENLDAP_PASSWORD
    • mackerel-plugin-redis
      • REDIS_PASSWORD
    • mackerel-plugin-haproxy
      • HAPROXY_PASSWORD
    • mackerel-plugin-sidekiq
      • SIDEKIQ_PASSWORD
    • mackerel-plugin-mysql
      • MYSQL_PASSWORD
    • mackerel-plugin-rabbitmq
      • RABBITMQ_PASSWORD
    • mackerel-plugin-mongodb
      • MONGODB_PASSWORD
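As a rough sketch of what passing a password through the environment looks like from a wrapper script, the Python below sets one of the variables listed above for a child process only. The helper name is hypothetical, and the child process here is a stand-in Python one-liner rather than a real plugin, so that the example runs anywhere.

```python
import os
import subprocess
import sys


def run_with_password(command, env_name, password):
    """Run `command` with `password` supplied via the environment variable `env_name`."""
    env = dict(os.environ)  # copy, so the parent environment is left untouched
    env[env_name] = password
    return subprocess.run(command, env=env, capture_output=True, text=True)


# Stand-in child that echoes the variable; on a real host the command would be
# a plugin such as ["mackerel-plugin-redis"] with REDIS_PASSWORD.
echo_child = [sys.executable, "-c", "import os; print(os.environ['REDIS_PASSWORD'])"]
demo = run_with_password(echo_child, "REDIS_PASSWORD", "example-password")
print(demo.stdout.strip())
```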

mackerel-check-plugins v0.16.0

  • Plugin passwords can now be passed with environment variables. The target plugins and environment variable names are as follows.

mkr v0.26.0

  • [Windows] An issue where the .zip file was also copied to the bin directory when doing mkr plugin install has been fixed.

To everyone who contributed Pull Requests, thank you!

Mackerel Meetup #11

Mackerel Meetup #11, the first official event of 2018 will be held on February 5th!

mackerelio.connpass.com (Japanese only)

We are pleased to announce that two Mackerel user companies, Seesaa Co, Inc. and Makuake, Inc. (in order of presentation), will be giving guest presentations! There’s still plenty of room left for both general and blog participants. We’ll be talking about the future direction of Mackerel, so you don’t want to miss out!

Command line tool mkr now included in the package for Windows etc.

Hello! Mackerel team CRE Inoue (id:a-know) here.

I thought this week was going to start off with a big cold front, but it ended up warming up quite a bit, with weather more like March. With these changes, it’s been difficult to regulate body temperature. I hope everyone is staying healthy out there.

Luckily, I personally haven’t had any big health problems this winter and I’d like to keep it that way. I’ll be sure to keep “monitoring” my various “metrics”... maybe I’ve already caught an occupational illness? (lol)

Now on to this week’s update information.

Command line tool mkr now included in the package for Windows

With this week’s update, mkr, the command line tool for performing various Mackerel operations from the command line, is now included in the package for Windows. Now it’s even easier to use mkr.

For more on the basic usages of mkr and various helpful cases using the tool, refer to the following help pages.

mackerel.io

mackerel.io

When using mkr on a server that is running mackerel-agent, the configuration file on the server is referenced automatically, so it is not necessary to specify MACKEREL_APIKEY or your own host ID.

mkr status
mkr retire

Updates for mackerel-agent(v0.50.1)

The following updates have been made for mackerel-agent.

  • Command line tool mkr is now included in the package for Windows (described above)
  • Stability improved when using the v1 package
  • mackerel-agent once behavior changed
    • mackerel-agent once is a command that runs mackerel-agent a single time manually so that you can check its behavior.
    • Until now, even if metric collection failed, the exit status would be 0 (normal handling).
    • With this update, failure to collect metrics will end with the exit status 1 (abnormal handling).
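With the new exit status, a script can detect a failed collection simply by checking the return code. The following Python sketch shows one way to do that; the helper name is hypothetical, and the two stand-in commands only imitate exit statuses 1 and 0 so that the example runs without mackerel-agent installed.

```python
import subprocess
import sys


def metric_collection_ok(command):
    """Run `command` and report whether it finished with exit status 0."""
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode == 0


# On a real host this would be: metric_collection_ok(["mackerel-agent", "once"])
failing = [sys.executable, "-c", "raise SystemExit(1)"]  # imitates a failed collection
passing = [sys.executable, "-c", "pass"]                 # imitates a successful run
print(metric_collection_ok(failing), metric_collection_ok(passing))
```

Before v0.50.1 a check like this would always have reported success, because the exit status was 0 even when metric collection failed.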