Mackerel blog #mackerelio

The Official Blog of Mackerel

ECS now supported with AWS Integration and more

Mackerel Team CRE Miura (id:missasan) here.

In February of this year, the public beta version of Mackerel’s container agent was released. With interest in container monitoring continuing to grow, AWS Integration has now been updated to support ECS. With this feature, you can integrate ECS clusters and services and check their metrics in Mackerel. Be sure to try it together with the container agent.

A large number of OSS updates were made this week as well. Thank you to all the contributors!

Now on to this week’s update information.

ECS now supported with AWS Integration

For details regarding obtainable metrics and billing information, refer to the following help page.

[go-check-plugins] check-ping added

With the release of go-check-plugins v0.29.0, check-ping was added. It can be run with a command along the lines of check-ping -H -n 5 -w 100. Using this plugin, you can now monitor the connectivity of network devices on which mackerel-agent can’t be installed.

For more details on configuration methods, check out the README linked below.

[go-check-plugins] Option added to exclude service names from check-ntservice

With the release of go-check-plugins v0.29.0, we also saw the addition of the --exclude-service option which allows you to specify a service name to be excluded in check-ntservice.

[go-check-plugins] Option added to check-ntpoffset to check whether the NTP upper stratum is correctly synchronized

The release of go-check-plugins v0.29.0 also added the --check-stratum option to check-ntpoffset, which allows you to check whether or not the NTP upper stratum is correctly synchronized.

[go-check-plugins] Option added to specify the http proxy with check-http

With the release of go-check-plugins v0.29.0, the --proxy option was added to check-http to specify the http proxy. Previously, the proxy had to be specified through environment variables; it is now easier to use because it can be given as a command-line option.

[mkr] mkr monitors now supports threshold configuration for service metric interruption monitoring

With the release of mkr v0.36.0, the mkr monitors command now supports threshold configuration for interruption monitoring of service metrics.

Mackerel at the Gartner Conference 2019 from April 23rd until April 25th!

Mackerel will be running a booth at the Gartner IT Infrastructure, Operations & Cloud Strategies Conference 2019, held in the main building of Happo-en from April 23rd (Tue) until April 25th (Thu). If you’re planning to attend the event, be sure to stop by the Mackerel booth.

Event Details

Experimental Feature: Expression Monitoring was stopped from 4/2 to 4/3

Thank you for choosing Mackerel.

Expression Monitoring, which is offered as an Experimental Feature, was stopped during the following time period.

  • April 2 (Tuesday) 11:54 a.m. - April 3 (Wed) 5:49 p.m. (JST)

This stoppage was caused by a misconfiguration of the server that performs batch processing in relation to the Expression Monitoring feature. We apologize for the inconvenience and thank you for your understanding.

Monitoring for service metric interruption is now possible and more

Mackerel Team CRE Miura (id:missasan) here.

Due to popular demand, we’ve released a function for service metric interruption monitoring.

Service metrics are versatile because they can be posted using the API without having to go through the Mackerel agent. With this release, it is now possible to detect when service metric posting gets interrupted and prevent the situation from going unnoticed.

Now on to this week’s update information.

Monitoring for service metric interruption is now possible

With Mackerel, you can post and visualize metrics from external services and applications that aren’t directly linked to a specific server using the service metric API. You can do this by hitting the API directly or with fluentd.
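As a rough illustration of what "hitting the API directly" can look like, here is a minimal Python sketch that builds a service metric post. The endpoint path, the X-Api-Key header, and the name/time/value fields reflect our understanding of the service metric API and should be checked against the API documentation; the service name, metric name, and API key are made-up placeholders. The request is only built here, not sent.

```python
import json
import time
import urllib.request

# Assumed API host; confirm against the Mackerel API documentation.
API_BASE = "https://api.mackerelio.com"


def metric(name, value, timestamp=None):
    """Return one data point in the name/time/value shape (time is epoch seconds)."""
    when = timestamp if timestamp is not None else time.time()
    return {"name": name, "time": int(when), "value": value}


def build_service_metric_request(service_name, api_key, metrics):
    """Build (but do not send) a POST request carrying a list of data points."""
    url = f"{API_BASE}/api/v0/services/{service_name}/tsdb"
    body = json.dumps(metrics).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
    )


# Hypothetical usage: "my-service" and "queue.length" are placeholders.
request = build_service_metric_request(
    "my-service", "YOUR_API_KEY", [metric("queue.length", 12)]
)
# urllib.request.urlopen(request) would actually send the data point.
```

If whatever runs a script like this on a schedule stops, no new data points arrive, which is exactly the situation interruption monitoring is meant to catch.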

Interruption monitoring for service metrics is a mechanism that detects when the posting of these metrics has been interrupted for some reason.

Sometimes stoppages occur with fluentd, and we’ve received feedback from users saying that it took them a few days to even notice that service metrics weren’t being sent. This kind of situation can now be prevented by configuring interruption monitoring.
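The idea behind interruption monitoring can be sketched in a few lines of Python. This is only an illustration of the concept, not Mackerel's implementation: the function name and the max_gap_minutes parameter are hypothetical, and the actual threshold is whatever you configure in the monitor settings.

```python
def is_interrupted(last_posted_at, now, max_gap_minutes):
    """Return True when no data point has arrived within the allowed gap.

    last_posted_at and now are epoch seconds; max_gap_minutes is a
    hypothetical threshold chosen by the person configuring the monitor.
    """
    return (now - last_posted_at) > max_gap_minutes * 60


# A metric last posted 5 minutes ago is fine with a 10-minute threshold...
recent = is_interrupted(last_posted_at=1_000_000, now=1_000_300, max_gap_minutes=10)
# ...but one last posted 2 days ago is flagged.
stale = is_interrupted(last_posted_at=1_000_000, now=1_172_800, max_gap_minutes=10)
```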

Configuration Method

Configuring interruption monitoring can be done from the service metric monitoring tab in Monitor Settings or from the editing screen of existing service metric monitors.


Graph legend display for Graphboard expression graphs

As shown in the image below, a graph legend can now be displayed on the bottom of expression graphs that are on the Graphboard which can be created from the service details screen.


Release of mkr v0.35.1

Hello! Mackerel Team CRE id:a-know here.

The weather has warmed up considerably and I’m beginning to see some cherry blossoms popping up here and there on my commute to work. Anyway, on to this week's release contents.

Release of mkr v0.35.1

With the release of version 0.35.1 for Mackerel’s command line tool mkr, the following bug contained in version 0.35.0 has been fixed.

  • The value specified for the --status option of the hosts subcommand was not being applied.

Just to be safe, we recommend checking the version of mkr installed on your PC as well as the version used by any regularly run scripts.

Release of Anomaly Detection for roles and more

Mackerel team CRE Miura (id:missasan) here.

Thank you to everyone who came out to the Meetup last weekend. I hope everyone had a good time. The event report will be coming out soon, so be sure to keep a lookout for that.

Last week we finally released Anomaly Detection for roles, our new feature that uses machine learning. The feature is scheduled for an official release in May, but until then, you can try out the Anomaly Detection for roles feature as part of our free promotion. Be sure to give it a try and let us know what you think.

Now on to this week’s update information.

Release of Anomaly Detection for roles (with free promotional campaign)

Our new feature Anomaly Detection for roles (beta version), which uses machine learning, has been released. For more on how to use the feature, be sure to check out the page linked below.

For more details, refer to the help page linked below as well.

We are currently offering a free promotional campaign!

During the feature’s beta period, Anomaly Detection for roles can be used for free (with no additional charges). This feature is for ‘Standard’ and ‘Trial’ plans. Be sure to take advantage of this free period and try out the new feature in a variety of different environments. We’re looking forward to your feedback!

Please note that after the feature’s official release, scheduled for May, environments that have Anomaly Detection for roles enabled will automatically switch over and begin incurring charges.

Help and other Mackerel documents made open-source

Mackerel's Help pages and other documents are now open-source.

If any part of the Help or FAQ needs correction, we are accepting pull requests. Corrections to Japanese-only pages and pull requests written in Japanese are also welcome.

We look forward to your pull requests!

check-log plugin now supports log read timeouts

With the release of go-check-plugins v0.28.0, the check-log plugin now handles timeouts during log reading. Up until now, a timeout that occurred while a log was being read would sometimes result in an error. With this release, improvements were made and timeouts during log reading can now be handled normally.

If the command setting in the mackerel-agent configuration file is written as a character string rather than as an array, timeouts may not be handled normally depending on the environment. We therefore recommend writing the command setting as an array.

Operation Monitoring Solution Seminar with Cloud portal x SIOS Coati x Mackerel on March 12th (Tues)!

Together, Hatena, SIOS TECHNOLOGY, Inc., and Sony Network Communications, Inc. will be holding a seminar in Tokyo.

We’ve heard from quite a few companies that have run into operational issues after adopting AWS. This seminar will introduce application management tools that help you automate as much as possible and keep your AWS environment running smoothly! We’ll go over the best practices for managing AWS with Hatena's Mackerel, SIOS Technology’s SIOS Coati, and Sony Network Communications' Managed Cloud Portal.

Event Details

  • Date and time: Tuesday, March 12, 2019 from 3:00 p.m. - 5:30 p.m. (Reception starts at 2:30 p.m.)
  • Venue: Akihabara UDX 4F Next-2 (2 min. walk from Akihabara station) [MAP]
  • Admission: Free
  • Sponsors: SIOS TECHNOLOGY, Inc., Sony Network Communications, Inc., and Hatena, Inc.

Apply here (Japanese only)

DevOps Hands-on ~Building a safe and secure DevOps environment with AWS and Mackerel~ on March 15th (Fri)!

Hatena and Classmethod, Inc. will hold a hands-on seminar at the Shibuya Hikarie on Friday March 15th.

At this event, Hatena (Mackerel) and Classmethod, both of which have earned DevOps competency certifications in the AWS partner system, will explain hands-on how to build CI/CD pipeline environments that combine Mackerel and the AWS Code series. This is a great opportunity to learn more about the latest DevOps environments that combine monitoring and CI/CD pipelines.

Event Details

  • Date and time: Friday, March 15, 2019 from 2:00 p.m. - 4:30 p.m. (Reception starts at 1:30 p.m.)
  • Venue: Shibuya Hikarie 11th floor Sky Lobby Hikarie Conference Room C [MAP]
  • Capacity: 20 people
  • Admission: Free
  • Sponsors: Classmethod, Inc. and Hatena, Inc.

Apply here (Japanese only)

New feature・How to use Anomaly Detection for roles

Hello. Mackerel Team Director id:daiksy here.

The beta version of Mackerel’s new feature ‘Anomaly Detection for roles’ which uses machine learning is now being offered. You might have heard about the development of this feature at Meetup and other past events.

Anomaly detection differs slightly from the way monitoring has been used up until now. In this article, we’ll take a look at what anomaly detection is and how it can be used.

What is Anomaly Detection for roles?

‘Anomaly Detection for roles’ is a function that uses machine learning to detect abnormalities in the server without having to set special monitoring items for hosts within a role in Mackerel.

Up until now, configuring monitors precisely required a substantial amount of server monitoring experience and know-how. Say you want an alert issued when the CPU load gets high. It’s actually quite difficult to determine what percentage of CPU usage counts as high load, or which thresholds should be set for which items when watching for application abnormalities. Making these kinds of decisions takes operational experience and technical knowledge. On top of this, the characteristics of applications change daily, and monitor configurations that are left alone can become obsolete, so regular maintenance is a must.

‘Anomaly Detection for roles’ can help with these types of monitoring complications.

With Mackerel, it’s recommended that you organize your servers into roles. A role represents the part that a server plays in a service. By setting roles appropriately, you can classify servers into groups with similar load trends, such as "application servers" or "database servers". Mackerel's ‘Anomaly Detection for roles’ feature uses machine learning to learn a server's "normal state" from past metric trends across the entire role. Newly posted metrics are checked against the learned results, anything outside of the "normal state" is regarded as an anomaly, and an alert is raised. In other words, the ‘Anomaly Detection for roles’ feature detects server abnormalities without having to configure individual monitors.
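To make the "learn the normal state, then flag what falls outside it" idea concrete, here is a deliberately simple Python sketch. It is not Mackerel's algorithm, which the article does not describe: a plain mean and standard deviation check is swapped in as a stand-in, and the function names, the tolerance parameter, and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def learn_normal_state(history):
    """Summarize past metric values pooled from every host in a role.

    Stand-in for the learning step: just the mean and sample standard deviation.
    """
    return mean(history), stdev(history)


def is_anomalous(value, normal_state, tolerance=3.0):
    """Flag a new value that lies more than `tolerance` deviations from the mean."""
    center, spread = normal_state
    if spread == 0:
        return value != center
    return abs(value - center) / spread > tolerance


# Made-up CPU usage percentages collected across an "application servers" role.
role_history = [40, 42, 41, 39, 43, 40, 41, 42]
normal = learn_normal_state(role_history)

usual_reading = is_anomalous(42, normal)    # within the learned range
unusual_reading = is_anomalous(95, normal)  # far outside the learned range
```

Even in this toy version, pooling hosts with very different behavior into one role would widen the learned range and hide real anomalies, which is why role configuration matters.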

Role configuration is vital to improving detection precision

With "Anomaly Detection for roles", roles are specified as the monitoring target. Mackerel uses past system metrics from hosts that are registered in the specified role to learn trends. As previously mentioned, it is recommended that roles be categorized by the part a server plays in a service, such as application servers and database servers. Because of this, we can assume that a role contains servers with similar metric trends, and trends can be learned from the entire role. Consequently, if a role contains servers with significantly different trends, or servers with extremely different specifications, accuracy will fall. For example, when "active" and "standby" servers coexist for a long period of time, servers with different trends get mixed together in the role.

Therefore, in order to increase the precision of Mackerel's Anomaly Detection, it is important to first properly categorize servers by roles.

How to use Anomaly Detection for roles

‘Anomaly Detection for roles’ learns trends from past metrics. If newly posted metrics are determined to be outside of those trends, an alert will occur. This alert notification will also display the metrics that were determined to be abnormal. From this information, the user who receives the alert can estimate what kind of anomaly is occurring in the server. For example, the alert may show an increase in memory usage outside of the normal trend.

However, even if the alert shows that the detected anomaly is based on memory usage metrics, there is no guarantee that the cause of the issue lies in memory. This point requires careful attention.

‘Anomaly Detection for roles’ learns and judges trends across a combination of the system metrics of the hosts configured in a role. When an issue occurs in a server, more often than not several metrics are affected by the primary cause of the issue. For example, when the amount of data written to a disk increases, the network transfer load needed to send that data also increases at the same time, and as a result, this may also affect memory usage. Even if trends change in such a complex manner, only the one metric used as the basis of detection is recorded in alerts of ‘Anomaly Detection for roles’. So, when an alert occurs, you need to look across all of the role's graphs rather than at a single metric.

Due to the nature of this feature, it’s somewhat difficult to define a typical troubleshooting response such as "restart this server when this monitoring alert occurs". We recommend using ‘Anomaly Detection for roles’ as an auxiliary monitor for quickly detecting rare anomalies, alongside threshold-based monitors configured from your past operational experience. You might also consider an operation cycle in which you add a threshold-based monitor whenever anomaly detection raises an alert.

Current limitations with Anomaly Detection for roles

At the moment, ‘Anomaly Detection for roles’ is only supported for Linux environments with mackerel-agent installed. Windows and Integration environments are not currently supported.

cron and other batch jobs can now be monitored with the mkr wrap command and more

Mackerel team CRE Miura (id:missasan) here.

As the end of the week approaches, we look forward to Mackerel Meetup #13 Tokyo on March 1st (Friday). The number of LT spots has been increased. There’s still 1 open, so please apply! General participation is also still available. Let's have a drink together at Meetup this Friday! (Japanese only)

Now on to this week’s update information.

cron and other batch jobs can now be monitored with the mkr wrap command

With mkr v0.35.0, cron and other batch jobs can now be monitored with the mkr wrap command. If you run a command such as % mkr wrap -- /path/to/your-batch … and it exits with a non-zero status, an alert will occur in Mackerel.

For more details, check out the help page linked below.
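The general pattern behind a wrapper like this can be sketched in Python: run the job, look at its exit status, and report when it is non-zero. This is only an illustration of the idea and not how mkr wrap is implemented; the run_and_report function and the fields of its report are hypothetical.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a batch command and describe whether it should raise an alert.

    The returned dictionary is a made-up report format for illustration only.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": command,
        "exit_code": completed.returncode,
        # A non-zero exit status is treated as a failure worth alerting on.
        "alert": completed.returncode != 0,
        # Keep only the tail of the output so the report stays small.
        "output_tail": (completed.stdout + completed.stderr)[-1000:],
    }


# Stand-ins for real batch jobs: one succeeds, one exits with status 3.
succeeded = run_and_report([sys.executable, "-c", "print('done')"])
failed = run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"])
```

Wrapping the job rather than editing it means existing cron entries only need the wrapper placed in front of the original command.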

check plugin operation can now be checked with the mkr checks run command

With mkr v0.35.0, you can now use the mkr checks run command to check the configuration and operation of the check plugins specified in mackerel-agent.conf.

When you execute a command such as % mkr checks run, results similar to the following will be displayed.

ok 7 - load
  command: ['check-load -w 2,2,2 -c 5,5,5']
  status: OK
  stdout: 'LOAD OK: load average: 0.06, 0.03, 0.05'

If a check fails, the command exits with a non-zero status.

source option added to mackerel-plugin-mongodb

With mackerel-agent-plugins v0.55.0, the -source option was added to mackerel-plugin-mongodb. By specifying -source=<authenticationDatabase> when executing the plugin, it is now possible to select the database specified during user authentication.

socket option added to mackerel-plugin-php-fpm

With mackerel-agent-plugins v0.55.0, the -socket option was added to mackerel-plugin-php-fpm. This option allows you to retrieve metrics via UNIX domain sockets and TCP services.

Check out the README below for more details on how to use the option.

Metrics added to AWS Integration Redshift

New metrics, including QueriesCompletedPerSecond, have been added to AWS Integration Redshift.

For more details on obtainable metrics, check out the help page linked below.

A big thank you to everyone who contributed!