Mackerel blog #mackerelio

The Official Blog of Mackerel

mackerel-agent has been updated with countermeasures for .io domain malfunctions

Yesterday we held Mackerel Day, an event celebrating the third anniversary of Mackerel’s official release.

mackerelio.connpass.com (Japanese only)

Thanks to all of you, the event was a huge success that showcased great presentations from a handful of wonderful speakers. We would like to sincerely thank all the presenters and everyone who attended the event. We will soon be posting a separate report on this blog with details from the event.

Anyway, here is this week’s update information.

mackerel-agent has been updated with countermeasures for .io domain malfunctions

In the blog entry linked below, we announced the .io domain malfunction and the temporary measures taken by Mackerel. This week, we have updated mackerel-agent with permanent countermeasures.

mackerel.io

Specifically, the API request destination from the agent to Mackerel has been changed from mackerel.io to api.mackerelio.com. Accordingly, the temporary countermeasures, such as the change to the connectivity monitoring judgment interval, will be reverted to their original state one at a time.

To all mackerel-agent users: please either update the agent, or add the line apibase = "https://api.mackerelio.com/" as the top line of the configuration file and restart the agent (this does not apply to the KCPS version of Mackerel).
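For those who manage many hosts and would rather script the configuration change, here is a minimal sketch in Python. It only edits text held in memory. The function name ensure_apibase and the sample apikey value are placeholders made up for this illustration, and it assumes the setting is written as a simple key = value line.

```python
APIBASE_LINE = 'apibase = "https://api.mackerelio.com/"'


def ensure_apibase(conf_text: str) -> str:
    """Return the configuration text with the apibase line as its top line.

    Hypothetical helper for illustration only; it is not part of mackerel-agent.
    """
    kept = []
    for line in conf_text.splitlines():
        # Drop an existing apibase setting so it is not defined twice.
        key = line.split("=", 1)[0].strip()
        if key == "apibase":
            continue
        kept.append(line)
    return "\n".join([APIBASE_LINE, *kept]) + "\n"


if __name__ == "__main__":
    # "xxxx" is a placeholder, not a real API key.
    print(ensure_apibase('apikey = "xxxx"\n'), end="")
```

Running the function twice gives the same result, so it should be safe to apply repeatedly. The agent still needs to be restarted afterwards.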

Updates for Mackerel related OSS

Below is a list of the updated content including updates for mackerel-agent not mentioned above.

mackerel-agent v0.46.0

  • include_pattern and exclude_pattern can now be specified.
    • This lets you filter which of the metrics output by a metric plugin are sent.
    • For details, check out this pull request.
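To give a rough idea of what this kind of filtering does, here is a small sketch in Python. It assumes the patterns are regular expressions matched against metric names. The function filter_metrics and the sample metric names are made up for this illustration and are not the agent's actual implementation.

```python
import re
from typing import List, Optional


def filter_metrics(
    names: List[str],
    include_pattern: Optional[str] = None,
    exclude_pattern: Optional[str] = None,
) -> List[str]:
    """Keep names that match include_pattern and do not match exclude_pattern."""
    include = re.compile(include_pattern) if include_pattern else None
    exclude = re.compile(exclude_pattern) if exclude_pattern else None
    kept = []
    for name in names:
        if include and not include.search(name):
            continue  # an include pattern is set and this name does not match it
        if exclude and exclude.search(name):
            continue  # this name matches the exclude pattern
        kept.append(name)
    return kept


if __name__ == "__main__":
    sample = ["app.cpu.user", "app.cpu.idle", "app.mem.used"]
    print(filter_metrics(sample, include_pattern=r"cpu", exclude_pattern=r"idle$"))
```

In this sketch the include pattern is applied first and the exclude pattern second, so a metric matching both is dropped. Please refer to the pull request for the actual behavior.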

mackerel-agent-plugins v0.35.0

  • [twemproxy] Changed so that metrics from each server are not obtained by default
    • If the previous behavior is preferred, use the option -enable-each-server-metrics.

mkr v0.21.0

  • The API request destination to Mackerel has been changed from mackerel.io to api.mackerelio.com.

mackerel-client-ruby v0.3.0

  • The API request destination to Mackerel has been changed from mackerel.io to api.mackerelio.com.
  • customIdentifier is now available in host searches
  • Metadata API supported

OpsGenie and PagerDuty can now be configured not to close alerts when they are closed in Mackerel

As a settings option for the OpsGenie and PagerDuty notification channels, you can now specify whether alerts in OpsGenie/PagerDuty are closed when the corresponding alerts are closed in Mackerel. Choose the setting that fits your team’s incident management style.

Notification channels with improper destinations can now be suspended

Notification channels will now be automatically suspended if they fail a certain number of times, for reasons such as an invalid URL being configured or the notification destination URL having been accidentally deleted. While suspended, the channel will be displayed as shown below in the notification channel list.

This status can be lifted by clicking “Unsuspend the channel”.

Other updates

  • Metrics were added to Lambda Integration
    • Iterator Age [ms]
    • Iterator Age [ms] per alias
    • Iterator Age [ms] per version
  • The problem where notifications to PagerDuty would fail when the notification message was too long has been fixed.
  • Sharing graphs with the camera button is now maintained even if the display is switched to a stacked graph.

Advance notice regarding the adjustment of response time measurement specifications for URL external monitoring

Mackerel sub-producer id:Songmu here.

On Thursday, October 12th, we will be adjusting specifications in relation to the URL external monitoring response time measurement. These adjustments are scheduled to be released between the hours of 2:00pm - 4:00pm (Japan Standard Time). The following two points are the issues to be adjusted.

  • DNS name resolution time will no longer be included in URL external monitoring response time measurement
  • Response time will no longer be recorded if the HTTP response does not return

DNS name resolution time no longer included in URL external monitoring response time measurement

Although the DNS name resolution time is currently included in the response time measured by URL external monitoring, a phenomenon has been occurring in which that segment of the response time increases when the DNS cache TTL expires.

In order to resolve this specification issue, name resolution time will no longer be included in the response time.

Response time no longer recorded if HTTP response is not returned

Currently, the response time is measured and posted in Mackerel, regardless of whether or not the HTTP response has returned from the request target server.

With this current situation, cases like the following occur.

  • Values close to 0 milliseconds are measured when name resolution fails or the TCP connection is denied
  • 15000 milliseconds are measured when the connection times out (15000 milliseconds is the configured value for timeout in the URL external monitoring client specifications)

We believe that these are not intuitive behaviors. Therefore, we will change the behavior so that the response time is measured and recorded only when the server returns an HTTP response.

Behavior for alerts when an HTTP response is not returned will not change.
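The two adjustments can be summarized with a small sketch in Python. The function names and sample values are made up for this illustration. It only models the recording rule described above, not the actual monitoring client.

```python
from typing import Optional


def old_response_time_ms(dns_ms: float, request_ms: float) -> float:
    """Current behavior: DNS time is included and a value is always recorded."""
    return dns_ms + request_ms


def new_response_time_ms(request_ms: float, got_http_response: bool) -> Optional[float]:
    """Adjusted behavior: DNS time is excluded, and nothing is recorded
    (None) unless the server returned an HTTP response."""
    if not got_http_response:
        return None
    return request_ms


if __name__ == "__main__":
    # A normal request: 120 ms of name resolution plus 80 ms for the rest.
    print(old_response_time_ms(120.0, 80.0), new_response_time_ms(80.0, True))
    # A timeout: previously recorded as 15000 ms, now not recorded at all.
    print(old_response_time_ms(0.0, 15000.0), new_response_time_ms(15000.0, False))
```

In other words, a graph may show slightly lower values after the change, and gaps where no HTTP response was returned.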

How this will affect the user

We do not believe there will be any major impact on the user, but please adjust thresholds as necessary when monitoring response times.

Future adjustments for URL external monitoring

URL external monitoring is a mechanism to send requests from the servers used by the Mackerel system directly to the user’s URL and it is easily susceptible to various influences such as system load, intermediate network routing, etc. Because of this, measurement fluctuations can occur.

In order to stabilize quality, we will make fine adjustments from time to time as needed.

Thank you for your cooperation.

A revised version of the Mackerel Terms of Service is available as of today.

A revised version of the Mackerel Terms of Service is available as of today. You may review these changes below.

Terms of Service - Japanese Version

  • Added a reference to the company Privacy Policy in Article 4 (Protection of Information).
  • Added a section concerning use by sales partners to Article 7 (Charges).
  • Changed "Article 15 (Governing Law and Jurisdiction)" to "Article 15 (Governing Law and Jurisdiction / Language)", and added a clause stating that the official version of these Terms is the Japanese version.
  • Performed small revisions to correct writing errors and change phrasing.

Terms of Service (For KCPS Users)

  • Added a reference to the company Privacy Policy in Article 4 (Protection of Information).
  • In adding the clause stated above in "Terms of Service - Japanese Version" concerning sales partners, we also changed the wording of Article 7 (Charges) to match the "Terms of Service - Japanese Version."
  • Performed small revisions to correct writing errors and change phrasing.

Mackerel Terms of Service - English Version

  • Reviewed the translation as a whole and made corrections where the English version differed from the Japanese version.

Updates for mackerel-agent and more & Our 3 year anniversary event coming soon! etc.

Hello! Mackerel team CRE Inoue (id:a-know) here.

As announced last week, this week was "Mackerel Week", and it was packed with Mackerel events.

It started on Monday with the Mackerel / NewRelic / Elasticsearch Seminar.

First up for presenters, Mackerel team CRE, Sone (id:Soudai)

On Tuesday, Mackerel Drink Up #6 was held in Osaka where Hatena staff member, id:papix, kicked off the Lightning Talk.

Wednesday, we ran a booth at the AWS CloudRoadShow 2017 in Nagoya and were able to introduce Mackerel to a lot of engineers based in that area.

Energetic as always, Mackerel team CRE Sone (id:Soudai) in Nagoya

Thank you again to Recruit Technologies Co.,Ltd. and MOTEX Inc. for letting us use such a nice space. Also, I would like to take this opportunity to thank everyone who came to the AWS CloudRoadShow 2017 Nagoya!

Anyway, here is this week’s update information.

Updates for mackerel-agent and more

With this week’s release, updates have been made for mackerel-agent and more. Detailed content is listed below.

mackerel-agent v0.45.0

  • Now able to build with Go 1.9

mackerel-agent-plugins v0.34.0

  • mackerel-plugin-flume, sidekiq added to package
  • [aws-dynamodb] Metrics fixed
  • [mysql] Fixed the extended metrics output when enable_extended is enabled.
  • [openldap] Metrics fixed

mackerel-check-plugins v0.13.0

  • [check-log] The following changes to the default behavior were made.
    • 【Before change】All lines in the target log file are read and checked upon initial startup.
    • 【After change】Target logs are not checked upon initial startup. Only lines written to the log after startup are checked.
    • If you prefer the previous behavior, specify the --check-first option and the behavior will be the same as before.
  • Now able to build with Go 1.9
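The check-log change above can be pictured with a small sketch in Python. It models "where does the first run start reading" as a character offset into the log. The function names and sample log lines are made up for this illustration and do not reflect how check-log is actually implemented.

```python
def initial_offset(log_length: int, check_first: bool = False) -> int:
    """Offset at which the very first run starts reading the log.

    New default: start at the end, so existing lines are skipped.
    With check_first=True (modeling --check-first): start at the beginning.
    """
    return 0 if check_first else log_length


def unread_part(content: str, offset: int) -> str:
    """Return the part of the log that has not been checked yet."""
    return content[offset:]


if __name__ == "__main__":
    existing = "old error\n"
    offset = initial_offset(len(existing))
    # Nothing is checked on initial startup with the new default.
    print(repr(unread_part(existing, offset)))
    # Lines appended afterwards are checked.
    print(repr(unread_part(existing + "new error\n", offset)))
```

If old lines in an existing log would otherwise trigger alerts right after installation, the new default avoids that. With --check-first you get the earlier behavior.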

mkr v0.20.0

  • Now able to build with Go 1.9

A big thanks to all of our users who contributed!

“Mackerel Day”, Mackerel’s 3 year anniversary event coming soon!

As of September 17th, it has been three years since the official release of Mackerel. We’ve made it this far, continuing to implement advanced features along the way, all thanks to everyone out there using Mackerel.

To celebrate Mackerel’s 3rd year anniversary and as a thank you to all of our users, we will be holding “Mackerel Day”, a special edition Meetup event!

mackerelio.connpass.com

We have invited our friends and valued users DMM.com Labo, Freee, GMO Pepabo, and Mercari (in order of presentations) to take to the stage and talk about practical examples and the various ways they use Mackerel. Amazon Japan is providing an amazing venue for the event, and Solutions Architect Tomoaki Sakatoku will also be speaking. Even Sugiyama, Ono, and Inoue (me) of Hatena will take the stage!

Everyone from the Mackerel development team will be there for the event and social get-together. This will be an event where users and developers gather under one roof! You definitely don’t want to miss out on this!

Regarding false connectivity monitoring alerts from Mackerel related to .io domain malfunctions and our response

Mackerel sub-producer id:Songmu here. I’d like to sincerely apologize to all of our users for the frequent inconveniences related to the subject matter.

In this blog, I’ll go over the details of this matter and our response hereafter.

About false connectivity monitoring alerts

In Mackerel, when metric posting from mackerel-agent is interrupted for a certain period of time, the server is judged to be down and connectivity monitoring alerts are sent out.

Currently, domain name resolution for mackerel.io is unstable. When name resolution fails, mackerel-agent temporarily cannot reach Mackerel and is unable to post metrics. The Mackerel system then judges the server to be down and sends out false connectivity monitoring alerts.

Regarding name resolution instability for mackerel.io

Currently, some of the authoritative DNS servers for the .io domain are intermittently returning NXDOMAIN (domain does not exist) by mistake for domains that do exist. This is why name resolution is failing, and we are investigating the matter.

Additionally, because the negative cache TTL is set to 900 seconds, name resolution fails for up to 15 minutes.

Although this matter is only temporary, we have confirmed that it has occurred more than once and may occur again in the future. Therefore, we are implementing the following tentative solutions.

Temporary solutions

In order to keep false alerts to a minimum, the following solutions are currently being implemented.

  • extending the mackerel.io domain TTL
  • temporarily suspending connectivity monitoring component when mackerel.io name resolution is unstable
  • configuring the connectivity monitoring judgment interval for longer than the negative cache TTL

We have managed to reduce the number of false alerts through these measures, but the reliability of connectivity monitoring is still low. In particular, when a server actually goes down, it takes 15 minutes or more for it to be reported. We therefore recognize that separate fundamental measures are also required.
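The trade-off can be seen with some simple arithmetic, sketched here in Python. The 900 seconds is the negative cache TTL mentioned above. The 960-second interval and the function names are made-up examples, not the actual values used by Mackerel.

```python
NEGATIVE_CACHE_TTL_S = 900  # negative cache TTL mentioned above (15 minutes)


def outlasts_negative_cache(judgment_interval_s: int) -> bool:
    """True if the judgment interval is longer than the negative cache TTL,
    so a single cached NXDOMAIN cannot trigger a false alert by itself."""
    return judgment_interval_s > NEGATIVE_CACHE_TTL_S


def minimum_detection_minutes(judgment_interval_s: int) -> float:
    """A server that really goes down is reported no sooner than this."""
    return judgment_interval_s / 60


if __name__ == "__main__":
    interval = 960  # hypothetical interval, chosen only for illustration
    print(outlasts_negative_cache(interval), minimum_detection_minutes(interval))
```

Any interval long enough to ride out the negative cache also delays real down detection by at least the same amount, which is why these measures can only be temporary.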

Fundamental measures

Over the next month or so, we will acquire a new domain under a more reliable TLD and switch mackerel-agent to use that domain. For backward compatibility, the current mackerel.io will continue to operate alongside it. In order to use the new domain, you’ll need to upgrade mackerel-agent.

Additionally, although still under review, we are considering setting up a mechanism to use an alternate route (another domain etc.) if mackerel-agent fails to post in Mackerel.
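As a rough picture of what such an alternate-route mechanism could look like, here is a sketch in Python. This is not a design we have decided on. The function post_with_fallback, the injected post callable, and the domain alternate.example are all made up for this illustration.

```python
from typing import Callable, Iterable


def post_with_fallback(
    payload: dict,
    api_bases: Iterable[str],
    post: Callable[[str, dict], None],
) -> str:
    """Try each API base in order and return the first one that accepts the post.

    `post` is injected so the sketch can run without network access; it is
    expected to raise OSError (which covers name resolution failures) on error.
    """
    last_error = None
    for base in api_bases:
        try:
            post(base, payload)
            return base
        except OSError as err:
            last_error = err  # remember the failure and try the next route
    raise ConnectionError("all API bases failed") from last_error


if __name__ == "__main__":
    def fake_post(base: str, payload: dict) -> None:
        # Simulate name resolution failing only for the primary domain.
        if base == "https://mackerel.io/":
            raise OSError("name resolution failed")

    routes = ["https://mackerel.io/", "https://alternate.example/"]
    print(post_with_fallback({"value": 1}, routes, fake_post))
```

The point of the sketch is simply that a failure on one domain would no longer stop metric posting altogether.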

Again, I sincerely apologize for any inconvenience this matter may have caused. In the future, we will do our best to maintain stable operation. Thank you for choosing Mackerel.

We released the Amazon SQS integration and more

Hello, Mackerel Team CRE (Customer Reliability Engineer) Inoue (id:a-know) here.

Here is this week’s update information.

“Mackerel” received Japan’s first AWS DevOps Competency certification

Mackerel has acquired “AWS DevOps Competency” certification under the AWS partner system “AWS Partner Competency Program”.

Mackerel is the first service from a company in Japan to acquire “AWS DevOps Competency” certification.

aws.amazon.com

Mackerel’s vision includes not only “server monitoring” but also promoting the efficiency of DevOps in “infrastructure management” and the day-to-day, dynamically changing infrastructure environment, so we are extremely pleased with this certification.

We released the Amazon SQS integration

Amazon SQS is newly supported for host integration targets in AWS integration.

aws.amazon.com

The AmazonSQSReadOnlyAccess policy needs to be added to the IAM Role. See also the help below.

In Mackerel’s AWS integration, tags can be used to narrow down which cloud products are integration targets. But since Amazon SQS cannot give tags to queues (AWS specifications), please note that this narrowing down cannot be done.

The alert notification now includes an organization name

The notice to the notification channel, such as Slack, now includes the target organization name.

We think that it will be more convenient for those who are managing across multiple organizations in particular.

Quotation marks are no longer required in the expression specified in “customize graph”, “expression-based monitoring”

Quotation marks are no longer required around items such as metric names in the “expression” used by “customize graph” (which defines a graph using a function expression) and “expression-based monitoring” (which creates monitoring rules for metrics derived from a function expression). Specific examples are shown below.

  • Before update
    • “timeShift(avg(role(‘example:db’, ‘loadavg5’)), ‘1w’)”
  • After update
    • “timeShift(avg(role(example:db, loadavg5)), 1w)”

“Customize graph” and “expression-based monitoring” are still experimental functions, but we will make improvements like this to make them more reliable and convenient to use.

Updated mackerel-agent-plugins, mkr

The update details are as follows. Thank you very much to everyone who contributed!

mackerel-agent-plugins v0.33.0

  • mackerel-plugin-nvidia-smi added to the package
  • [accesslog] now able to specify the log format (ltsv or apache)
  • mackerel-plugin-flume added
  • [mysql] handlers and transaction handler graphs added

mkr v0.19.0

  • [mkr fetch] Now works even when many host IDs are specified
  • It now refers to the apibase specification in mackerel-agent.conf
    • If -apibase is specified as an argument, it is given priority
    • If nothing is specified, the request destination will be mackerel.io
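The order of priority described above can be written down as a short sketch in Python. The function resolve_apibase is made up for this illustration and is not mkr's code. The https:// scheme on the default and the example URLs are also assumptions.

```python
from typing import Optional

# Default request destination named above; the scheme is an assumption.
DEFAULT_APIBASE = "https://mackerel.io"


def resolve_apibase(
    cli_apibase: Optional[str] = None,
    conf_apibase: Optional[str] = None,
) -> str:
    """Pick the request destination: argument first, then the apibase in
    mackerel-agent.conf, then the default."""
    if cli_apibase:
        return cli_apibase
    if conf_apibase:
        return conf_apibase
    return DEFAULT_APIBASE


if __name__ == "__main__":
    print(resolve_apibase())
    print(resolve_apibase(conf_apibase="https://conf.example"))
    print(resolve_apibase("https://cli.example", "https://conf.example"))
```

This means that hosts which already set apibase in mackerel-agent.conf should not need a separate setting for mkr.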

Next week is Mackerel Week! It will be packed with events

Mackerel/NewRelic/Elasticsearch Seminar

First of all, Monday, September 25. We’ll hold a Mackerel/NewRelic/Elasticsearch Seminar at a venue provided by Recruit Technologies!

mackerelio.connpass.com

At the moment, you can still apply under “blog quota” or “general quota (cannot be canceled on the day before or same day)”. It’s a very valuable opportunity to learn about “monitoring” as seen from three products of different stripes. Don’t miss it!

Mackerel Drink Up #6 Osaka

And on Tuesday, September 26th is Mackerel Drink Up #6 in Osaka!

mackerelio.connpass.com

This Drink Up will be held at a venue provided by MOTEX in Kansai! Many Mackerel staff will be taking part, so we’d like to communicate with you directly! People living in the Kansai area, please join us!

AWS CloudRoadShow 2017 Nagoya

And on Wednesday, September 27, Mackerel will also exhibit at AWS CloudRoadShow 2017 in Nagoya.

AWS Cloud Roadshow 2017 Nagoya powered by Intel®: a free cloud conference touring four cities (Hiroshima, Osaka, Nagoya, and Fukuoka) | Amazon Web Services

Both CRE members of the Mackerel team will be attending this venue. “I’m interested in Mackerel, but I’m not sure what it can do” – if that’s you and you’re in the Nagoya area, we’ll be waiting for your visit!

Updates for mackerel-agent-plugins and more

Well we’re halfway through September and it’s finally starting to feel like the fall season. It’s supposed to get chilly in the coming days. So be careful not to catch a cold!

Anyway, here is this week’s update information.

Host metrics posted at 5-minute intervals can now be checked more than 25 hours retroactively

Until now, Mackerel expected host metrics to be posted every minute, so when they were posted at intervals of 5 minutes or more, the data was lost once it was more than 25 hours old and had been rounded.

With this week’s update, it is now possible to check back on data more than 25 hours retroactively. As a result, past changes among metrics obtained with AWS Integration can be checked from Mackerel, even data with 5-minute granularity.

This is a part of the development of the “New Time Series Data System”. We are continually making improvements, so look forward to future updates.

The role name of the official role registered in Ansible Galaxy has been changed

The official role registered in Ansible Galaxy has been updated. The new version is 0.7.0. Along with this, we are changing the role name from mackerel to mackerelio. Please be aware that it will no longer be available under the old name.

Also with this update, check plugin options can be specified. Give it a try.

Updates for mackerel-agent-plugins and more

Updates have been made for mackerel-agent-plugins and more. Continue below for the details.

mackerel-agent-plugins v0.32.0

  • [memcached] evicted.reclaimed and evicted.nonzero_evictions metrics were added.
  • [accesslog] Adjusted to be able to scan long logs
  • [mysql] Minor corrections were made.
  • [Redis] Minor corrections were made.

mkr v0.18.0

  • [mkr create] custom_identifier can now be specified with the --customIdentifier option.

cookbook-mackerel-agent

A big thanks to all of you who contributed!

Mackerel at the AWS Cloud Roadshow 2017 Osaka!

We made this announcement last week, but Mackerel will be running a booth at the AWS Cloud Roadshow 2017 Osaka starting next week, on September 21st!

AWS Cloud Roadshow 2017 Osaka powered by Intel®: a free cloud conference touring four cities (Hiroshima, Osaka, Nagoya, and Fukuoka) | Amazon Web Services

If you’re planning on attending the event, definitely stop by our booth!