Adding monitors for script checks

Check monitoring is a feature that monitors the execution results of check plugins, in a manner similar to Nagios. The agent periodically runs the check plugin and sends the results to Mackerel.

An official plugin pack is available. For more information please refer to Using the official check plugin pack for check monitoring.

Additionally, to develop a plugin using github.com/mackerelio/checkers (a helper library that is used in our official plugins), please refer to Creating check plugins using checkers.

When you register a command that outputs monitoring results in the Nagios plugin compatible format described below, its output is sent to Mackerel and displayed in the host details screen and the alerts screen.

Check items will be counted as 1 host metric each. Limits for each plan can be viewed here.

Configuration

In the agent settings file, add an item as shown here:

[plugin.checks.ssh]
command = "ruby /path/to/check-ssh.rb"
notification_interval = 60
max_check_attempts = 1
check_interval = 5
prevent_alert_auto_close = true
action = { command = "ruby /path/to/notify_something.rb" }
  • Item name: The key in the settings file. It must begin with "plugin.checks." and contain exactly two periods. The part after the second period is used as the name of the monitor.
  • command: The agent runs this command periodically and uses its exit status and standard output as the monitoring result.
  • notification_interval: The interval, in minutes, at which notifications are re-sent. If omitted, notifications are not re-sent. An interval of less than 10 minutes cannot be used; if a smaller value is specified, notifications are re-sent every 10 minutes.
  • max_check_attempts: A notification is sent only when the result is something other than OK for the specified number of consecutive checks. For example, if set to 3, a notification is sent when none of the latest three results is OK. When used with prevent_alert_auto_close, max_check_attempts is treated as 1 regardless of the specified value.
  • check_interval: The interval, in minutes, at which the check is executed. The default is 1 minute and the configurable range is 1 to 60 minutes. A value of less than 1 minute is treated as 1 minute, and a value of more than 60 minutes is treated as 60 minutes.
  • prevent_alert_auto_close: When set to true, alerts opened by this check plugin are not closed automatically. When used with max_check_attempts, max_check_attempts is always treated as 1.
  • action.command: A command executed after the command configured in command has run. Use it when some processing should be performed depending on the check result. The results of the current and previous runs are passed as environment variables. The execution result of the action itself is ignored.
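The limits described in the list above can be summarized in a short Ruby sketch. This is not the agent's implementation; `effective_settings`, `alert_due?`, and the hash keys are hypothetical names used only to illustrate how the documented rules combine. The default of 1 for max_check_attempts is an assumption, since the text above does not state one.

```ruby
# Hypothetical helper: applies the documented limits to raw setting values.
# It mirrors the prose above and is not mackerel-agent code.
def effective_settings(raw)
  settings = {}

  # check_interval: default 1 minute, kept within the 1..60 minute range.
  settings[:check_interval] = (raw[:check_interval] || 1).clamp(1, 60)

  # notification_interval: omitted means no re-sending; otherwise at least 10 minutes.
  interval = raw[:notification_interval]
  settings[:notification_interval] = interval.nil? ? nil : [interval, 10].max

  # max_check_attempts is treated as 1 when prevent_alert_auto_close is true.
  # The fallback of 1 when the key is missing is an assumption.
  prevent = raw[:prevent_alert_auto_close] == true
  settings[:prevent_alert_auto_close] = prevent
  settings[:max_check_attempts] = prevent ? 1 : (raw[:max_check_attempts] || 1)

  settings
end

# Hypothetical helper: true when the latest `max_check_attempts` results
# are all something other than OK.
def alert_due?(results, max_check_attempts)
  recent = results.last(max_check_attempts)
  recent.size == max_check_attempts && recent.none? { |r| r == "OK" }
end

p effective_settings(check_interval: 90, notification_interval: 5,
                     max_check_attempts: 3, prevent_alert_auto_close: true)
p alert_due?(%w[OK WARNING CRITICAL CRITICAL], 3)
```

With these inputs, check_interval becomes 60, notification_interval becomes 10, and max_check_attempts becomes 1. The last call returns true because none of the latest three results is OK.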

Check plugin specs

The specs are mostly the same as those for Nagios plugins and Sensu check scripts. The exit status of the command assigned in the settings file is treated as shown below.

| exit status | meaning |
|---|---|
| 0 | OK |
| 1 | WARNING |
| 2 | CRITICAL |
| other than 0, 1, or 2 | UNKNOWN |

An auxiliary message can be written to standard output. Messages are limited to 1024 characters.
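As a minimal sketch of this contract, the following Ruby helpers map an exit status to the status it is treated as and keep a message within the 1024-character limit. The names `status_name` and `fit_message` are hypothetical. Truncating inside the plugin is just one way to stay within the limit; this page does not say how longer messages are handled.

```ruby
MAX_MESSAGE_LENGTH = 1024 # character limit stated above

# Maps an exit status to the status it is treated as.
def status_name(exit_status)
  case exit_status
  when 0 then "OK"
  when 1 then "WARNING"
  when 2 then "CRITICAL"
  else "UNKNOWN"
  end
end

# Cuts a message down to at most 1024 characters.
# (Assumption: the plugin truncates its own output before printing it.)
def fit_message(message)
  message[0, MAX_MESSAGE_LENGTH]
end

puts status_name(2)                     # prints "CRITICAL"
puts fit_message("disk usage is 95%")   # prints "disk usage is 95%"
```

A plugin built on these helpers would print the fitted message to standard output and then exit with the numeric status.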

Check monitoring notifications

An alert notification is sent when an alert occurs and whenever its condition changes afterwards. The condition is considered to have changed in the following case.

  • When the status has changed
    • e.g. CRITICAL -> WARNING, WARNING -> CRITICAL, CRITICAL -> OK
    • This includes changes to “OK”

When the status or the message content of an alert changes, that information is also available in the alert details screen. A change in message content alone does not trigger a notification.
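The rule can be pictured with a small Ruby sketch. It is a simplified model of the behavior described in this section, not Mackerel's implementation, and `notify_on_change?` is a hypothetical name.

```ruby
# Simplified model: a notification follows a status change, while a change
# in message content alone does not trigger one.
def notify_on_change?(previous, current)
  previous[:status] != current[:status]
end

before             = { status: "CRITICAL", message: "disk usage is 95%" }
after_message_only = { status: "CRITICAL", message: "disk usage is 97%" }
after_recovery     = { status: "OK",       message: "disk usage is 40%" }

puts notify_on_change?(before, after_message_only) # prints "false"
puts notify_on_change?(before, after_recovery)     # prints "true"
```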

Environment variables available with action

| Environment variable | Description |
|---|---|
| MACKEREL_STATUS | The result of the check command that has just been executed (max_check_attempts is not taken into account). Either OK, WARNING, CRITICAL, or UNKNOWN. |
| MACKEREL_PREVIOUS_STATUS | The result of the execution before that one (max_check_attempts is not taken into account). For the first result after the agent starts up, this is an empty string. Either an empty string, OK, WARNING, CRITICAL, or UNKNOWN. |
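A command configured in action.command could read these variables as in the following Ruby sketch. The script and its `transition_message` helper are hypothetical; only the two environment variable names come from this page.

```ruby
#!/usr/bin/env ruby
# Hypothetical action script: describes how the check status changed.

# Builds a one-line description from the variables passed by the agent.
# `env` is anything that answers [] with string keys, such as ENV or a Hash.
def transition_message(env)
  current = env["MACKEREL_STATUS"]
  previous = env["MACKEREL_PREVIOUS_STATUS"].to_s

  # The previous status is an empty string for the first result after the agent starts.
  return "first result: #{current}" if previous.empty?
  return "status unchanged: #{current}" if previous == current

  "status changed: #{previous} -> #{current}"
end

# The agent ignores the execution result of an action, so a real script would
# do its work as a side effect (for example, writing to a log file).
puts transition_message(ENV)
```

Taking the environment as an argument keeps the helper easy to try out with a plain Hash instead of the real process environment.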

An example in Ruby

This plugin rolls a six-sided die and posts the result to Mackerel, with the rolled value as the message. A 4 or 5 is a WARNING, a 6 is a CRITICAL, and anything else is OK.

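#!/usr/bin/env ruby
# Roll a six-sided die (1 to 6).
dice = rand(6) + 1
# Standard output is used as the auxiliary message.
puts "value is #{dice}"
# Exit status: 6 is CRITICAL (2), 4 or 5 is WARNING (1), otherwise OK (0).
exit(dice >= 6 ? 2 : dice >= 4 ? 1 : 0)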

When the agent runs with a check plugin configured, an item showing that monitoring is active is displayed in the host details page.

If an alert is raised, it is also displayed there and can be confirmed in the alert details page.