Adding monitors for script checks

Check monitoring is a feature to monitor the execution results of a check plugin. The agent will execute the check plugin periodically and send the results to Mackerel. Each check monitor item counts as one host metric. Please refer to Differences between metric monitoring and check monitoring for more information on the differences between check monitoring and the monitoring of host metrics and service metrics.

About check plugins

A program that performs the desired monitoring process is required for check monitoring. We have released an official check plugin pack that can be used for this purpose. Please refer to Using the official check plugin pack for check monitoring for more information.

Users may also use programs they have created as check plugins. Such programs must perform the desired monitoring process and return an exit status based on the monitoring results. Please refer to Check plugin specs for more information.

Sample agent configuration

Check monitoring settings are added to the agent configuration file. The following is a sample monitor configuration that uses a self-made program named check-ssh.rb. Please refer to Configuration items for a description of each item, and set the non-required items as needed.

[plugin.checks.ssh]
command = ["ruby", "/path/to/check-ssh.rb"]
check_interval = 5
timeout_seconds = 45
max_check_attempts = 3
prevent_alert_auto_close = true
notification_interval = 60
custom_identifier = "SOME_IDENTIFIER"
env = { HOST = "hostname", PORT = "port" }
memo = "This check monitor is ..."
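The contents of check-ssh.rb are not shown in this document. As a rough idea of what such a self-made program could look like, the following is a minimal sketch that only tests whether a TCP connection can be opened. The method name check_tcp and the 5-second connection timeout are assumptions made for illustration, and HOST and PORT are the environment variables set with env in the sample above.

```ruby
#!/usr/bin/env ruby
# Hypothetical sketch of a self-made check such as check-ssh.rb (not the actual script).
require "socket"

# Tries to open a TCP connection and returns [exit_status, message].
# 0 means OK and 2 means CRITICAL, following the check plugin specs.
def check_tcp(host, port, timeout: 5)
  Socket.tcp(host, port, connect_timeout: timeout) { |_socket| }
  [0, "#{host}:#{port} is reachable"]
rescue SystemCallError, SocketError, IOError => e
  [2, "#{host}:#{port} is unreachable (#{e.class})"]
end

# When used as a plugin, the script would finish along these lines:
#   status, message = check_tcp(ENV.fetch("HOST"), ENV.fetch("PORT"))
#   puts message
#   exit status
```

The connection timeout is kept well below timeout_seconds so that the program can report its own result before the agent gives up on it.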

Configuration items

As shown in the sample agent configuration, the configuration items of a monitoring rule are written under its key, [plugin.checks.XXX]. The items must be set separately for each monitoring rule.

| Configuration item | Required | Description | Default value |
| --- | --- | --- | --- |
| [plugin.checks.XXX] | Yes | The key of a monitoring rule in the configuration file consists of three levels separated by dots (.). The first two levels, plugin.checks, are fixed, and the third level, XXX, is used as the name of the monitoring rule. Dots are not allowed in XXX. | |
| command | Yes | This is executed by the agent periodically, and its exit status and standard output are used as the monitoring results. In addition to the official check plugins, users may execute arbitrary commands and programs they have created. | |
| check_interval | | Specifies the interval in minutes between check monitoring runs. With agent v0.67.0 or later, expressions such as 10m or 1h may also be used. The allowed range is 1 to 60 minutes. Values under 1 minute are treated as 1 minute, and values over 60 minutes are treated as 60 minutes. | 1 minute |
| timeout_seconds | | The timeout period in seconds for the processing of the program specified by command. Set the value so that it does not exceed check_interval. | 30 seconds |
| max_check_attempts | | Specifies the maximum number of attempts. An alert is generated when a result other than OK is obtained consecutively for at least the number of times specified here. When used in conjunction with prevent_alert_auto_close, this is treated as 1 regardless of the value specified. | 1 |
| prevent_alert_auto_close | | Normally, an alert is closed automatically when a later monitoring result is OK. If this is set to true, the alert remains open. When this is used in conjunction with max_check_attempts, max_check_attempts is always treated as 1. | false |
| notification_interval | | Specifies the interval in minutes at which alert notifications are resent. With agent v0.67.0 or later, expressions such as 10m or 1h may also be used. Values under 10 minutes are treated as 10 minutes. If not set, notifications are not resent. | null |
| custom_identifier | | The monitoring rule is treated as monitoring the host specified in custom_identifier, not the host running mackerel-agent. If the monitoring result is not OK, an alert is issued for the host specified here. The custom_identifier of a host can be found with the API Get Host Information. This is useful for check monitoring of hosts where the agent cannot be installed, such as hosts linked with AWS / Azure / Google Cloud integration. See AWS Integration Documentation for more information. | null |
| env | | Sets environment variables. They apply only to the command of the monitoring rule in which they are set. | null |
| action | | The action entered in this item is executed after each execution of the command set in command. Please refer to How to write an action for more information. | null |
| memo | | Sets a note for the check monitor. The string specified here is shown in alert notifications as well as on the Alert Details screen and the Host Details screen. Up to 250 characters. | null |
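The range rules for check_interval and notification_interval can be easier to follow when written out. The following Ruby sketch only restates the rules described above and is not mackerel-agent's actual implementation. The method names are hypothetical, and only the two expression forms mentioned in this document (such as 10m and 1h) are handled.

```ruby
# Illustrative only: restates the documented rules, not the agent's real code.

# Converts an interval such as 5, "10m", or "1h" to whole minutes.
def to_minutes(value)
  return value if value.is_a?(Integer)

  case value
  when /\A(\d+)m\z/ then Regexp.last_match(1).to_i
  when /\A(\d+)h\z/ then Regexp.last_match(1).to_i * 60
  else raise ArgumentError, "unsupported interval: #{value.inspect}"
  end
end

# check_interval: defaults to 1 minute and is kept within 1..60 minutes.
def effective_check_interval(value = nil)
  return 1 if value.nil?

  to_minutes(value).clamp(1, 60)
end

# notification_interval: nil means notifications are not resent;
# anything under 10 minutes is treated as 10 minutes.
def effective_notification_interval(value = nil)
  return nil if value.nil?

  [to_minutes(value), 10].max
end
```

For instance, under these rules a check_interval of "2h" behaves like 60 minutes, and a notification_interval of 5 behaves like 10 minutes.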

How to write an action

You can write an action by entering the following items in the syntax "action = {}". Multiple items can be entered at the same time using a comma-delimited list.

| Setting | Description | Default value |
| --- | --- | --- |
| command | This will be executed immediately after each execution of the command set in command. This is used to perform any further processing that is required depending on the execution results of the command. Environment variables described below may also be used. To distinguish it from the command in the monitoring rules, it is referred to as action.command on this page. | |
| env | This can be used to specify environment variables that are passed to action.command. The variables are specified as a Table or Inline Table in TOML format. | null |
| user | This specifies the user by whom action.command is executed. This is not supported on Windows. | root |
| timeout_seconds | The timeout period in seconds for processing action.command. | 30 seconds |

Example of action

action = { command = "bash -c '[ \"$MACKEREL_STATUS\" != \"OK\" ]' && ruby /path/to/notify_something.rb", env = { NOTIFY_API_KEY = "API_KEY" }, user = "someone", timeout_seconds = 45 }

Environment variables available with action

| Environment variable | Description |
| --- | --- |
| MACKEREL_STATUS | This is the execution result for the latest command. Possible values are OK, WARNING, CRITICAL, and UNKNOWN. max_check_attempts is not taken into account. |
| MACKEREL_PREVIOUS_STATUS | This is the execution result for the previous command. It will be an empty string for the first execution after the agent is launched. Possible values are OK, WARNING, CRITICAL, and UNKNOWN from the second execution onwards. max_check_attempts is not taken into account. |
| MACKEREL_CHECK_MESSAGE | This is the standard output result for the latest command. It may not be possible to obtain a result on Windows. |

Examples of environment variable usage

The following is a sample action configuration that executes a service startup or restart command with action.command when MACKEREL_STATUS is not OK.

Linux

action = { command = "bash -c '[ \"$MACKEREL_STATUS\" != \"OK\" ]' && systemctl restart SERVICE" }

Windows

action = { command = "if not %MACKEREL_STATUS% == OK ( net start SERVICE )" }

If the service name contains spaces, an error will occur with the net command if it is written as is, so use action.env and specify it in an environment variable.

action = { command = "if not %MACKEREL_STATUS% == OK ( net start %SERVICE% )", env = { SERVICE = "Sample Service" } }
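When the logic is awkward to express as a one-line command, action.command can point to a script that reads the same environment variables. The following Ruby sketch prints a summary only when the status has changed. The file name notify_on_change.rb and the method names are hypothetical, and the handling of the first execution (where MACKEREL_PREVIOUS_STATUS is empty) is one possible design choice rather than prescribed behavior.

```ruby
# Hypothetical action script, e.g.
#   action = { command = "ruby /path/to/notify_on_change.rb" }

# True when the latest status differs from the previous one.
# On the first execution after the agent starts, the previous status is an
# empty string, so this sketch reports a change only if the status is not OK.
def status_changed?(env)
  current = env["MACKEREL_STATUS"].to_s
  return false if current.empty? # not started by the agent

  previous = env["MACKEREL_PREVIOUS_STATUS"].to_s
  return current != "OK" if previous.empty?

  current != previous
end

# Builds a one-line summary from the variables passed by the agent.
def summary(env)
  previous = env["MACKEREL_PREVIOUS_STATUS"].to_s
  previous = "(none)" if previous.empty?
  message = env["MACKEREL_CHECK_MESSAGE"].to_s.strip
  "#{previous} -> #{env['MACKEREL_STATUS']}: #{message}"
end

puts summary(ENV) if status_changed?(ENV)
```

Because both methods take the environment as an argument, they can be exercised with a plain Hash before the script is wired into the agent configuration.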

Check plugin specs

Alert levels are returned as follows based on the exit status of command.

| Exit status | Alert level |
| --- | --- |
| 0 | OK |
| 1 | WARNING |
| 2 | CRITICAL |
| Any value besides 0, 1, or 2 | UNKNOWN |

A supplementary message may also be added to the standard output. The length of this message may not exceed 1024 characters. The output will be sent to Mackerel and displayed on the Host Details screen and the Alert Details screen. Therefore, please make sure that no confidential information, such as passwords, is sent.
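A self-made plugin may find it convenient to keep the exit status mapping and the message length limit in one place. The following Ruby sketch is one way to do that. The names EXIT_STATUS, MAX_MESSAGE_LENGTH, check_result, and alert_level are hypothetical, and using 3 for UNKNOWN is an assumption, since any value other than 0, 1, or 2 is treated as UNKNOWN.

```ruby
# Exit statuses per the check plugin specs. 3 is an arbitrary choice for UNKNOWN.
EXIT_STATUS = { ok: 0, warning: 1, critical: 2, unknown: 3 }.freeze
MAX_MESSAGE_LENGTH = 1024

# Returns [exit_status, message], cutting the message at the length limit.
# Unrecognized levels fall back to UNKNOWN.
def check_result(level, message)
  status = EXIT_STATUS.fetch(level, EXIT_STATUS[:unknown])
  [status, message.to_s[0, MAX_MESSAGE_LENGTH]]
end

# Maps an exit status to the alert level described in the specs.
def alert_level(exit_status)
  { 0 => "OK", 1 => "WARNING", 2 => "CRITICAL" }.fetch(exit_status, "UNKNOWN")
end
```

A plugin would then print the returned message to standard output and pass the returned status to exit.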

For more information on how to develop check plugins using github.com/mackerelio/checkers, the utility library used for the official check plugin, please refer to Creating a check plugin using checkers.

Check monitoring notifications

Notifications are sent when an alert is generated and when the status of an alert changes after it is generated. This includes cases in which the status becomes OK.

The following is an illustration of changes to an alert status and whether a notification will be sent for each status change.

| Check monitor execution attempt | 1st | 2nd | 3rd | 4th | 5th | 6th |
| --- | --- | --- | --- | --- | --- | --- |
| Status | CRITICAL | WARNING | WARNING | CRITICAL | CRITICAL | OK |
| Alert notification | Yes | Yes | No | Yes | No | Yes |

In the event of changes to an alert status or the content of a message, this information will also be reflected on the Alert Details screen. No notification will be sent if only the content of a message has changed.
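The notification rule in this section can be restated as "notify whenever the status differs from the previous one". The following Ruby sketch reproduces the illustration above under two assumptions: max_check_attempts is 1, and no alert is open before the first attempt. The method name notifications is hypothetical, and this is not how Mackerel itself is implemented.

```ruby
# Returns, for each attempt, whether a notification would be sent.
# Assumes max_check_attempts = 1 and an initial state of OK (no open alert).
def notifications(statuses)
  previous = "OK"
  statuses.map do |status|
    notify = status != previous # only a status change triggers a notification
    previous = status
    notify
  end
end
```

Since only the status is compared, a change in the message alone does not produce a notification in this sketch, which matches the behavior described above.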

Differences between metric monitoring and check monitoring

Metric monitoring, such as the monitoring of host metrics and service metrics, differs from check monitoring in the following ways.

  • Metric monitoring
    • The host posts metric values to Mackerel, which compares them to the threshold values before generating alert notifications accordingly.
    • Metrics are plotted as graphs.
    • Monitoring rules may be configured via the Web Console or Web API.
    • There is no charge for adding monitoring rules. (Charges apply only to the metrics.)
  • Check monitoring
    • The host where the check plugin is run determines a status of OK, WARNING, CRITICAL, or UNKNOWN and sends the result to Mackerel. Mackerel then generates alerts according to the results it receives.
    • No metrics are posted, so no graphs are plotted.
    • Monitoring rules are configured within the mackerel-agent configuration file, and no configuration may be added or modified via the Web Console.
    • Each check monitor item counts as one host metric. Please refer to Pricing for more information on the maximum number of metrics allowed per host. Also, please refer to How usage fees are calculated for more information on the specifications when the limit is exceeded.
  • Illustrations of monitoring: [Left] Metric monitoring / [Right] Check monitoring

An example in Ruby

This check plugin rolls a six-sided die and outputs the value as its message. A value of 4 or 5 results in WARNING, a value of 6 results in CRITICAL, and any other value results in OK. The result is posted to Mackerel.

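#!/usr/bin/env ruby
# Roll a six-sided die (1 to 6).
dice = rand(6) + 1
# Standard output becomes the supplementary message sent to Mackerel.
puts "value is #{dice}"
# Exit status: 6 is CRITICAL (2), 4 or 5 is WARNING (1), anything else is OK (0).
exit(dice >= 6 ? 2 : dice >= 4 ? 1 : 0)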

When the agent is run with a check plugin configured, an item showing that the monitor is active is displayed on the Host Details screen.

If an alert is raised, it is displayed there as well, and its details can be confirmed on the Alert Details screen.