Monitors

Register Monitor Configurations

Registers a monitor configuration with Mackerel. Monitors can target various types of metrics as well as external URLs, and the required input fields vary depending on the monitoring target.

POST /api/v0/monitors

Required permissions for the API key

  • Read
  • Write

Host metric monitoring

Input (for host metric monitoring)

KEY TYPE DESCRIPTION
type string constant string "host"
name string arbitrary name that can be referenced from the monitors list, etc.
memo string [optional] notes for the monitoring configuration
duration number the average value over the designated interval (in minutes) is monitored. valid range: 1 to 10 minutes
metric string name of the host metric targeted by monitoring. designating one of a specific set of constant strings enables comparative monitoring *1
operator string determines whether the alert condition is "greater than" (>) or "less than" (<). the observed value is on the left of ">" or "<" and the designated threshold is on the right
warning number [optional] the threshold that generates a warning alert. comparative monitoring has a valid range of 1-100 *1
critical number [optional] the threshold that generates a critical alert. comparative monitoring has a valid range of 1-100 *1
maxCheckAttempts number [optional] number of consecutive Warning/Critical results required before an alert is raised. valid range is 1-10; the default is 1
notificationInterval number [optional] the time interval (in minutes) for re-sending notifications. If this field is omitted, notifications will not be re-sent.
scopes array[string] [optional] monitoring target’s service name or role details name *2
excludeScopes array[string] [optional] monitoring exclusion target’s service name or role details name *2
isMute boolean [optional] Whether monitoring is muted or not *3
Example Input
{
  "type": "host",
  "name": "disk.aa-00.writes.delta",
  "memo": "This monitor is for Hatena Blog.",
  "duration": 3,
  "metric": "disk.aa-00.writes.delta",
  "operator": ">",
  "warning": 20000.0,
  "critical": 400000.0,
  "maxCheckAttempts": 3,
  "notificationInterval": 60,
  "scopes": [
    "Hatena-Blog"
  ],
  "excludeScopes": [
    "Hatena-Bookmark: db-master"
  ]
}

Response (for host metric monitoring)

Success
{
  "id"  : "2cSZzK3XfmG",
  "type": "host",
  "name": "disk.aa-00.writes.delta",
  "memo": "This monitor is for Hatena Blog.",
  "duration": 3,
  "metric": "disk.aa-00.writes.delta",
  "operator": ">",
  "warning": 20000.0,
  "critical": 400000.0,
  "maxCheckAttempts": 3,
  "notificationInterval": 60,
  "scopes": [
    "Hatena-Blog"
  ],
  "excludeScopes": [
    "Hatena-Bookmark: db-master"
  ]
}

An id is assigned to the new monitor and returned in the response.
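
As a rough illustration, the host metric example above could be assembled into a request in Python as follows. The base URL https://api.mackerelio.com and the X-Api-Key header are assumptions that are not stated in this section, and build_register_request is a hypothetical helper. The sketch only builds the request; it does not send it.

```python
import json
import urllib.request

# Assumption: the base URL and the X-Api-Key header are not given in this section.
API_BASE = "https://api.mackerelio.com"

def build_register_request(api_key, monitor):
    """Build (but do not send) a POST /api/v0/monitors request."""
    body = json.dumps(monitor).encode("utf-8")
    return urllib.request.Request(
        API_BASE + "/api/v0/monitors",
        data=body,
        method="POST",
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
    )

host_monitor = {
    "type": "host",
    "name": "disk.aa-00.writes.delta",
    "duration": 3,
    "metric": "disk.aa-00.writes.delta",
    "operator": ">",
    "warning": 20000.0,
    "critical": 400000.0,
    "maxCheckAttempts": 3,
    "notificationInterval": 60,
    "scopes": ["Hatena-Blog"],
}

req = build_register_request("<YOUR_API_KEY>", host_monitor)
# Sending it would be urllib.request.urlopen(req); the API key needs Read and Write permissions.
```

Keeping request construction separate from sending makes the payload easy to inspect before any call is made.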

Error
STATUS CODE DESCRIPTION
400 when the input is in a format that can’t be received
400 when the name is empty
400 when the memo exceeds 2048 characters
400 when the duration is outside the range of 1~10
400 when warning or critical are outside the range of 0~100(%) in comparative monitoring settings *1
400 when the maxCheckAttempts is outside the range of 1~10
400 when a service name or role name specified in scopes or excludeScopes hasn’t been registered yet
400 when notificationInterval is less than 10 minutes
403 when the API key doesn't have the required permissions / when accessing from outside the permitted IP address range
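
Several of the 400 conditions above can be checked before the request is sent. The following is a minimal sketch, not Mackerel's own validation: validate_host_monitor is a hypothetical helper, the 0-100 range for comparative thresholds is taken from the error table, and the comparative metric names are those listed under *1. Whether scopes are registered can only be determined by the server.

```python
# Metric names that trigger comparative monitoring (see *1).
COMPARATIVE_METRICS = {
    "cpu%", "memory%", "disk%", "swap%", "container-cpu%", "container-memory%",
}

def validate_host_monitor(m):
    """Return the problems that, per the error table, would yield a 400."""
    problems = []
    if not m.get("name"):
        problems.append("name is empty")
    if len(m.get("memo", "")) > 2048:
        problems.append("memo exceeds 2048 characters")
    # duration is required, so a missing value is reported as out of range.
    if not 1 <= m.get("duration", 0) <= 10:
        problems.append("duration is outside 1-10")
    if not 1 <= m.get("maxCheckAttempts", 1) <= 10:
        problems.append("maxCheckAttempts is outside 1-10")
    interval = m.get("notificationInterval")
    if interval is not None and interval < 10:
        problems.append("notificationInterval is less than 10 minutes")
    if m.get("metric") in COMPARATIVE_METRICS:
        for key in ("warning", "critical"):
            value = m.get(key)
            if value is not None and not 0 <= value <= 100:
                problems.append(key + " is outside 0-100 for comparative monitoring")
    return problems
```

An empty list means none of the locally checkable conditions were hit; the server may still reject the request for other reasons.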

*1 comparative monitoring

When monitoring host metrics, assigning one of the specific strings below to metric enables comparative monitoring for that metric. The metrics that can be assigned for comparative monitoring are as follows.

metric
"cpu%"
"memory%"
"disk%"
"swap%"
"container-cpu%"
"container-memory%"

*2 Service name and role details name

A service name and a role details name are character strings in the formats <service name> and <service name>:<role name>, respectively.

e.g. If the service name is Hatena-Bookmark, the db-master role in that service is written as Hatena-Bookmark:db-master

Service names and role names must match /^[A-Za-z0-9][A-Za-z0-9_-]+$/.
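
The format above can be checked with the same regular expression. This is a sketch under assumptions: is_valid_scope is a hypothetical helper, and stripping whitespace around the colon is a choice made here only because the JSON examples in this document show a space after it. Note that the pattern requires at least two characters per name.

```python
import re

# The pattern quoted in the text: one leading alphanumeric, then one or more of [A-Za-z0-9_-].
NAME_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_-]+$")

def is_valid_scope(scope):
    """Accept "<service name>" or "<service name>:<role name>"."""
    # Assumption: whitespace around the colon is tolerated and stripped.
    parts = [p.strip() for p in scope.split(":")]
    if len(parts) not in (1, 2):
        return False
    return all(NAME_PATTERN.match(p) for p in parts)
```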

*3 Muted monitoring

Muting disables notifications for the monitor. Alerts are still generated when monitoring thresholds are crossed, but notifications will not be sent to notification channels.

Host connectivity monitoring

Input (host connectivity monitoring)

KEY TYPE DESCRIPTION
type string constant string "connectivity"
name string [optional] arbitrary name that can be referenced from the monitors list, etc. The default value is connectivity.
memo string [optional] notes for the monitoring configuration
alertStatusOnGone string [optional] The status of an alert generated by this monitor. Either "CRITICAL" (default) or "WARNING".
scopes array[string] [optional] monitoring target’s service name or role details name *2
excludeScopes array[string] [optional] monitoring exclusion target’s service name or role details name *2
notificationInterval number [optional] the time interval (in minutes) for re-sending notifications. If this field is omitted, notifications will not be re-sent.
isMute boolean [optional] whether monitoring is muted or not
Example Input
{
  "type": "connectivity",
  "name": "connectivity service1",
  "memo": "A monitor that checks connectivity.",
  "alertStatusOnGone": "WARNING",
  "scopes": [
    "service1"
  ],
  "excludeScopes": [
    "service1: role3"
  ]
}

Response (Host connectivity monitoring)

Success
{
  "id"  : "2cSZzK3XfmG",
  "type": "connectivity",
  "name": "connectivity service1",
  "memo": "A monitor that checks connectivity.",
  "alertStatusOnGone": "WARNING",
  "scopes": [
    "service1"
  ],
  "excludeScopes": [
    "service1: role3"
  ]
}

An id is assigned to the new monitor and returned in the response.

Error
STATUS CODE DESCRIPTION
400 when the input is in a format that can’t be received
400 when the name is empty
400 when the memo exceeds 2048 characters
400 when the alertStatusOnGone is neither CRITICAL nor WARNING
400 when a service name or role details name specified in scopes or excludeScopes is not registered
400 when notificationInterval is less than 10 minutes
403 when the API key doesn't have the required permissions / when accessing from outside the permitted IP address range
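
The documented defaults for connectivity monitors (name defaults to connectivity, alertStatusOnGone to "CRITICAL") can be mirrored on the client side purely for illustration. with_connectivity_defaults is a hypothetical helper; the server applies its own defaults regardless.

```python
def with_connectivity_defaults(m):
    """Return a copy of m with the documented defaults filled in."""
    status = m.get("alertStatusOnGone", "CRITICAL")
    if status not in ("CRITICAL", "WARNING"):
        # The error table lists this case as a 400.
        raise ValueError("alertStatusOnGone must be CRITICAL or WARNING")
    return {
        **m,
        "type": "connectivity",
        "name": m.get("name", "connectivity"),
        "alertStatusOnGone": status,
    }
```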

Service metric monitoring

Input (when monitoring service metrics)

KEY TYPE DESCRIPTION
type string constant string "service"
name string arbitrary name that can be referenced from the monitors list, etc.
memo string [optional] notes for the monitoring configuration
service string name of the service targeted by monitoring
duration number monitors the average value of the designated number of points. range: most recent 1~10 points
metric string name of the service metric targeted by monitoring
operator string determines whether the alert condition is "greater than" (>) or "less than" (<). the observed value is on the left of ">" or "<" and the designated threshold is on the right
warning number [optional] the threshold that generates a warning alert
critical number [optional] the threshold that generates a critical alert
maxCheckAttempts number [optional] number of consecutive Warning/Critical results required before an alert is raised. valid range is 1-10; the default is 1
missingDurationWarning number [optional] the threshold (in minutes) to generate a warning alert for interruption monitoring
missingDurationCritical number [optional] the threshold (in minutes) to generate a critical alert for interruption monitoring
notificationInterval number [optional] the time interval (in minutes) for re-sending notifications. If this field is omitted, notifications will not be re-sent.
isMute boolean [optional] Whether monitoring is muted or not *3
Example Input
{
  "type": "service",
  "name": "Hatena-Blog - access_num.4xx_count",
  "memo": "A monitor that checks the number of 4xx for Hatena Blog",
  "service": "Hatena-Blog",
  "duration": 1,
  "metric": "access_num.4xx_count",
  "operator": ">",
  "warning": 50.0,
  "critical": 100.0,
  "maxCheckAttempts": 3,
  "missingDurationWarning": 360,
  "missingDurationCritical": 720,
  "notificationInterval": 60
}

Response (when monitoring service metrics)

Success
{
  "id"  : "2cSZzK3XfmG",
  "type": "service",
  "name": "Hatena-Blog - access_num.4xx_count",
  "memo": "A monitor that checks the number of 4xx for Hatena Blog",
  "service": "Hatena-Blog",
  "duration": 1,
  "metric": "access_num.4xx_count",
  "operator": ">",
  "warning": 50.0,
  "critical": 100.0,
  "maxCheckAttempts": 3,
  "missingDurationWarning": 360,
  "missingDurationCritical": 720,
  "notificationInterval": 60
}

An id is assigned to the new monitor and returned in the response.

Error
STATUS CODE DESCRIPTION
400 when the input is in a format that can’t be received
400 when the name is empty
400 when the memo exceeds 2048 characters
400 when the duration is not in the range of 1~10
400 when the maxCheckAttempts is not in the range of 1~10
400 when the missingDurationWarning or missingDurationCritical is not a multiple of 10 minutes, or is more than a week
400 when the service name assigned to the service hasn’t been registered yet
400 when notificationInterval is less than 10 minutes
403 when the API key doesn't have the required permissions / when accessing from outside the permitted IP address range
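
The constraint on missingDurationWarning and missingDurationCritical (a multiple of 10 minutes and no more than a week) can be expressed as a small check. is_valid_missing_duration is a hypothetical helper, and requiring a positive value is an assumption made here.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080 minutes

def is_valid_missing_duration(minutes):
    """True if minutes is a positive multiple of 10 and at most one week."""
    return minutes > 0 and minutes % 10 == 0 and minutes <= MINUTES_PER_WEEK
```

The values 360 and 720 from the example input both pass this check.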

External monitoring

Input (external monitoring)

KEY TYPE DESCRIPTION
type string constant string "external"
name string arbitrary name that can be referenced from the monitors list, etc.
memo string [optional] notes for the monitoring configuration
url string monitoring target URL
method string [optional] request method, one of GET, POST, PUT, DELETE. If omitted, GET method is used.
service string [optional] service name (when response time is monitored, it will be graphed in the service metrics of the service linked here)
notificationInterval number [optional] the time interval (in minutes) for re-sending notifications. If this field is omitted, notifications will not be re-sent.
responseTimeWarning number [optional] the response time threshold for Warning alerts (in milliseconds). service designation is required
responseTimeCritical number [optional] the response time threshold for Critical alerts (in milliseconds). service designation is required
responseTimeDuration number [optional] the average value of requests over the designated time frame (1-10 min.) is monitored. service designation is required
containsString string [optional] string which should be contained by the response body
maxCheckAttempts number [optional] number of consecutive Warning/Critical results required before an alert is raised. valid range is 1-10; the default is 1
certificationExpirationWarning number [optional] the Warning threshold for certificate expiration monitoring, as the number of days remaining until expiration
certificationExpirationCritical number [optional] the Critical threshold for certificate expiration monitoring, as the number of days remaining until expiration
skipCertificateVerification boolean [optional] Whether or not to skip the verification of the certificate.
isMute boolean [optional] Whether monitoring is muted or not *3
headers array[object] [optional] the HTTP request headers to send, each given as an object with name and value. If this field is omitted, the default header will be configured. If you do not want to configure headers, specify an empty array.
requestBody string [optional] HTTP request body
followRedirect boolean [optional] whether to follow redirects and evaluate the response of the redirect destination as the result. If this field is omitted, redirects will not be followed.

In order to monitor response time, it's necessary to specify responseTimeDuration and at least one of responseTimeWarning and responseTimeCritical. In order to monitor the certification expiration date, it’s necessary to specify at least one of certificationExpirationWarning and certificationExpirationCritical.
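
The field combinations described above can be sketched as a function that reports which optional checks a given payload enables. enabled_external_checks is a hypothetical helper; it also requires service for response time monitoring, following the "service designation is required" notes in the table.

```python
def enabled_external_checks(m):
    """Report which optional checks an external monitor payload enables."""
    response_time = (
        "service" in m
        and "responseTimeDuration" in m
        and ("responseTimeWarning" in m or "responseTimeCritical" in m)
    )
    certificate = (
        "certificationExpirationWarning" in m
        or "certificationExpirationCritical" in m
    )
    return {"responseTime": response_time, "certificateExpiration": certificate}
```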

Example Input
{
  "type": "external",
  "name": "Example Domain",
  "memo": "Monitors example.com",
  "method": "GET",
  "url": "https://example.com",
  "service": "Hatena-Blog",
  "notificationInterval": 60,
  "responseTimeWarning": 5000,
  "responseTimeCritical": 10000,
  "responseTimeDuration": 3,
  "containsString": "Example",
  "maxCheckAttempts": 3,
  "certificationExpirationWarning": 90,
  "certificationExpirationCritical": 30,
  "isMute": false,
  "headers": [{"name": "Cache-Control", "value": "no-cache"}]
}

Response (external monitoring)

Success
{
  "id": "2cSZzK3XfmG",
  "type": "external",
  "name": "Example Domain",
  "memo": "Monitors example.com",
  "method": "GET",
  "url": "https://example.com",
  "service": "Hatena-Blog",
  "notificationInterval": 60,
  "responseTimeWarning": 5000,
  "responseTimeCritical": 10000,
  "responseTimeDuration": 3,
  "containsString": "Example",
  "maxCheckAttempts": 3,
  "certificationExpirationWarning": 90,
  "certificationExpirationCritical": 30,
  "isMute": false,
  "headers": [{"name": "Cache-Control", "value": "no-cache"}]
}

An id is assigned to the new monitor and returned in the response.

Error
STATUS CODE DESCRIPTION
400 when the input is in a format that can’t be received
400 when the name is empty
400 when the memo exceeds 2048 characters
400 when the url scheme is not http or https
400 when notificationInterval is less than 10 minutes
400 when the maxCheckAttempts is not in the range of 1~10
403 when the API key doesn't have the required permissions / when accessing from outside the permitted IP address range
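
The url scheme and method constraints can be checked locally with the standard library. check_external_target is a hypothetical helper, shown only as a sketch of the two rules stated in this section.

```python
from urllib.parse import urlsplit

# Request methods listed in the input table.
ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE"}

def check_external_target(url, method="GET"):
    """True if the scheme is http or https and the method is one of the listed ones."""
    scheme = urlsplit(url).scheme
    return scheme in ("http", "https") and method in ALLOWED_METHODS
```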

Expression monitoring

Input (expression monitoring)

KEY TYPE DESCRIPTION
type string constant string "expression"
name string arbitrary name that can be referenced from the monitors list, etc.
memo string [optional] notes for the monitoring configuration
expression string expression to be monitored. only expressions whose graph results in a single line (one series) are valid
operator string determines whether the alert condition is "greater than" (>) or "less than" (<). the observed value is on the left of ">" or "<" and the designated threshold is on the right
warning number [optional] the threshold that generates a warning alert
critical number [optional] the threshold that generates a critical alert
notificationInterval number [optional] The time interval (in minutes) for re-sending notifications. If this field is omitted, notifications will not be re-sent.
isMute boolean [optional] whether monitoring is muted or not *3
Example Input
{
  "type": "expression",
  "name": "role average",
  "memo": "Monitors the average of loadavg5",
  "expression": "avg(roleSlots(\"service:role\",\"loadavg5\"))",
  "operator": ">",
  "warning": 5.0,
  "critical": 10.0,
  "notificationInterval": 60
}

Response (expression monitoring)

Success
{
  "id"  : "2cSZzK3XfmG",
  "type": "expression",
  "name": "role average",
  "memo": "Monitors the average of loadavg5",
  "expression": "avg(roleSlots(\"service:role\",\"loadavg5\"))",
  "operator": ">",
  "warning": 5.0,
  "critical": 10.0,
  "notificationInterval": 60
}

An id is assigned to the new monitor and returned in the response.

Error
STATUS CODE DESCRIPTION
400 when the input is in a format that can’t be received
400 when the name is empty
400 when the memo exceeds 2048 characters
400 when notificationInterval is less than 10 minutes
400 when an invalid expression is designated
403 when the API key doesn't have the required permissions / when accessing from outside the permitted IP address range
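
Because expressions contain double quotes, it is easy to get the escaping wrong when writing the JSON by hand. A sketch that lets the JSON encoder handle it is shown below; role_average_expression is a hypothetical helper that reproduces the expression from the example input.

```python
import json

def role_average_expression(service, role, metric):
    """Build the avg(roleSlots(...)) expression used in the example input."""
    return f'avg(roleSlots("{service}:{role}","{metric}"))'

payload = {
    "type": "expression",
    "name": "role average",
    "expression": role_average_expression("service", "role", "loadavg5"),
    "operator": ">",
    "warning": 5.0,
    "critical": 10.0,
}

# json.dumps escapes the inner double quotes as \" automatically.
body = json.dumps(payload)
```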

Monitoring with Anomaly Detection for Roles

Input (when monitoring with Anomaly Detection for Roles)

KEY TYPE DESCRIPTION
type string constant string "anomalyDetection"
name string arbitrary name that can be referenced from the monitors list, etc.
memo string [optional] notes for the monitoring configuration
scopes array[string] [optional] monitoring target’s service name and role details name *2
warningSensitivity string [optional] the sensitivity (insensitive, normal, or sensitive) that generates warning alerts.
criticalSensitivity string [optional] the sensitivity (insensitive, normal, or sensitive) that generates critical alerts.
maxCheckAttempts number [optional] number of consecutive Warning/Critical results required before an alert is raised. valid range is 1-10; the default is 3
trainingPeriodFrom number [optional] Specified training period (Uses metric data starting from the specified time)
notificationInterval number [optional] the time interval (in minutes) for re-sending notifications. If this field is omitted, notifications will not be re-sent.
isMute boolean [optional] whether monitoring is muted or not
Example Input
{
  "type": "anomalyDetection",
  "name": "anomaly detection",
  "memo": "my anomaly detection for roles",
  "scopes": [
    "myService: myRole"
  ],
  "warningSensitivity": "insensitive",
  "maxCheckAttempts": 3
}

Response (Monitoring with Anomaly Detection for Roles)

Success
{
  "id"  : "2cSZzK3XfmG",
  "type": "anomalyDetection",
  "name": "anomaly detection",
  "memo": "my anomaly detection for roles",
  "scopes": [
    "myService: myRole"
  ],
  "warningSensitivity": "insensitive",
  "maxCheckAttempts": 3
}

An id is assigned to the new monitor and returned in the response.

Error
STATUS CODE DESCRIPTION
400 when the input is in a format that can’t be received
400 when the name is empty
400 when the memo exceeds 2048 characters
400 when a service name or role details name specified in scopes is not registered
400 when the specified warningSensitivity or criticalSensitivity is not insensitive / normal / sensitive
400 when both warningSensitivity and criticalSensitivity are unspecified
400 when notificationInterval is less than 10 minutes
400 when a future value is specified for trainingPeriodFrom
403 when the API key doesn't have the required permissions
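
The sensitivity and training period rules above might be checked as follows. This is a sketch: validate_anomaly_monitor is a hypothetical helper, and treating trainingPeriodFrom as an epoch timestamp in seconds is an assumption, since this section does not state the unit.

```python
import time

SENSITIVITIES = {"insensitive", "normal", "sensitive"}

def validate_anomaly_monitor(m, now=None):
    """Return the problems that, per the error table, would yield a 400."""
    now = time.time() if now is None else now
    problems = []
    given = [m[k] for k in ("warningSensitivity", "criticalSensitivity") if k in m]
    if not given:
        problems.append("neither warningSensitivity nor criticalSensitivity is specified")
    if any(s not in SENSITIVITIES for s in given):
        problems.append("sensitivity must be insensitive, normal, or sensitive")
    # Assumption: trainingPeriodFrom is an epoch timestamp in seconds.
    if m.get("trainingPeriodFrom", 0) > now:
        problems.append("trainingPeriodFrom is in the future")
    return problems
```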

Query monitoring

Input (query monitoring)

KEY TYPE DESCRIPTION
type string constant string "query"
name string arbitrary name that can be referenced from the monitors list, etc.
memo string [optional] notes for the monitoring configuration
query string query of the monitoring target
legend string graph legend for the alerts
operator string determines whether the alert condition is "greater than" (>) or "less than" (<). the observed value is on the left of ">" or "<" and the designated threshold is on the right
warning number the threshold that generates a warning alert
critical number the threshold that generates a critical alert
notificationInterval number [optional] the time interval (in minutes) for re-sending notifications. if this field is omitted, notifications will not be re-sent.
isMute boolean [optional] whether monitoring is muted or not *3
Example Input
{
  "type": "query",
  "name": "cpu utilization",
  "memo": "Monitors the cpu utilization of httpbin",
  "query": "container.cpu.utilization{k8s.deployment.name=\"httpbin\"}",
  "legend": "cpu.utilization {{k8s.node.name}}",
  "operator": ">",
  "warning": 70.0,
  "critical": 90.0,
  "notificationInterval": 60
}

Response (query monitoring)

Success
{
  "id"  : "2cSZzK3XfmG",
  "type": "query",
  "name": "cpu utilization",
  "memo": "Monitors the cpu utilization of httpbin",
  "query": "container.cpu.utilization{k8s.deployment.name=\"httpbin\"}",
  "legend": "cpu.utilization {{k8s.node.name}}",
  "operator": ">",
  "warning": 70.0,
  "critical": 90.0,
  "notificationInterval": 60
}

An id is assigned to the new monitor and returned in the response.

Error
STATUS CODE DESCRIPTION
400 when the input is in a format that can’t be received
400 when the name is empty
400 when the memo exceeds 2048 characters
400 when notificationInterval is less than 10 minutes
400 when an invalid query is designated
403 when the API key doesn't have the required permissions / when accessing from outside the permitted IP address range
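
As an illustration, the query string from the example input can be built from its parts, and a payload can be checked against the keys that the input table does not mark as [optional]. Both helpers are hypothetical, the required-key list takes the table at face value, and label values containing double quotes are not escaped in this sketch.

```python
# Keys that the query monitoring input table does not mark as [optional].
QUERY_REQUIRED = ("type", "name", "query", "legend", "operator", "warning", "critical")

def build_query(metric, label, value):
    """Build a query of the form metric{label="value"}."""
    return f'{metric}{{{label}="{value}"}}'

def missing_query_fields(m):
    """Return the non-optional keys that are absent from the payload."""
    return [k for k in QUERY_REQUIRED if k not in m]
```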

List Monitor Configurations

GET /api/v0/monitors

Required permissions for the API key

  • Read

Response

{
  "monitors": [
    {
      "id"  : "2cSZzK3XfmB",
      "type": "host",
      "name": "disk.aa-00.writes.delta",
      "memo": "This monitor is for Hatena Blog.",
      "duration": 3,
      "metric": "disk.aa-00.writes.delta",
      "operator": ">",
      "warning": 20000.0,
      "critical": 400000.0,
      "maxCheckAttempts": 3,
      "scopes": [
        "Hatena-Blog"
      ],
      "excludeScopes": [
        "Hatena-Bookmark: db-master"
      ]
    },
    {
      "id": "2cSZzK3XfmA",
      "type": "connectivity",
      "alertStatusOnGone": "CRITICAL",
      "scopes": [],
      "excludeScopes": []
    },
    {
      "id"  : "2cSZzK3XfmC",
      "type": "service",
      "name": "Hatena-Blog - access_num.4xx_count",
      "memo": "A monitor that checks the number of 4xx for Hatena Blog",
      "service": "Hatena-Blog",
      "duration": 1,
      "metric": "access_num.4xx_count",
      "operator": ">",
      "warning": 50.0,
      "critical": 100.0,
      "maxCheckAttempts": 1,
      "notificationInterval": 60
    },
    {
      "id"  : "2cSZzK3XfmD",
      "type": "external",
      "name": "example.com",
      "memo": "Monitors example.com",
      "url": "http://www.example.com",
      "service": "Hatena-Blog",
      "headers": [{"name":"Cache-Control", "value":"no-cache"}],
      "maxCheckAttempts": 1
    },
    {
      "id"  : "2cSZzK3XfmE",
      "type": "expression",
      "name": "role average",
      "memo": "Monitors the average of loadavg5",
      "expression": "avg(roleSlots(\"server:role\",\"loadavg5\"))",
      "operator": ">",
      "warning": 5.0,
      "critical": 10.0,
      "notificationInterval": 60
    }
  ]
}
  • each field is the same as when the monitor was created
  • the list is ordered by monitor type, then by name (the same order as the list of monitors on mackerel.io)
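
A response in the shape shown above might be processed like this. group_by_type is a hypothetical helper; it falls back to the id when a monitor has no name, as is the case for the connectivity monitor in the example response.

```python
import json
from collections import defaultdict

def group_by_type(response_text):
    """Group monitor names (or ids, when no name is present) by monitor type."""
    monitors = json.loads(response_text)["monitors"]
    grouped = defaultdict(list)
    for m in monitors:
        grouped[m["type"]].append(m.get("name", m["id"]))
    return dict(grouped)
```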

Get Monitor Configurations

GET /api/v0/monitors/<monitorId>

Required permissions for the API key

  • Read

Response

{
  "monitor": {
    "id"  : "2cSZzK3XfmB",
    "type": "host",
    "name": "disk.aa-00.writes.delta",
    "memo": "This monitor is for Hatena Blog.",
    "duration": 3,
    "metric": "disk.aa-00.writes.delta",
    "operator": ">",
    "warning": 20000.0,
    "critical": 400000.0,
    "maxCheckAttempts": 3,
    "scopes": [
      "Hatena-Blog"
    ],
    "excludeScopes": [
      "Hatena-Bookmark: db-master"
    ]
  }
}

Update Monitor Configurations

PUT /api/v0/monitors/<monitorId>

Requests and responses use the same format as when creating a monitor, and every field must be specified. If a required item is missing, an error will be generated. When scopes and excludeScopes are updated, the saved values are completely overwritten by the JSON that was designated. For example, any previously saved scopes that are left out of the request will be deleted.

Connectivity Monitoring

When changing the alertStatusOnGone field, alerts generated by that monitor prior to the change will be affected as follows:

  • Notifications configured to be resent (notificationInterval)

    After alertStatusOnGone has been changed, only notifications that are configured for resending will change to the new alert status once resent.

  • Notifications not configured to be resent

    The alert status will not change.

Additionally, if the alertStatusOnGone field is not specified, its value will not be updated.

External Monitoring

If the headers field is not specified, its value will not be updated. If you would like to delete the header settings, specify an empty array.
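
Because an update overwrites the stored configuration, one common approach is to start from the current configuration and change only what is needed. The sketch below assumes that pattern: build_update_body is a hypothetical helper, and dropping id from the body is an assumption made here because the id is already part of the request path.

```python
import copy

def build_update_body(current, changes):
    """Merge changes into a copy of the current monitor for use as a PUT body."""
    if "type" in changes and changes["type"] != current["type"]:
        # Changing the type is listed as a 400 error.
        raise ValueError("the monitor type cannot be changed")
    body = copy.deepcopy(current)
    body.pop("id", None)  # assumption: the id is carried by the URL path
    body.update(changes)
    return body
```

Starting from the fetched monitor keeps scopes and excludeScopes intact unless they are deliberately changed.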

Required permissions for the API key

  • Read
  • Write

Response

Success

The updated monitor configuration is returned, in the same format as in Register Monitor Configurations.

Error

The errors are the same as when creating a monitor, plus the following.

STATUS CODE DESCRIPTION
400 when trying to change the type
400 when the name is empty
400 when the memo exceeds 2048 characters
404 when no monitor configuration exists for the specified <monitorId>
403 when the API key doesn't have the required permissions / when accessing from outside the permitted IP address range

Delete Monitor Configurations

DELETE /api/v0/monitors/<monitorId>

Required permissions for the API key

  • Read
  • Write

Response

Success

The status of the monitor configuration just before it is deleted will be returned. The format will be the same as when it was created.

Error

STATUS CODE DESCRIPTION
404 when no monitor configuration exists for the specified <monitorId>
403 when the API key doesn't have the required permissions / when accessing from outside the permitted IP address range
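
A delete request and a lookup for the two documented error codes could be sketched as follows. The base URL and the X-Api-Key header are assumptions not stated in this section, and both helpers are hypothetical. The request is only built, not sent.

```python
import urllib.request
from urllib.parse import quote

# Assumption: the base URL and the X-Api-Key header are not given in this section.
API_BASE = "https://api.mackerelio.com"

def build_delete_request(api_key, monitor_id):
    """Build (but do not send) a DELETE /api/v0/monitors/<monitorId> request."""
    url = API_BASE + "/api/v0/monitors/" + quote(monitor_id, safe="")
    return urllib.request.Request(url, method="DELETE", headers={"X-Api-Key": api_key})

def explain_delete_error(status):
    """Map the documented error status codes to short descriptions."""
    reasons = {
        403: "missing permissions or request from outside the permitted IP range",
        404: "no monitor configuration exists for the specified monitorId",
    }
    return reasons.get(status, "unexpected status")
```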