Mackerel blog #mackerelio

The Official Blog of Mackerel

Maintenance completion notice + an apology in relation to data loss and custom dashboard/expression monitoring failures

I’m id:Songmu, Mackerel’s sub-producer. Previously announced maintenance began today (Monday, August 8th) at 2:30 p.m. (JST) and was completed at 6:00 p.m. Thank you to all our users for your cooperation.

In this entry, we’ll report on some of the details related to this event. All the times listed below will be in Japan Standard Time.

Period of no-access

The entire system was inaccessible from 2:30-2:50 p.m. as well as 3:30-3:33 p.m.

Regarding data loss

During maintenance, an unexpected loss of data occurred and we sincerely apologize for any inconvenience this may have caused.

Data created in the Mackerel database during the time of 2:30-3:31 p.m. excluding times series data, was lost. Specifically, data including alert and host information registered during the above time period was lost.

Regarding the cause of the incident and our response

The cause of this incident was due to an unintended database failover during the time of maintenance. Data was lost from the start of maintenance until when the failover occurred. Originally, preventative measures were taken so that data would not be lost if a failover occurred, but failed due to a unique circumstance while work was in-progress.

Attempts were made to recover said data, but due to difficulty in maintaining consistency with the data updated after failover, the recovery was abandoned.

After the failover, we carefully reexamined the redundant configuration of the database and made sure that a similar problem would not occur. Technical details regarding this matter will be posted at another time.

Regarding work requests for hosts registered during the period of data loss

Hosts that were newly registered with mackerel-agent during the above mentioned time period (2:30-3:31 p.m.) can not post data correctly. We’re sorry for the inconvenience, but please re-register the host.

Specifically, delete the id file ( /var/lib/mackerel-agent/id ), then restart mackerel-agent.

Regarding custom dashboard failures (fixed 8/8)

Graph display problems have occurred for custom dashboards created or updated since the start of maintenance. This issue is scheduled to be fixed by Tuesday, August 8th.

(Fixed 8/8) Corrections were complete with this release on Tuesday, August 8th.

Regarding expression monitoring failures (fixed 8/8)

We have confirmed the occurrence of expression monitoring failures in several organizations after the maintenance. We sincerely apologize, but we are unable to respond immediately. Individual announcements will be made to the owners of the respective organizations.

(Fixed 8/8) Corrections were made and the situation resolved with this release on Tuesday, August 8th.

Again, we apologize for any inconvenience this may have caused our users. We are committed to further improving our quality of service in the future and we thank you for your patience and appreciate your continued use of Mackerel.