Extract Metrics from Logs in Amazon CloudWatch Logs Using cloudwatch-logs-aggregator

Hello, I’m id:susisu, an Application Engineer on the Development Team for Mackerel.

In this article, I will show how to use cloudwatch-logs-aggregator to aggregate application logs written to Amazon CloudWatch Logs and post the results to Mackerel as metrics for monitoring and visualization, along with practical examples.

The goal is to extract metrics from application logs such as the following and post them to Mackerel.

{
  "level": "info",
  "msg": "query complete",
  "bytes_scanned": 51828,
  "records_scanned": 246,
  "records_matched": 120,
  ...
}

Example of the desired output

What Is cloudwatch-logs-aggregator

cloudwatch-logs-aggregator is a function provided by Mackerel that runs on AWS Lambda. It aggregates logs written to CloudWatch Logs and posts the results to Mackerel as service metrics. You deploy the Lambda function in your own AWS account, and Terraform modules for doing so are provided alongside it.

Because it builds on the logs that your application already writes, you can easily start monitoring and visualizing application metrics.

Architecture diagram

For details regarding the functions of cloudwatch-logs-aggregator, please read the following document and the README in the repository.

https://mackerel.io/docs/entry/advanced/cloudwatch-logs-aggregator

Aggregate Logs for cloudwatch-logs-aggregator Itself

I will now go ahead and introduce ways to actually use cloudwatch-logs-aggregator, but in order to do so, we need an application that produces logs for us to aggregate.

In these examples, instead of using your own application, we will assume that one cloudwatch-logs-aggregator has already been deployed and build a second cloudwatch-logs-aggregator that aggregates the logs of the first.

cloudwatch-logs-aggregator writes two types of logs.

The first type consists of error logs. For example, when a CloudWatch Logs Insights query fails, a log like the following is written. (I have formatted the logs, for example by adding line breaks, so that they are easier to read.)

{
  "level": "error",
  "msg": "failed to query: ...",
  ...
}

The other type consists of information logs. For example, when a CloudWatch Logs Insights query succeeds, data such as the amount of logs scanned is written as follows.

{
  "level": "info",
  "msg": "query complete",
  "bytes_scanned": 51828,
  "records_scanned": 246,
  "records_matched": 120,
  ...
}

Aggregating Error Logs

First, we will aggregate error logs (logs that are produced with "level": "error").

{
  "level": "error",
  "msg": "failed to query: ...",
  ...
}

Here, we will simply post the number of error logs to Mackerel as an error count metric. Once the number of application errors is available as a metric, you can set up monitoring rules for automated, continuous monitoring and check the frequency and trends of errors in graphs.

We will take the following steps to set up cloudwatch-logs-aggregator to aggregate logs.

  1. Prepare the Mackerel Service to Post Metrics To
  2. Save the Mackerel API Key in Parameter Store of AWS Systems Manager
  3. Create the cloudwatch-logs-aggregator Lambda Function
  4. Create a Query that Aggregates the Logs
  5. Create an Amazon EventBridge Rule to Invoke the Lambda Function

1. Prepare the Mackerel Service to Post Metrics To

Since cloudwatch-logs-aggregator posts the results of log aggregation to Mackerel as service metrics, you need to prepare a destination Mackerel service beforehand.

Select the Mackerel service to post metrics to from the list or create a new service.

In the following examples, we will post the metrics to a service named my-service.
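
To make "posting service metrics" more concrete, here is a minimal Python sketch of the kind of request involved, assuming Mackerel's service metrics endpoint (POST /api/v0/services/<serviceName>/tsdb with an X-Api-Key header). It only builds the request and does not send it. The function name, the metric value, and the timestamp are made up for illustration, and this is not the Lambda function's actual code.

```python
import json
import urllib.request

MACKEREL_API_BASE = "https://api.mackerelio.com"


def build_service_metrics_request(service_name, api_key, metrics, timestamp):
    """Build (but do not send) a request that posts service metrics to Mackerel.

    metrics maps metric names to numeric values; timestamp is in epoch seconds.
    """
    payload = [
        {"name": name, "time": timestamp, "value": value}
        for name, value in sorted(metrics.items())
    ]
    return urllib.request.Request(
        url=f"{MACKEREL_API_BASE}/api/v0/services/{service_name}/tsdb",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_service_metrics_request(
    service_name="my-service",
    api_key="<your Mackerel API key>",  # placeholder, not a real key
    metrics={"cloudwatch_logs_aggregator.log.error_count": 3},  # made-up value
    timestamp=1700000000,  # made-up epoch seconds
)
print(request.get_method(), request.full_url)
```

In practice you do not need to write this yourself, since cloudwatch-logs-aggregator does the posting for you. The sketch is only meant to show why a destination service name and an API key are required in the next steps.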

2. Save the Mackerel API Key in Parameter Store of AWS Systems Manager

We will prepare the Mackerel API key that cloudwatch-logs-aggregator will use to post service metrics.

Obtain an API key that has Write access from the list of Mackerel API keys, or create a new one, and save it in Parameter Store of AWS Systems Manager. I recommend setting the parameter type to SecureString so that the API key is stored in encrypted form.

In the following examples, we will assume that the API key is saved as a parameter named /mackerel/myApiKey in Parameter Store.
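
If you prefer to script this step instead of using the AWS console, the following Python sketch shows one possible approach, assuming boto3's SSM client and its put_parameter call. The helper name save_mackerel_api_key and the RecordingSsmClient stand-in are hypothetical and exist only so that the sketch runs without AWS access.

```python
def save_mackerel_api_key(ssm_client, parameter_name, api_key):
    """Store the API key as an encrypted SecureString parameter."""
    return ssm_client.put_parameter(
        Name=parameter_name,
        Value=api_key,
        Type="SecureString",  # stored in encrypted form
        Overwrite=True,  # replace an existing value, e.g. when rotating the key
    )


class RecordingSsmClient:
    """Hypothetical stand-in for boto3.client("ssm"); it only records calls."""

    def __init__(self):
        self.calls = []

    def put_parameter(self, **kwargs):
        self.calls.append(kwargs)
        return {"Version": len(self.calls)}


client = RecordingSsmClient()
response = save_mackerel_api_key(
    client, "/mackerel/myApiKey", "<your Mackerel API key>"
)
print(client.calls[0]["Name"], client.calls[0]["Type"])
```

Against a real account, you would pass boto3.client("ssm", region_name="ap-northeast-1") instead of the stand-in, with credentials that are allowed to write the parameter.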

3. Create the cloudwatch-logs-aggregator Lambda Function

We will create the Lambda function that serves as the main component of cloudwatch-logs-aggregator.

To create the function, we will use a Terraform module that accompanies cloudwatch-logs-aggregator. Create main.tf, a Terraform configuration file, and add the following configuration.

module "cw_logs_aggregator_lambda" {
  source = "github.com/mackerelio-labs/mackerel-monitoring-modules//cloudwatch-logs-aggregator/lambda?ref=v0.1.2"

  region        = "ap-northeast-1"
  function_name = "cw-logs-aggregator-demo"
  iam_role_name = "cw-logs-aggregator-demo-lambda"
}

When you execute the following commands, the Lambda function for cloudwatch-logs-aggregator and related resources, such as an IAM role, will be created.

terraform init
terraform apply

This completes the process of creating the Lambda function.

We have created the Lambda function cw-logs-aggregator-demo

4. Create a Query that Aggregates the Logs

We will create a CloudWatch Logs Insights query to count the number of error logs that have been produced as output.

Here, we use the filter command to filter the error logs and the stats command to count them. Please refer to the documentation provided by AWS regarding the query syntax.

filter level = "error"
| stats count(*) as `error_count`

When you run this query, the number of error logs will be aggregated as error_count. cloudwatch-logs-aggregator will post metrics to Mackerel using this name and value.
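
To illustrate what the query computes, here is a small Python sketch that applies the same filter and count to a few sample log lines locally. The sample lines are made up, and the real aggregation is performed by CloudWatch Logs Insights, not by code like this.

```python
import json

# Made-up sample logs in the same shape as the examples above.
SAMPLE_LOG_LINES = [
    '{"level": "info", "msg": "query complete", "bytes_scanned": 51828}',
    '{"level": "error", "msg": "failed to query: ..."}',
    '{"level": "error", "msg": "failed to query: ..."}',
]


def count_error_logs(lines):
    """Local analogue of: filter level = "error" | stats count(*) as error_count."""
    records = [json.loads(line) for line in lines]
    errors = [record for record in records if record.get("level") == "error"]
    if not errors:
        # Mirror the query, whose result is empty when no logs match.
        return {}
    return {"error_count": len(errors)}


print(count_error_logs(SAMPLE_LOG_LINES))
```

Note that the sketch returns an empty result when nothing matches, which mirrors the behavior of the query when there are no error logs.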

Before configuring cloudwatch-logs-aggregator to run the query, let’s validate it. You can run the query on the AWS console. (If there are no matching logs, the result will be empty.)

Example result of the query

It seems to be working properly.

5. Create an Amazon EventBridge Rule to Invoke the Lambda Function

Now that we have created the query, we will set up a rule that invokes the Lambda function so that cloudwatch-logs-aggregator runs the query.

We will use another accompanying Terraform module to set this up. Add the following to main.tf, the same file that we used to create the Lambda function earlier.

module "cw_logs_aggregator_rule_error_count" {
  source = "github.com/mackerelio-labs/mackerel-monitoring-modules//cloudwatch-logs-aggregator/rule?ref=v0.1.2"

  region    = "ap-northeast-1"
  rule_name = "cw-logs-aggregator-demo-error-count"
  # ARN for the Lambda function created above
  function_arn = module.cw_logs_aggregator_lambda.function_arn

  # Parameter Store parameter name for the saved Mackerel API key
  api_key_name = "/mackerel/myApiKey"
  # Mackerel service name to post the metrics to
  service_name = "my-service"

  # CloudWatch Logs log group name to search
  # Here, we will search logs from another cloudwatch-logs-aggregator
  log_group_name = "/aws/lambda/cw-logs-aggregator-target"
  # Query created above
  query = <<EOT
    filter level = "error"
    | stats count(*) as `error_count`
  EOT
  # Prefix for the name of the metric
  # In this instance, the name of the metric will be cloudwatch_logs_aggregator.log.error_count
  metric_name_prefix = "cloudwatch_logs_aggregator.log"
  # Default value to use when there is not a single error log
  default_metrics = {
    "cloudwatch_logs_aggregator.log.error_count" = 0
  }
  # Execution interval (execute in 1-minute intervals)
  schedule_expression = "rate(1 minute)"
  interval_in_minutes = 1
}

When you run the following commands, the resources that invoke the cloudwatch-logs-aggregator Lambda function will be created.

terraform init
terraform apply
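
The way metric_name_prefix and default_metrics fit together in the rule configuration can be sketched in Python as follows. This is an assumption inferred from the comments in that configuration (prefix plus field name, with defaults used when the query returns nothing), not the Lambda function's actual implementation, and build_metrics is a hypothetical helper.

```python
PREFIX = "cloudwatch_logs_aggregator.log"
DEFAULTS = {"cloudwatch_logs_aggregator.log.error_count": 0}


def build_metrics(prefix, query_row, default_metrics):
    """Combine one query result row with defaults into metric name/value pairs.

    query_row maps field names from the stats command to values; it is empty
    when no logs matched the query.
    """
    metrics = dict(default_metrics)  # start from the defaults
    for field_name, value in query_row.items():
        metrics[f"{prefix}.{field_name}"] = value  # aggregated values win
    return metrics


print(build_metrics(PREFIX, {"error_count": 2}, DEFAULTS))
print(build_metrics(PREFIX, {}, DEFAULTS))
```

The second call shows why the default matters: without it, a minute with no error logs would produce no data point at all, rather than an explicit 0.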

Validation

The number of error logs for cloudwatch-logs-aggregator should be posted as metrics now, so let’s check the results on Mackerel.

On the Mackerel web console, open the service that the metrics are posted to and select the “Service Metrics” tab.

Graph of cloudwatch_logs_aggregator.log.error_count

Now we can see that the metrics have been posted without any problems and that the graph is being displayed.

The steps to set up monitoring rules for the metrics or to display graphs in custom dashboards are the same as for service metrics in general.

Aggregate the Execution Logs

Next, we will aggregate the information logs (logs that are produced with "level": "info"). More specifically, we will aggregate logs that are produced when CloudWatch Logs Insights queries have been completed ("msg": "query complete").

{
  "level": "info",
  "msg": "query complete",
  "bytes_scanned": 51828,
  "records_scanned": 246,
  "records_matched": 120,
  ...
}

Metrics like these that are based on information logs are very useful for understanding the execution state of the application, going beyond simply detecting whether errors occurred. For example, since CloudWatch Logs Insights usage fees are determined by the number of bytes scanned by queries, obtaining this information as a metric makes it possible to monitor and estimate costs.
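
As a rough illustration of such an estimate, the Python sketch below converts a number of scanned bytes into a cost. The per-GB price is a hypothetical placeholder (check the CloudWatch pricing page for your Region), and treating 1 GB as 1024^3 bytes is also an assumption.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes

# Hypothetical placeholder price, NOT taken from AWS documentation.
HYPOTHETICAL_USD_PER_GB_SCANNED = 0.005


def estimate_query_cost_usd(bytes_scanned, usd_per_gb_scanned):
    """Estimate the query cost for a given number of scanned bytes."""
    return bytes_scanned / BYTES_PER_GB * usd_per_gb_scanned


# Extrapolate from the sample log above, as if every 1-minute run
# scanned the same 51828 bytes for 30 days.
bytes_per_run = 51828
runs_per_30_days = 60 * 24 * 30
estimated = estimate_query_cost_usd(
    bytes_per_run * runs_per_30_days, HYPOTHETICAL_USD_PER_GB_SCANNED
)
print(f"estimated cost for 30 days: {estimated:.4f} USD")
```

With bytes_scanned available as a Mackerel metric, the same arithmetic can be applied to real totals instead of an extrapolation from a single sample.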

For our example, let’s aggregate the total value of the bytes_scanned field that is produced in the logs as output and post it to Mackerel.

We will set up cloudwatch-logs-aggregator in much the same way as for the error logs. Since we can reuse the same Mackerel service, API key, and Lambda function, we can start at step 4 this time.

  1. Prepare the Mackerel Service to Post Metrics To (already done)
  2. Save the Mackerel API Key in Parameter Store of AWS Systems Manager (already done)
  3. Create the cloudwatch-logs-aggregator Lambda Function (already done)
  4. Create a Query that Aggregates the Logs
  5. Create an Amazon EventBridge Rule to Invoke the Lambda Function

4. Create a Query that Aggregates the Logs

We will create a CloudWatch Logs Insights query that calculates the total value of the bytes_scanned field contained in the "msg": "query complete" information logs.

The query is shown below. It differs from the error log query in its filter conditions and in using the sum function instead of count, since we are calculating a total. A ~ has been added to the name of the field that holds the total because the stats command does not allow creating a field with the same name as an existing field. The ~ will not be included in the name of the metric that is posted to Mackerel.

filter level = "info" and msg = "query complete"
| stats sum(bytes_scanned) as `~bytes_scanned`

When you run this query, the total value of the bytes_scanned field will be aggregated with the name ~bytes_scanned. Let’s validate it on the AWS console, just like we did earlier.

Example result of the query

It seems that the total value is being calculated properly.
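
The following Python sketch applies the same filter and sum to a few made-up log lines locally, and shows how the ~ could be dropped when the metric name is formed. The exact name-mapping rule is an assumption based on the description above, both helper names are hypothetical, and the real aggregation is performed by CloudWatch Logs Insights.

```python
import json

# Made-up sample logs in the same shape as the examples above.
SAMPLE_LOG_LINES = [
    '{"level": "info", "msg": "query complete", "bytes_scanned": 51828}',
    '{"level": "info", "msg": "query complete", "bytes_scanned": 1000}',
    '{"level": "error", "msg": "failed to query: ..."}',
]


def sum_bytes_scanned(lines):
    """Local analogue of the query: filter the logs, then sum bytes_scanned."""
    matched = [
        record
        for record in map(json.loads, lines)
        if record.get("level") == "info" and record.get("msg") == "query complete"
    ]
    if not matched:
        return {}  # mirror the query's empty result when nothing matches
    return {"~bytes_scanned": sum(record["bytes_scanned"] for record in matched)}


def to_metric_name(prefix, field_name):
    """Assumed mapping: drop the ~ and prepend the metric name prefix."""
    cleaned = field_name.replace("~", "")
    return f"{prefix}.{cleaned}"


result = sum_bytes_scanned(SAMPLE_LOG_LINES)
for field_name, value in result.items():
    print(to_metric_name("cloudwatch_logs_aggregator.query", field_name), value)
```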

5. Create an Amazon EventBridge Rule to Invoke the Lambda Function

We will add the following information to main.tf, the same file that we used to create the Lambda function earlier, just like we did for the error logs.

module "cw_logs_aggregator_rule_bytes_scanned" {
  source = "github.com/mackerelio-labs/mackerel-monitoring-modules//cloudwatch-logs-aggregator/rule?ref=v0.1.2"

  region       = "ap-northeast-1"
  rule_name    = "cw-logs-aggregator-demo-bytes-scanned"
  function_arn = module.cw_logs_aggregator_lambda.function_arn

  api_key_name = "/mackerel/myApiKey"
  service_name = "my-service"

  log_group_name     = "/aws/lambda/cw-logs-aggregator-target"
  query              = <<EOT
    filter level = "info" and msg = "query complete"
    | stats sum(bytes_scanned) as `~bytes_scanned`
  EOT
  metric_name_prefix = "cloudwatch_logs_aggregator.query"
  default_metrics = {
    "cloudwatch_logs_aggregator.query.bytes_scanned" = 0
  }
  schedule_expression = "rate(1 minute)"
  interval_in_minutes = 1
}

With this configuration added, we will run the following commands to create the resources that invoke the Lambda function.

terraform init
terraform apply

Validation

Now, the number of bytes scanned by cloudwatch-logs-aggregator is being posted as a metric. Let’s check the results from the Mackerel service metrics screen, just like we did for the error logs.

Graph of cloudwatch_logs_aggregator.query.bytes_scanned

We have confirmed that this is also being posted properly.

Summary

I have shown how to use cloudwatch-logs-aggregator to aggregate logs in CloudWatch Logs and extract metrics that can be monitored and visualized on Mackerel.

As we have seen, with cloudwatch-logs-aggregator you can monitor application metrics with little effort, simply by writing the application’s logs to CloudWatch Logs in an appropriate format. I hope that you will add this to your tool belt as an option for monitoring applications using Mackerel.

For reference, please also read this article (in Japanese) about making use of the metrics obtained with cloudwatch-logs-aggregator.