Monitoring PostgreSQL on AWS RDS for slow queries

Databases serve as the backbone for countless applications and systems that drive modern businesses. Efficient database performance is critical for ensuring seamless user experiences, rapid response times, and overall system reliability. However, even the most robust database systems can face performance bottlenecks, and one of the key indicators of potential trouble can be premptively identified when we start seeing slow queries.

Slow queries, can have far-reaching consequences on the performance and responsiveness of an entire system. In the digital age, where milliseconds matter, the importance of monitoring slow queries cannot be overstated. Proactively identifying a slow query before it becomes a bottleneck can help prevent downtimes before they happen and also help us identify queries that need to be optimized.

We can pick up slow queries that ran for more than 5 mins, and deliver them to any endpoint for subsequent review and intervention if necessary, possible endpoints include AWS SNS, Sentry, Slack etc.

We will now look into how to setup a slow query monitor for a PostgreSQL database on AWS RDS, deployed using thoughtbot’s opensource RDS terraform modules.

AWS RDS Slow query monitor terraform module architecture

Step 1 - Enable query logging for PostgreSQL on AWS RDS.

Set the log_statement parameter

The first step to setting up slow query logs is to enable query logging on the PostgreSQL RDS instance. You can enable query logs on RDS by setting the log_statement parameter for PostgreSQL.

The log_statement parameter is used to determine which type of SQL statements to be logged regardless of execution time. The default is ‘none’. Possible options are;

  • none: Do not capture any statements.
  • all: Capture all statements.
  • ddl: Capture all data definition language (DDL) statements such as CREATE, ALTER, and DROP.
  • mod: Capture all DDL and data modification language (DML) statements such as INSERT, UPDATE, and DELETE.

Set the logminduration_statement parameter

The second interesting parameter for PostgreSQL necessary for slow query logging is logminduration_statement. This parameter enables us to set a threshold in milliseconds for SQL statements to be logged. This will allow us to log all SQL statements that take longer than the set parameter value. In this example, we will be using 5000 milliseconds to log all SQL queries running for more than 5 seconds.

The terraform code snippet below using the awsdbparameter_group terraform resource can be used to create a PostgreSQL database parameter group to export all SQL statements that ran for more than 5 seconds for an RDS instance.

resource "aws_db_parameter_group" "postgres_databae_parameter" {
  name   = "postgres-database-parameter"
  family = "postgres14"

  parameter {
    name  = "log_statement"
    value = "all"
  }

  parameter {
    name  = "log_min_duration_statement"
    value = "5000"
  }
}

We can now use the awsdbparametergroup defined above to create the PostgreSQL instance uring the awsdb_instance terraform resource.

The terraform code snippet below can be used to create a PostgreSQL instance while referencing the awsdbparameter_group terraform resource. It will enable logging all SQL statements that ran for more than 5 seconds for an RDS instance.

resource "aws_db_instance" "default" {
  name                 = "sandbox-database"
  ...
  parameter_group_name = "postgres-database-parameter"
  ...
}

Step 2 - Pushing the selected slow SQL statements to Cloudwatch Logs:

Once SQL statement logging has been enabled in the RDS postgres instance, the next step is to publish the SQL statement data to Cloudwatch logs. To do this, you should set enabledcloudwatchlogs_exports option on RDS to the preferred RDS log type that should be exported.

Available options for PostgreSQL are postgresql and upgrade. For slow query logs, we will set this parameter to postgresql.

The terraform code snippet below using the awsdbinstance terraform resource can be used to enable logs export to a cloudwatch log group for a PostgreSQL database instance.

resource "aws_db_instance" "default" {

  ...
  enabled_cloudwatch_logs_exports = ["postgresql"]
  ...
}

The logs from the RDS instance will be exported to a cloudwatch log group with this naming convention - /aws/rds/instance/{myinstance}/postgresql, where myinstance is the name of the RDS instance. A cloudwatch log group with this name will be created automatically if doesn’t already exist.

Step 3 - Create a Cloudwatch log group subscription to AWS SNS

You can create a Log group subscription to send every message received by the Cloudwatch log group to an AWS SNS topic. For this step, we will be using thoughtbot’s cloudwatch logs extract terraform module. You can review the terraform code snippet below for a sample implementation of the module.

Note: A destination SNS topic will be created as well and passed into this terraform module.

resource "aws_sns_topic" "rds_statement_logs" {
  name = "postgresql-slow-statements-topic"
}


module "cloudwatch_log_extract" {
  source = "github.com/thoughtbot/flightdeck//aws/cloudwatch-log-extract?ref=v0.10.1"

  # Enter the Source CloudWatch log group to subscribe to for log messages.
  source_cloudwatch_log_group = "/aws/rds/instance/sandbox-database/postgresql"

  # Enter a log group filter pattern to pick which logs get sent to the lambda endpoint
  log_group_filter_pattern = "duration"

  # Enter a log message filter expression to pick out specific details to be sent to the destination SNS topic.
  # You may provide a regex pattern with capture groups, and enter the label for each capture group.
  log_message_filter = {
    regex_pattern = ".* duration: ([0-9.]+) ms .* content: (.*)"
    capture_group = {
      1 = "duration",
      2 = "message"
    }
  }

  # Enter any message attribute to be added to the SNS message.
  message_attributes = {
    database = "sandbox-database"
  }

  # Enter the destination SNS topic
  destination_sns_topic_arn = aws_sns_topic.rds_statement_logs.arn
}

Once a slow query statement is detected, it will be logged and written to the PostgreSQL cloudwatch group, which will then get extracted to the SNS topic. You can then have a subscription to the SNS topic send the message to any preferred endpoint. We will share an example on how to share the message from the SNS topic to sentry using thoughtbot’s sns to sentry delivery terraform module.

Step 4 - Create a subscription from AWS SNS to Sentry.

Lastly, we will push the messages from SNS to Sentry. You can do this by creating an SNS subscription to the destination SNS topic used in the previous step. Once this is done, any message publised to SNS will be forwarded to Sentry. AWS SNS does not natively support sentry as a subscription endpoint, so for this step, we will be using thoughtbot’s SNS to Sentry integration terraform module. The module will be using a lambda script to pick up messages from AWS SNS and deliver them to Sentry.

The module will require your SENTRY_DSN key to be stored in AWS SecretsManager. You can run the AWS CLI command below to safely upload your sentry dsn secret to AWS secretsmanager.

export YOUR_SECRET_VALUE="your_actual_sentry_dsn_value"

aws secretsmanager create-secret --name YourSecretName \
  --secret-string "{\"SENTRY_DSN\":\"$YOUR_SECRET_VALUE\"}"

You can review the terraform code snippet below for a sample implementation of the SNS to Sentry integration terraform module.

module "sns_sentry_delivery" {
  source = "github.com:thoughtbot/aws-sns-sentry-delivery?ref=v0.1.1"

  name = "sandbox-database-statement-delivery"

  # Name of the AWS Secretmanager resource contaiing the sentry DSN credentials,
  # the secret key in the vault should be SENTRY_DSN
  sentry_secret_name = sandbox-sentry-secret

  # Enter the source SNS topic arn
  source_sns_topic_arn = aws_sns_topic.rds_statement_logs.arn
}

Conclusion

In essence, slow query logs are crucial for identifying potential database issues, but their true impact is realized through swift action. Recognizing and resolving long-running SQL statements promptly, with collaborative efforts between SREs and developers, ensures optimal database performance and fortifies the application ecosystem. This proactive approach is essential in meeting user expectations, maintaining responsiveness, and establishing a foundation for sustained growth in today’s dynamic technology landscape.