Monitoring OpenSearch with AWS CloudWatch: Complete Setup Guide

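Amazon OpenSearch Service provides built-in integration with AWS CloudWatch for monitoring cluster health, performance, and operational metrics. The service publishes a useful set of metrics out of the box, but raw metrics alone do not notify anyone or show trends at a glance; you still need dashboards, alarms, and notifications on top of them.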

This guide walks you through setting up a complete monitoring and alerting solution using CloudWatch to ensure your OpenSearch clusters remain healthy and performant.

Prerequisites

Before setting up monitoring, ensure you have:

  1. AWS Account Access: Administrative or appropriate IAM permissions for CloudWatch and OpenSearch Service
  2. OpenSearch Domain: An active Amazon OpenSearch Service domain
  3. IAM Permissions: The following permissions for your user/role:
    • cloudwatch:PutMetricData
    • cloudwatch:GetMetricData
    • cloudwatch:PutDashboard
    • cloudwatch:DescribeAlarms
    • cloudwatch:PutMetricAlarm
    • es:DescribeElasticsearchDomain
    • es:ListTags
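
The permissions above can be collected into an IAM policy document. The following is a minimal sketch, not an official policy: the Sid, the wildcard Resource, and the helper name build_monitoring_policy are illustrative assumptions, and in practice you would likely narrow Resource for the es: actions to your domain's ARN.

```python
import json

# Actions from the prerequisites list. cloudwatch:PutMetricAlarm is included
# because the alarm steps later in this guide create alarms.
MONITORING_ACTIONS = [
    "cloudwatch:PutMetricData",
    "cloudwatch:GetMetricData",
    "cloudwatch:PutDashboard",
    "cloudwatch:DescribeAlarms",
    "cloudwatch:PutMetricAlarm",
    "es:DescribeElasticsearchDomain",
    "es:ListTags",
]


def build_monitoring_policy(actions=MONITORING_ACTIONS):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OpenSearchMonitoring",  # hypothetical statement id
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # assumption: tighten this for production use
            }
        ],
    }


def policy_as_json(actions=MONITORING_ACTIONS):
    """Serialize the policy so it can be pasted into the IAM console."""
    return json.dumps(build_monitoring_policy(actions), indent=2)
```

Calling policy_as_json() returns text that can be pasted into the IAM policy editor.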

Step 1: Enable CloudWatch Monitoring

1.1 Access Your OpenSearch Domain

  1. Log into the AWS Management Console
  2. Navigate to Amazon OpenSearch Service
  3. Select your domain from the list
  4. Click on the Monitoring tab

1.2 Configure CloudWatch Integration

Amazon OpenSearch Service publishes domain metrics to CloudWatch automatically, in the AWS/ES namespace and at 1-minute intervals. There is nothing to switch on, and there is no separate "enhanced monitoring" setting for OpenSearch domains. To verify that metrics are arriving:

  1. In the Monitoring tab, confirm that the metric graphs show recent data
  2. In the CloudWatch console, open Metrics and select the AWS/ES namespace
  3. Filter by your domain name to see the metrics published for it

# Using AWS CLI to list the metrics CloudWatch holds for a domain
aws cloudwatch list-metrics \
  --namespace "AWS/ES" \
  --dimensions Name=DomainName,Value=your-domain-name
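
To confirm from a script that metrics are arriving, the metric names CloudWatch holds for a domain can be listed with boto3. This is a sketch under assumptions: the helper names are hypothetical, and the cloudwatch argument is expected to be a boto3 CloudWatch client created by the caller.

```python
def domain_dimensions(domain_name, account_id):
    """Dimensions that identify a domain's metrics in the AWS/ES namespace."""
    return [
        {"Name": "DomainName", "Value": domain_name},
        {"Name": "ClientId", "Value": account_id},
    ]


def list_domain_metric_names(cloudwatch, domain_name, account_id):
    """Return the sorted, de-duplicated metric names published for a domain.

    `cloudwatch` is a boto3 CloudWatch client, for example
    boto3.client("cloudwatch", region_name="us-east-1").
    """
    paginator = cloudwatch.get_paginator("list_metrics")
    names = set()
    for page in paginator.paginate(
        Namespace="AWS/ES",
        Dimensions=domain_dimensions(domain_name, account_id),
    ):
        for metric in page["Metrics"]:
            names.add(metric["MetricName"])
    return sorted(names)
```

An empty result usually means the wrong region, a misspelled domain name, or a domain that has not reported data recently.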

Step 2: Key Metrics to Monitor

2.1 Cluster Health Metrics

Monitor these critical cluster-level metrics:

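  • ClusterStatus.green, ClusterStatus.yellow, ClusterStatus.red: Overall cluster health, published as three separate metrics that each report 1 while the cluster is in that state and 0 otherwise
  • Nodes: Number of nodes in the cluster, including dedicated master nodes
  • ClusterIndexWritesBlocked: Whether incoming write requests are blocked (1) or accepted (0)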

2.2 Performance Metrics

Track performance-related metrics:

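  • SearchLatency: Average search query latency, in milliseconds
  • IndexingLatency: Average indexing latency, in milliseconds
  • CPUUtilization: CPU usage across nodes, as a percentage
  • JVMMemoryPressure: JVM heap memory pressure, as a percentage
  • FreeStorageSpace: Available storage space on data nodes, reported to CloudWatch in MiB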

2.3 Operational Metrics

Monitor operational aspects:

  • AutomatedSnapshotFailure: Failed automated snapshots
  • ClusterUsedSpace: Total used storage space
  • ClusterIndexWritesBlocked: Write operations blocked, which effectively leaves the cluster read-only (also listed under cluster health)
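
The metrics in this step can be fetched together with a single GetMetricData call. The sketch below is illustrative: the statistic paired with each metric is an assumption, ClusterStatus.red stands in for cluster status because the status is published as separate green, yellow, and red metrics, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# (metric name, statistic) pairs. The statistics are assumptions; adjust as needed.
KEY_METRICS = [
    ("ClusterStatus.red", "Maximum"),
    ("Nodes", "Minimum"),
    ("ClusterIndexWritesBlocked", "Maximum"),
    ("SearchLatency", "Average"),
    ("IndexingLatency", "Average"),
    ("CPUUtilization", "Average"),
    ("JVMMemoryPressure", "Maximum"),
    ("FreeStorageSpace", "Minimum"),
    ("AutomatedSnapshotFailure", "Maximum"),
    ("ClusterUsedSpace", "Maximum"),
]


def build_metric_queries(domain_name, account_id, period=300, metrics=KEY_METRICS):
    """Build the MetricDataQueries list for cloudwatch.get_metric_data."""
    dimensions = [
        {"Name": "DomainName", "Value": domain_name},
        {"Name": "ClientId", "Value": account_id},
    ]
    return [
        {
            "Id": f"m{index}",  # ids must start with a lowercase letter
            "Label": name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ES",
                    "MetricName": name,
                    "Dimensions": dimensions,
                },
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        }
        for index, (name, stat) in enumerate(metrics)
    ]


def fetch_latest_values(cloudwatch, domain_name, account_id, hours=1):
    """Return {metric label: most recent value or None} for the last `hours`."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_metric_queries(domain_name, account_id),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        ScanBy="TimestampDescending",  # newest data point first
    )
    return {
        result["Label"]: (result["Values"][0] if result["Values"] else None)
        for result in response["MetricDataResults"]
    }
```

A None value for a metric means no data points were returned for the window, which is itself worth investigating.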

Step 3: Create CloudWatch Dashboards

3.1 Create a Basic Monitoring Dashboard

  1. Navigate to CloudWatch in the AWS Console
  2. Click Dashboards in the left sidebar
  3. Click Create dashboard
  4. Name your dashboard: OpenSearch-Cluster-Monitoring

3.2 Add Key Widgets

Cluster Health Widget

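{
  "metrics": [
    [ "AWS/ES", "ClusterStatus.green", "DomainName", "your-domain-name", "ClientId", "your-account-id" ],
    [ ".", "ClusterStatus.yellow", ".", ".", ".", "." ],
    [ ".", "ClusterStatus.red", ".", ".", ".", "." ]
  ],
  "period": 300,
  "stat": "Maximum",
  "region": "us-east-1",
  "title": "Cluster Status"
}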

Performance Metrics Widget

{
  "metrics": [
    [ "AWS/ES", "SearchLatency", "DomainName", "your-domain-name", "ClientId", "your-account-id" ],
    [ ".", "IndexingLatency", ".", ".", ".", "." ]
  ],
  "period": 300,
  "stat": "Average",
  "region": "us-east-1",
  "title": "Search and Indexing Latency"
}

Resource Utilization Widget

{
  "metrics": [
    [ "AWS/ES", "CPUUtilization", "DomainName", "your-domain-name", "ClientId", "your-account-id" ],
    [ ".", "JVMMemoryPressure", ".", ".", ".", "." ],
    [ ".", "FreeStorageSpace", ".", ".", ".", ".", { "yAxis": "right" } ]
  ],
  "period": 300,
  "stat": "Average",
  "region": "us-east-1",
  "title": "Resource Utilization"
}
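
The widget definitions above are only the properties of each widget; a full dashboard body wraps them with a type and a grid position. The sketch below assembles such a body and publishes it with put_dashboard. The layout values, the helper names, and the use of the separate ClusterStatus.green, ClusterStatus.yellow, and ClusterStatus.red metrics are assumptions.

```python
import json


def metric_widget(title, metric_names, domain_name, account_id, x, y,
                  region="us-east-1", stat="Average", period=300):
    """Wrap a list of AWS/ES metric names in a dashboard metric widget."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": 12,  # the dashboard grid is 24 units wide
        "height": 6,
        "properties": {
            "metrics": [
                ["AWS/ES", name, "DomainName", domain_name, "ClientId", account_id]
                for name in metric_names
            ],
            "period": period,
            "stat": stat,
            "region": region,
            "title": title,
        },
    }


def build_dashboard_body(domain_name, account_id, region="us-east-1"):
    """Return the dashboard body as the JSON string that put_dashboard expects."""
    widgets = [
        metric_widget(
            "Cluster Status",
            ["ClusterStatus.green", "ClusterStatus.yellow", "ClusterStatus.red"],
            domain_name, account_id, x=0, y=0, region=region, stat="Maximum",
        ),
        metric_widget(
            "Search and Indexing Latency",
            ["SearchLatency", "IndexingLatency"],
            domain_name, account_id, x=12, y=0, region=region,
        ),
        metric_widget(
            "Resource Utilization",
            ["CPUUtilization", "JVMMemoryPressure", "FreeStorageSpace"],
            domain_name, account_id, x=0, y=6, region=region,
        ),
    ]
    return json.dumps({"widgets": widgets})


def publish_dashboard(cloudwatch, domain_name, account_id,
                      name="OpenSearch-Cluster-Monitoring"):
    """Create or overwrite the dashboard using a boto3 CloudWatch client."""
    return cloudwatch.put_dashboard(
        DashboardName=name,
        DashboardBody=build_dashboard_body(domain_name, account_id),
    )
```

put_dashboard replaces an existing dashboard of the same name, so keeping the body in version control makes changes easy to review.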

Step 4: Set Up CloudWatch Alarms

4.1 Critical Alarms

Cluster Status Alarm

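  1. Go to CloudWatch → Alarms
  2. Click Create alarm
  3. Click Select metric, then choose the AWS/ES namespace (OpenSearch Service)
  4. Choose the ClusterStatus.red metric with the Maximum statistic
  5. Configure alarm:
    • Threshold type: Static
    • Condition: Greater than or equal to 1 (the metric reports 1 while the cluster is red)
    • Evaluation period: 1 out of 1 datapoints
    • Period: 5 minutes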

High CPU Utilization Alarm

  1. Create new alarm for CPUUtilization
  2. Configure:
    • Threshold type: Static
    • Condition: Greater than 80%
    • Evaluation period: 2 out of 3 datapoints
    • Period: 5 minutes

Low Storage Space Alarm

  1. Create alarm for FreeStorageSpace with the Minimum statistic
  2. Configure:
    • Threshold type: Static
    • Condition: Less than 10240 (CloudWatch reports this metric in MiB, so 10240 is 10 GiB)
    • Evaluation period: 1 out of 1 datapoints
    • Period: 5 minutes
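
The three critical alarms can also be created from code, which keeps them consistent across domains. This is a sketch under assumptions: the alarm names and helper functions are hypothetical, cluster status is read from the ClusterStatus.red metric (1 while the cluster is red), and the storage threshold is expressed in MiB because that is the unit CloudWatch reports for FreeStorageSpace.

```python
def build_critical_alarms(domain_name, account_id, topic_arn):
    """Return put_metric_alarm keyword arguments for the three critical alarms."""
    common = {
        "Namespace": "AWS/ES",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Period": 300,  # 5 minutes, matching the console steps
        "AlarmActions": [topic_arn],
    }
    return [
        {
            **common,
            "AlarmName": f"{domain_name}-cluster-status-red",
            "MetricName": "ClusterStatus.red",
            "Statistic": "Maximum",
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "Threshold": 1,
            "EvaluationPeriods": 1,
            "DatapointsToAlarm": 1,
        },
        {
            **common,
            "AlarmName": f"{domain_name}-high-cpu",
            "MetricName": "CPUUtilization",
            "Statistic": "Average",
            "ComparisonOperator": "GreaterThanThreshold",
            "Threshold": 80,
            "EvaluationPeriods": 3,  # 2 out of 3 datapoints
            "DatapointsToAlarm": 2,
        },
        {
            **common,
            "AlarmName": f"{domain_name}-low-storage",
            "MetricName": "FreeStorageSpace",
            "Statistic": "Minimum",
            "ComparisonOperator": "LessThanThreshold",
            "Threshold": 10 * 1024,  # 10 GiB expressed in MiB
            "EvaluationPeriods": 1,
            "DatapointsToAlarm": 1,
        },
    ]


def create_alarms(cloudwatch, alarms):
    """Create or update each alarm with a boto3 CloudWatch client; return the count."""
    for alarm in alarms:
        cloudwatch.put_metric_alarm(**alarm)
    return len(alarms)
```

put_metric_alarm overwrites an alarm with the same name, so rerunning the script after a threshold change is safe.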

4.2 Performance Alarms

High Search Latency Alarm

# Using AWS CLI to create search latency alarm
# The dimensions tie the alarm to one domain; without them it never receives data
aws cloudwatch put-metric-alarm \
  --alarm-name "OpenSearch-High-Search-Latency" \
  --alarm-description "Alert when search latency exceeds 1000ms" \
  --metric-name "SearchLatency" \
  --namespace "AWS/ES" \
  --dimensions Name=DomainName,Value=your-domain-name Name=ClientId,Value=123456789012 \
  --statistic "Average" \
  --period 300 \
  --threshold 1000 \
  --comparison-operator "GreaterThanThreshold" \
  --evaluation-periods 2 \
  --alarm-actions "arn:aws:sns:us-east-1:123456789012:your-sns-topic"

High Indexing Latency Alarm

# Using AWS CLI to create indexing latency alarm
aws cloudwatch put-metric-alarm \
  --alarm-name "OpenSearch-High-Indexing-Latency" \
  --alarm-description "Alert when indexing latency exceeds 500ms" \
  --metric-name "IndexingLatency" \
  --namespace "AWS/ES" \
  --dimensions Name=DomainName,Value=your-domain-name Name=ClientId,Value=123456789012 \
  --statistic "Average" \
  --period 300 \
  --threshold 500 \
  --comparison-operator "GreaterThanThreshold" \
  --evaluation-periods 2 \
  --alarm-actions "arn:aws:sns:us-east-1:123456789012:your-sns-topic"

Step 5: Configure Notifications

5.1 Set Up SNS Topic

  1. Navigate to SNS in AWS Console
  2. Click Create topic
  3. Choose Standard topic type
  4. Name: OpenSearch-Alerts
  5. Add subscribers (email, SMS, or other endpoints)

5.2 Configure Alarm Actions

For each alarm you create:

  1. In the alarm configuration, add Alarm actions
  2. Select your SNS topic: OpenSearch-Alerts
  3. Optionally add OK actions to notify when issues are resolved
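
The topic, a subscription, and the alarm actions can be wired up with boto3 as well. In this sketch the helper names and the email address are hypothetical; sns is expected to be a boto3 SNS client, and alarm is a dictionary of put_metric_alarm keyword arguments.

```python
def ensure_alert_topic(sns, topic_name="OpenSearch-Alerts", email=None):
    """Create the topic if needed and optionally subscribe an email address.

    create_topic returns the existing topic's ARN when a topic with this name
    already exists, so calling this function repeatedly is safe.
    """
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    if email:
        # The recipient must confirm the subscription before messages arrive.
        sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn


def with_notifications(alarm, topic_arn, notify_on_ok=True):
    """Return a copy of put_metric_alarm arguments with SNS actions attached."""
    updated = dict(alarm)
    updated["AlarmActions"] = [topic_arn]
    if notify_on_ok:
        # OK actions fire when the alarm returns to the OK state.
        updated["OKActions"] = [topic_arn]
    return updated
```

A typical flow would call ensure_alert_topic once, then pass each alarm definition through with_notifications before sending it to put_metric_alarm.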

Step 6: Advanced Monitoring Setup

6.1 Custom Metrics

You can publish custom metrics using the CloudWatch API:

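from datetime import datetime, timezone

import boto3

cloudwatch = boto3.client('cloudwatch')

def publish_custom_metric(metric_name, value, unit='Count'):
    """Publish a single data point to the OpenSearch/Custom namespace."""
    cloudwatch.put_metric_data(
        Namespace='OpenSearch/Custom',
        MetricData=[
            {
                'MetricName': metric_name,
                'Value': value,
                'Unit': unit,
                # Timezone-aware UTC timestamp for the data point
                'Timestamp': datetime.now(timezone.utc)
            }
        ]
    )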

6.2 Log Monitoring

Enable CloudWatch Logs for OpenSearch:

  1. In your OpenSearch domain settings, go to Logs
  2. Enable Error logs and Search slow logs
  3. Set retention period (7-30 days recommended)
  4. Create log-based alarms for critical errors
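
Steps 3 and 4 can be scripted against CloudWatch Logs. The sketch below sets a retention period and adds a metric filter that counts log events containing ERROR; an alarm can then be created on the resulting metric. The log group name is whatever you chose when enabling log publishing; the filter name, metric name, and helper names here are hypothetical.

```python
def build_error_metric_filter(log_group, filter_pattern="ERROR"):
    """Return put_metric_filter arguments that count matching log events."""
    return {
        "logGroupName": log_group,
        "filterName": "opensearch-error-count",  # hypothetical name
        "filterPattern": filter_pattern,
        "metricTransformations": [
            {
                "metricName": "ErrorLogCount",  # hypothetical custom metric
                "metricNamespace": "OpenSearch/Custom",
                "metricValue": "1",  # add 1 for every matching event
                "defaultValue": 0,   # report 0 when nothing matches
            }
        ],
    }


def configure_log_monitoring(logs, log_group, retention_days=14):
    """Apply retention and the error filter using a boto3 CloudWatch Logs client."""
    logs.put_retention_policy(logGroupName=log_group, retentionInDays=retention_days)
    logs.put_metric_filter(**build_error_metric_filter(log_group))
```

CloudWatch Logs only accepts specific retention values; 7, 14, and 30 days all fall within the range recommended above.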

6.3 Anomaly Detection

Set up anomaly detection for key metrics:

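  1. In CloudWatch, create an alarm on one of your OpenSearch metrics
  2. Choose Anomaly detection as the threshold type instead of Static
  3. Set the width of the expected band (a number of standard deviations) and, if needed, exclude unusual periods from training
  4. Choose whether the alarm fires above the band, below it, or both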
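
An anomaly detection alarm can be defined through put_metric_alarm by pairing the metric with an ANOMALY_DETECTION_BAND expression. The following is a sketch: the alarm name, the default band width of 2 standard deviations, and the choice of SearchLatency are assumptions.

```python
def build_anomaly_alarm(domain_name, account_id, metric_name="SearchLatency",
                        band_width=2, topic_arn=None):
    """Return put_metric_alarm arguments for an anomaly detection alarm."""
    alarm = {
        "AlarmName": f"{domain_name}-{metric_name}-anomaly",
        # Fire only when the metric rises above the expected band.
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 3,
        "ThresholdMetricId": "band",  # refers to the expression defined below
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/ES",
                        "MetricName": metric_name,
                        "Dimensions": [
                            {"Name": "DomainName", "Value": domain_name},
                            {"Name": "ClientId", "Value": account_id},
                        ],
                    },
                    "Period": 300,
                    "Stat": "Average",
                },
            },
            {
                "Id": "band",
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": f"{metric_name} (expected)",
                "ReturnData": True,
            },
        ],
    }
    if topic_arn:
        alarm["AlarmActions"] = [topic_arn]
    return alarm
```

Passing the result to cloudwatch.put_metric_alarm(**alarm) creates the alarm; a larger band_width makes it less sensitive.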

Step 7: Best Practices

7.1 Alarm Configuration

  • Avoid alert fatigue: Set appropriate thresholds and evaluation periods
  • Use different severity levels: Critical, Warning, and Info
  • Implement escalation: Different actions for different severity levels
  • Test alarms: Regularly test your alarm configurations
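
One way to test an alarm's notification path is to force it into the ALARM state with set_alarm_state. The helper below is a minimal sketch with a hypothetical name; cloudwatch is a boto3 CloudWatch client.

```python
def trigger_test_alarms(cloudwatch, alarm_names, reason="Manual notification test"):
    """Temporarily force each alarm into ALARM so its actions fire.

    CloudWatch re-evaluates the alarm on its next period and moves it back
    to whatever state the underlying metric indicates.
    """
    for name in alarm_names:
        cloudwatch.set_alarm_state(
            AlarmName=name,
            StateValue="ALARM",
            StateReason=reason,
        )
    return len(alarm_names)
```

Announce such tests to the on-call team beforehand, because the resulting notifications look like real alerts.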

7.2 Dashboard Organization

  • Group related metrics: Cluster health, performance, and operational metrics
  • Use appropriate time ranges: 1 hour for real-time, 24 hours for trends
  • Add annotations: Document incidents and maintenance windows
  • Share dashboards: Make dashboards available to relevant teams

7.3 Cost Optimization

  • Monitor CloudWatch costs: Set up billing alarms
  • Use appropriate granularity: 5-minute periods for most metrics
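  • Plan for historical analysis: CloudWatch retains metrics for up to 15 months at progressively coarser resolution; use CloudWatch Logs Insights to query older log data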
  • Optimize custom metrics: Batch metric publishing when possible
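
Batching custom metrics means sending several data points in one put_metric_data call instead of one call per point. The sketch below shows one way to do that; the helper names are hypothetical, and the default batch size of 20 is a conservative assumption rather than the service limit.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Count", timestamp=None):
    """Build one MetricData entry with a timezone-aware UTC timestamp."""
    return {
        "MetricName": name,
        "Value": value,
        "Unit": unit,
        "Timestamp": timestamp or datetime.now(timezone.utc),
    }


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def publish_batched(cloudwatch, data, namespace="OpenSearch/Custom", batch_size=20):
    """Send all data points in as few put_metric_data calls as the batch size allows.

    Returns the number of API calls made.
    """
    calls = 0
    for batch in chunked(data, batch_size):
        cloudwatch.put_metric_data(Namespace=namespace, MetricData=batch)
        calls += 1
    return calls
```

For example, 45 data points with the default batch size result in three API calls instead of 45.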

Step 8: Troubleshooting Common Issues

8.1 Missing Metrics

If metrics aren't appearing:

  1. Confirm you are looking in the AWS/ES namespace and in the same region as the domain
  2. Check IAM permissions
  3. Ensure domain is active and healthy
  4. Wait for metrics to populate (can take 5-15 minutes)

8.2 Alarm Not Triggering

If alarms aren't working:

  1. Check alarm configuration and thresholds
  2. Verify SNS topic and subscriptions
  3. Test alarm actions manually
  4. Review the alarm's history in CloudWatch for state changes and failed actions

8.3 High False Positives

To reduce false positives:

  1. Increase evaluation periods
  2. Adjust thresholds based on historical data
  3. Use anomaly detection instead of static thresholds
  4. Implement hysteresis (different thresholds for alarm and OK states)
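
Adjusting a threshold based on historical data can be as simple as taking a high percentile of past values and adding headroom. The sketch below uses the nearest-rank percentile; the function names, the 99th percentile default, and the 20% headroom are illustrative assumptions, and the input is a plain list of values such as those returned by GetMetricData.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Rank is 1-based; clamp to at least 1 so pct=0 returns the minimum.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_threshold(history, pct=99, headroom=1.2):
    """Suggest an alarm threshold: the chosen percentile plus some headroom."""
    return percentile(history, pct) * headroom
```

If the suggested value is far above the current static threshold, the existing alarm is probably firing on normal behavior.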

Frequently Asked Questions

Q: How often should I review and update my monitoring setup?
A: Review your monitoring configuration monthly, and update thresholds and alarms based on changing usage patterns and business requirements. Conduct quarterly reviews of dashboard effectiveness and alarm accuracy.

Q: Can I monitor multiple OpenSearch domains with a single dashboard?
A: Yes, you can create a multi-domain dashboard by adding multiple metrics with different domain names. Use CloudWatch's multi-metric widgets to display all domains in a single view.
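
As a sketch of that approach, the function below builds one dashboard widget that plots the same metric for several domains. The function name, labels, and layout values are assumptions; the result is one entry for the widgets list of a dashboard body.

```python
def multi_domain_widget(metric_name, domain_names, account_id,
                        region="us-east-1", stat="Average", period=300):
    """Build a metric widget with one line per OpenSearch domain."""
    return {
        "type": "metric",
        "x": 0,
        "y": 0,
        "width": 24,  # full width of the dashboard grid
        "height": 6,
        "properties": {
            "metrics": [
                # The trailing object sets the legend label for each line.
                ["AWS/ES", metric_name, "DomainName", domain,
                 "ClientId", account_id, {"label": domain}]
                for domain in domain_names
            ],
            "period": period,
            "stat": stat,
            "region": region,
            "title": f"{metric_name} by domain",
        },
    }
```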

Q: What's the difference between basic and enhanced monitoring?
A: That distinction comes from other AWS services and does not apply to OpenSearch Service domains. The service publishes its CloudWatch metrics at 1-minute intervals for every domain, including per-node metrics, with no monitoring tier to select. Additional CloudWatch costs come from what you build on top: alarms, dashboards, custom metrics, and log ingestion.

Q: How do I set up monitoring for OpenSearch Serverless?
A: OpenSearch Serverless uses different metrics and monitoring approaches. Refer to the AWS documentation for serverless-specific monitoring setup, which includes collection-level metrics and different CloudWatch integration patterns.

Q: Can I export CloudWatch metrics to external monitoring tools?
A: Yes, you can use CloudWatch APIs, AWS CLI, or third-party tools to export metrics to external monitoring solutions like Grafana, Prometheus, or Datadog for additional analysis and visualization.