Elasticsearch Optimize Search Queries

Optimizing Elasticsearch search queries is essential for maintaining fast response times and efficient resource utilization. This guide covers proven techniques to improve query performance.

Query Optimization Fundamentals

Use Query vs Filter Context

Query context: calculates a relevance score for every matching document (slower).
Filter context: yes/no match with no scoring; frequently used filters can be cached (faster).

// Optimized: Use filter for exact matches
{
  "query": {
    "bool": {
      "must": {
        "match": {"title": "search term"}
      },
      "filter": [
        {"term": {"status": "published"}},
        {"range": {"date": {"gte": "2024-01-01"}}}
      ]
    }
  }
}
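If you assemble queries in application code, a small helper keeps the scoring/non-scoring split explicit. The sketch below is plain Python that only builds the request body; no client library is assumed. The helper name `build_search_body` is hypothetical, and the field names follow the example above.

```python
from __future__ import annotations

from typing import Any


def build_search_body(text: str, status: str, date_from: str) -> dict[str, Any]:
    """Full-text clause in `must` (scored), exact constraints in `filter` (not scored)."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to _score
                "must": {"match": {"title": text}},
                # Filter context: yes/no only, eligible for caching
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"date": {"gte": date_from}}},
                ],
            }
        }
    }


body = build_search_body("search term", "published", "2024-01-01")
print(body["query"]["bool"]["filter"][0])  # {'term': {'status': 'published'}}
```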

Avoid Common Anti-Patterns

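| Anti-Pattern | Problem | Better Alternative |
|---|---|---|
| Leading wildcards (`*term`) | Walks every term in the field's term dictionary | Reversed edge n-grams, or the `wildcard` field type |
| Deep pagination (`from: 10000`) | Every shard must collect and sort `from + size` hits | `search_after` |
| Script scoring | Script runs for every matching document | Pre-compute values at index time |
| Aggregations on `text` fields | Needs fielddata, which is disabled by default and heap-hungry | Aggregate on a `keyword` field |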

Specific Optimization Techniques

1. Optimize Wildcards

Problem:

// Slow - scans all terms
{"wildcard": {"name": "*smith"}}

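Solution: Index reversed edge n-grams so that every suffix of a token becomes its own term, turning a suffix search into a plain term lookup. (Edge n-grams alone index prefixes, which only help `term*` queries.)

PUT /optimized_index
{
  "settings": {
    "analysis": {
      "filter": {
        "suffix_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10  // Suffixes longer than 10 characters are not indexed
        }
      },
      "analyzer": {
        "suffix_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "reverse", "suffix_ngram", "reverse"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "suffix": {
            "type": "text",
            "analyzer": "suffix_analyzer",
            "search_analyzer": "standard"  // Don't n-gram the query text
          }
        }
      }
    }
  }
}

// Replaces {"wildcard": {"name": "*smith"}}
GET /optimized_index/_search
{
  "query": {"match": {"name.suffix": "smith"}}
}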
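Edge n-grams on their own produce prefixes. Suffix matching for `*smith` therefore needs each token reversed before and after the n-gram step. The plain-Python sketch below simulates that lowercase, reverse, edge_ngram, reverse chain so you can see which terms end up in the index. `str.split()` stands in for the standard tokenizer, which is an approximation, and `suffix_tokens` is a hypothetical helper.

```python
from __future__ import annotations


def suffix_tokens(text: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Approximate lowercase -> reverse -> edge_ngram -> reverse for each token."""
    out: list[str] = []
    for token in text.lower().split():  # Rough stand-in for the standard tokenizer
        reversed_token = token[::-1]
        # edge_ngram keeps prefixes of length min_gram..max_gram (none if token is too short)
        for n in range(min_gram, min(max_gram, len(reversed_token)) + 1):
            # Reversing each prefix back yields a suffix of the original token
            out.append(reversed_token[:n][::-1])
    return out


print(suffix_tokens("John Goldsmith"))
# ['hn', 'ohn', 'john', 'th', 'ith', 'mith', 'smith', 'dsmith', 'ldsmith', 'oldsmith', 'goldsmith']
```

A query for `smith`, analyzed with the standard analyzer, becomes a single term that is present in this list. A suffix longer than `max_gram` would not be.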

2. Optimize Pagination

Problem:

// Slow for deep pages
{"from": 10000, "size": 10}

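Solution: Use search_after. Sort on a unique tiebreaker and pass the sort values of the previous page's last hit:

// First query
{
  "size": 10,
  "query": {"match_all": {}},
  "sort": [
    {"date": "desc"},
    {"doc_id": "asc"}  // Hypothetical unique keyword field; sorting on _id is discouraged
  ]
}

// Subsequent queries: copy the "sort" array of the last hit
{
  "size": 10,
  "query": {"match_all": {}},
  "sort": [
    {"date": "desc"},
    {"doc_id": "asc"}
  ],
  "search_after": [1705314600000, "abc123"]  // Date sort values are epoch milliseconds
}

For a consistent view across pages, open a point in time (POST /my-index/_pit?keep_alive=1m) and pass its id in a "pit" object. The PIT then supplies an implicit _shard_doc tiebreaker.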
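The client-side loop for search_after always follows the same pattern: request a page, then feed the last hit's `sort` values into the next request, and stop on an empty page. The sketch below shows that loop against a made-up in-memory "index". `fake_search` is a hypothetical stand-in for a real search call, and `doc_id` is the assumed unique tiebreaker field.

```python
from __future__ import annotations

from typing import Any, Iterator

# Made-up in-memory "index"; dates are epoch milliseconds, like date sort values
DOCS = [
    {"doc_id": "a1", "date": 1705314600000},
    {"doc_id": "b2", "date": 1705314600000},  # Same date: the tiebreaker decides order
    {"doc_id": "c3", "date": 1705228200000},
    {"doc_id": "d4", "date": 1705401000000},
    {"doc_id": "e5", "date": 1705141800000},
]


def sort_key(doc: dict[str, Any]) -> tuple[int, str]:
    # Mirrors "sort": [{"date": "desc"}, {"doc_id": "asc"}]
    return (-doc["date"], doc["doc_id"])


def fake_search(size: int, search_after: list[Any] | None) -> list[dict[str, Any]]:
    """Stand-in for one search request: sorted hits strictly after `search_after`."""
    docs = sorted(DOCS, key=sort_key)
    if search_after is not None:
        after = (-search_after[0], search_after[1])
        docs = [d for d in docs if sort_key(d) > after]
    return [{"_source": d, "sort": [d["date"], d["doc_id"]]} for d in docs[:size]]


def scan_all(size: int = 2) -> Iterator[dict[str, Any]]:
    """Page through every hit, feeding each page's last sort values into the next request."""
    search_after = None
    while True:
        hits = fake_search(size, search_after)
        if not hits:
            return
        yield from hits
        search_after = hits[-1]["sort"]


print([h["_source"]["doc_id"] for h in scan_all()])  # ['d4', 'a1', 'b2', 'c3', 'e5']
```

The tiebreaker matters here. Without `doc_id`, the two documents sharing a date could be skipped or repeated at a page boundary.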

3. Optimize Aggregations

Reduce bucket count:

{
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 20,  // Only what you need
        "shard_size": 100  // Balance accuracy vs performance
      }
    }
  }
}

For high-cardinality fields, use a composite aggregation and page through the buckets with the returned after_key:

{
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 100,
        "sources": [
          {"category": {"terms": {"field": "category"}}}
        ]
      }
    }
  }
}
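Paging a composite aggregation works like search_after. Each response includes an `after_key`, which goes into the next request's `"after"` parameter until a page comes back with no buckets. The sketch below builds the request bodies and runs the loop against a hypothetical `fake_search` that serves made-up category counts. It uses a page size of 2 instead of 100 so the paging is visible.

```python
from __future__ import annotations

from typing import Any

# Made-up data standing in for per-category document counts
CATEGORY_COUNTS = {"books": 12, "garden": 3, "music": 7, "toys": 5, "video": 9}


def composite_body(size: int, after: dict[str, Any] | None) -> dict[str, Any]:
    """Build the composite aggregation request; `after` resumes from the previous page."""
    composite: dict[str, Any] = {
        "size": size,
        "sources": [{"category": {"terms": {"field": "category"}}}],
    }
    if after is not None:
        composite["after"] = after
    return {"size": 0, "aggs": {"my_buckets": {"composite": composite}}}


def fake_search(body: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the cluster: buckets in key order, strictly after `after`."""
    comp = body["aggs"]["my_buckets"]["composite"]
    start = comp.get("after", {}).get("category")
    keys = [k for k in sorted(CATEGORY_COUNTS) if start is None or k > start]
    buckets = [{"key": {"category": k}, "doc_count": CATEGORY_COUNTS[k]} for k in keys[: comp["size"]]]
    agg: dict[str, Any] = {"buckets": buckets}
    if buckets:
        agg["after_key"] = buckets[-1]["key"]
    return {"aggregations": {"my_buckets": agg}}


def all_buckets(page_size: int = 2) -> list[dict[str, Any]]:
    after = None
    out: list[dict[str, Any]] = []
    while True:
        agg = fake_search(composite_body(page_size, after))["aggregations"]["my_buckets"]
        if not agg["buckets"]:
            return out
        out.extend(agg["buckets"])
        after = agg["after_key"]  # Resume point for the next request


print([b["key"]["category"] for b in all_buckets()])
```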

4. Optimize Range Queries

Use rounded date math:

{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1d/d",  // Rounded = more cacheable
        "lt": "now/d"
      }
    }
  }
}
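Rounding explains the caching benefit. With `gte` and `lt`, `/d` rounds down to midnight, so both bounds change only once a day and every request made that day is identical. The sketch below mimics that evaluation in UTC, which is the default time zone for date math. `rounded_day_window` is a hypothetical helper, not an Elasticsearch API.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def rounded_day_window(now: datetime) -> tuple[datetime, datetime]:
    """Mimic {"gte": "now-1d/d", "lt": "now/d"} evaluated in UTC.

    With gte and lt, /d rounds down to midnight, so the window is all of yesterday.
    """
    start_of_today = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return start_of_today - timedelta(days=1), start_of_today


morning = rounded_day_window(datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc))
evening = rounded_day_window(datetime(2024, 1, 15, 23, 59, tzinfo=timezone.utc))
print(morning == evening)  # True: the same bounds all day long
```

An unrounded `now-1d` would produce a different range on every request, so no cached result could ever be reused.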

5. Limit Return Fields

{
  "_source": ["title", "date", "author"],  // Only needed fields
  "query": {"match": {"title": "elasticsearch"}}
}

Or exclude heavy fields:

{
  "_source": {
    "excludes": ["content", "attachments"]
  },
  "query": {"match": {"title": "elasticsearch"}}
}

6. Use Stored Fields for Specific Fields

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true  // Separately retrievable
      }
    }
  }
}

// Retrieve only stored fields
{
  "stored_fields": ["title"],
  "query": {"match_all": {}}
}

Query Patterns to Avoid

Avoid match_all with Large Size

// Bad: builds and returns 10,000 hits in a single response
{
  "size": 10000,
  "query": {"match_all": {}}
}

// Better: fetch in batches. Scroll suits one-off exports; for deep paging Elastic recommends search_after with a PIT
POST /my_index/_search?scroll=1m
{
  "size": 1000,
  "query": {"match_all": {}}
}

Avoid Regex When Possible

// Slow
{"regexp": {"path": ".*error.*"}}

// Not a real fix: a leading wildcard still walks the whole term dictionary
{"wildcard": {"path": "*error*"}}

// Best - restructure data for term queries
{"term": {"contains_error": true}}
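Restructuring the data usually means computing the flag before the document is indexed. The sketch below does this client-side and packs the result into a `_bulk` NDJSON payload (one action line, then the document). It assumes `path` is a keyword field, so the check is case-sensitive like the regexp. The helper names are hypothetical. An ingest pipeline would be the server-side alternative.

```python
from __future__ import annotations

import json


def with_error_flag(doc: dict) -> dict:
    """Add a boolean computed at index time; case-sensitive like the regexp above."""
    return {**doc, "contains_error": "error" in doc["path"]}


def bulk_body(index: str, docs: list[dict]) -> str:
    """Build a _bulk NDJSON payload: one action line, then the enriched document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(with_error_flag(doc)))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


print(bulk_body("my_index", [{"path": "/var/log/error.log"}, {"path": "/var/log/app.log"}]))
```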

Avoid Nested Queries on Large Documents

// Slow if many nested docs
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {"match": {"comments.text": "great"}}
    }
  }
}

// Consider denormalizing for frequently queried data
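One form of denormalization copies the frequently searched nested values into a flat top-level field before indexing, so a plain `match` on that field replaces the nested query. The sketch below assumes a hypothetical `comment_texts` text field. The trade-off is that the flat field no longer records which values belonged to the same comment. The nested mapping's `include_in_root` option performs a similar copy server-side.

```python
from __future__ import annotations


def denormalize_comments(doc: dict) -> dict:
    """Copy nested comment texts into a flat top-level field before indexing.

    `comment_texts` is a hypothetical field name; map it as text.
    """
    texts = [c["text"] for c in doc.get("comments", []) if "text" in c]
    return {**doc, "comment_texts": texts}


post = {
    "title": "Release notes",
    "comments": [{"author": "ann", "text": "great release"}, {"author": "bob", "text": "meh"}],
}
print(denormalize_comments(post)["comment_texts"])  # ['great release', 'meh']
```

The search then becomes `{"match": {"comment_texts": "great"}}`.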

Index-Level Optimizations

Configure Index for Search

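PUT /search-optimized/_settings
{
  "index.refresh_interval": "5s",  // Fewer refreshes = fewer small segments to search
  "index.number_of_replicas": 2  // More replicas = more search capacity (given enough nodes)
}

index.queries.cache.enabled is a static setting (default true), so it can only be set at index creation or on a closed index. index.search.idle.after (default 30s) only lets search-idle shards skip background refreshes when refresh_interval is not set explicitly, so it has no effect alongside the setting above.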

Pre-warm Queries

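Elasticsearch has no built-in query warmers (index warmers were removed in 5.0). A stored search template does not warm any cache, but it lets clients send only parameters for critical queries. To warm caches, run those queries yourself after a restart or reindex:

PUT _scripts/my_search_template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "{{field}}": "{{value}}"  // Illustrative placeholder names
        }
      }
    }
  }
}

// Run the stored template with parameters
GET /my-index/_search/template
{
  "id": "my_search_template",
  "params": {"field": "title", "value": "elasticsearch"}
}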

Enable Request Cache

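GET /my-index/_search?request_cache=true
{
  "size": 0,  // Without the URL flag, only size: 0 requests (aggregations, hit counts) are cached
  "aggs": {
    "categories": {"terms": {"field": "category"}}
  }
}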
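The shard request cache is keyed on the full request body. If a client emits the same query with its JSON keys in a different order, the cache will not recognize it. A minimal sketch of the usual remedy, canonical serialization, in plain Python follows. `canonical_body` is a hypothetical helper name.

```python
import json


def canonical_body(body: dict) -> str:
    """Serialize with sorted keys and fixed separators so equal bodies are byte-identical."""
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


a = {"size": 0, "aggs": {"categories": {"terms": {"field": "category"}}}}
b = {"aggs": {"categories": {"terms": {"field": "category"}}}, "size": 0}
print(json.dumps(a) == json.dumps(b))  # False: key order differs
print(canonical_body(a) == canonical_body(b))  # True
```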

Query Profiling

Use Profile API

{
  "profile": true,
  "query": {
    "match": {"title": "elasticsearch"}
  }
}

Analyze Results

Look for:

High time_in_nanos values
Unexpected query rewrites
Expensive operations (regex, wildcards)
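Profile output nests query nodes under `profile.shards[].searches[].query[]`, and each node has `type`, `description`, `time_in_nanos` and optional `children`. Reading that tree by eye gets tedious, so the sketch below flattens every node across shards and lists the slowest. The sample response is made up for illustration, and `slowest_queries` is a hypothetical helper. A parent's time includes its children's time, so compare siblings rather than parent against child.

```python
from __future__ import annotations

from typing import Any


def slowest_queries(profile_response: dict[str, Any], top: int = 3) -> list[tuple[int, str, str]]:
    """Flatten every profiled query node; return the slowest as (nanos, type, description)."""
    rows: list[tuple[int, str, str]] = []

    def walk(node: dict[str, Any]) -> None:
        rows.append((node["time_in_nanos"], node["type"], node["description"]))
        for child in node.get("children", []):
            walk(child)

    for shard in profile_response["profile"]["shards"]:
        for search in shard["searches"]:
            for root in search["query"]:
                walk(root)
    return sorted(rows, reverse=True)[:top]


# Made-up sample shaped like a profile response (numbers are illustrative only)
sample = {"profile": {"shards": [{"searches": [{"query": [{
    "type": "BooleanQuery",
    "description": "+title:elasticsearch #status:published",
    "time_in_nanos": 900_000,
    "children": [
        {"type": "TermQuery", "description": "title:elasticsearch", "time_in_nanos": 300_000},
        {"type": "TermQuery", "description": "status:published", "time_in_nanos": 120_000},
    ],
}]}]}]}}

for nanos, qtype, desc in slowest_queries(sample):
    print(nanos, qtype, desc)
```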

Use Explain API

GET /my-index/_explain/doc_id
{
  "query": {
    "match": {"title": "elasticsearch"}
  }
}

Performance Checklist

Before deploying queries:

Using filter context for non-scoring clauses
No leading wildcards
No deep pagination (use search_after)
Aggregation sizes limited
Only required fields in _source
Timeout configured (e.g. "timeout": "2s" in the request body)
Profile API run on complex queries
Tested with production data volume

Monitoring Query Performance

Enable Slow Logs

PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "5s",
  "index.search.slowlog.threshold.query.info": "2s"
}

Track Metrics

Search latency (p50, p95, p99)
Search rate
Cache hit ratio
Query rejections
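Two of these metrics are simple to compute from raw numbers: latency percentiles from recorded request times, and cache hit ratio from the `hit_count` / `miss_count` counters that Elasticsearch reports for the query and request caches (for example in node stats). The sketch below uses the nearest-rank percentile definition. Monitoring tools may interpolate instead, so their values can differ slightly. The latency sample is made up.

```python
from __future__ import annotations

import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def hit_ratio(hit_count: int, miss_count: int) -> float:
    """Share of cache lookups that were hits; 0.0 when there were no lookups."""
    total = hit_count + miss_count
    return hit_count / total if total else 0.0


latencies_ms = list(range(1, 101))  # Made-up sample: 1..100 ms
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95), percentile(latencies_ms, 99))  # 50 95 99
print(hit_ratio(750, 250))  # 0.75
```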