Elasticsearch Hot Threads Analysis

The Elasticsearch hot threads API provides insights into which threads are consuming the most CPU time. This is essential for diagnosing performance issues, identifying runaway queries, and understanding cluster resource usage.

Understanding Hot Threads

What Are Hot Threads?

Hot threads are the threads currently consuming the most CPU cycles. Analyzing them helps identify:

  • Expensive queries
  • Indexing bottlenecks
  • Garbage collection issues
  • Segment merge operations
  • Network or transport problems

Using the Hot Threads API

Basic Request

GET /_nodes/hot_threads

With Parameters

GET /_nodes/hot_threads?threads=10&interval=500ms&type=cpu

Parameters Explained

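Parameter            Default  Description
threads              3        Number of hot threads to return per node
interval             500ms    Time window over which each thread's CPU (or wait/block) time is measured
type                 cpu      What to sample: cpu, wait, or block
snapshots            10       Number of stack trace samples to take
ignore_idle_threads  true     Skip threads known to be idle (e.g., waiting for work)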

Per-Node Request

Analyze specific nodes:

GET /_nodes/node_name/hot_threads?threads=10
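When scripting captures, it can help to build these URLs in one place. The Python sketch below assumes a cluster at http://localhost:9200 without authentication; the helper name hot_threads_url is hypothetical, and its type check only mirrors the three types in the parameter table (some Elasticsearch versions may accept others).

```python
from urllib.parse import quote, urlencode

# Sample types from the parameter table; extend if your version supports more.
VALID_TYPES = {"cpu", "wait", "block"}


def hot_threads_url(base="http://localhost:9200", node=None, **params):
    """Build a hot threads URL. `node` is an optional node ID or name filter."""
    sample_type = params.get("type", "cpu")
    if sample_type not in VALID_TYPES:
        raise ValueError(f"unsupported hot threads type: {sample_type!r}")
    # Without a node filter, every node in the cluster is sampled.
    path = f"/_nodes/{quote(node, safe=',*')}/hot_threads" if node else "/_nodes/hot_threads"
    # Query-string booleans are lowercase ("true"/"false"), not Python's True/False.
    query = {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    return base.rstrip("/") + path + ("?" + urlencode(query) if query else "")
```

For example, hot_threads_url(node="node_name", threads=10) returns "http://localhost:9200/_nodes/node_name/hot_threads?threads=10", the same request as above.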

Interpreting Hot Threads Output

Sample Output

::: {node_name}{node_id}{host}{ip}
   Hot threads at 2024-01-15T10:30:00.000Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   33.3% (166.4ms out of 500ms) cpu usage by thread 'elasticsearch[node_name][search][T#1]'
     5/10 snapshots sharing following 15 elements
       java.base@11.0.11/java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3963)
       java.base@11.0.11/java.util.regex.Pattern$Branch.match(Pattern.java:4766)
       org.apache.lucene.util.automaton.RegExp.parseUnionExp(RegExp.java:509)
       ...

   22.1% (110.5ms out of 500ms) cpu usage by thread 'elasticsearch[node_name][write][T#2]'
     7/10 snapshots sharing following 12 elements
       org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(...)
       ...

Understanding the Output

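  1. Thread name: Identifies the node and thread pool (search, write, generic, etc.)
  2. CPU percentage: Share of the sampling interval the thread spent on CPU (166.4ms out of 500ms = 33.3%)
  3. Stack trace: What the thread was executing, most recent call first
  4. Snapshots: How many of the stack samples shared the listed frames (5/10 means half of the 10 snapshots)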
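These fields can also be pulled out of the text output programmatically, which makes it easier to compare many captures. This Python sketch assumes the line layout shown in the sample output (CPU times reported in ms, thread names of the form elasticsearch[node][pool][T#n]); other Elasticsearch versions may format these lines differently, and parse_hot_threads is a hypothetical helper name.

```python
import re

# A thread header line, e.g.
# 33.3% (166.4ms out of 500ms) cpu usage by thread 'elasticsearch[node_name][search][T#1]'
THREAD_RE = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)%.*?\((?P<used>\d+(?:\.\d+)?)ms out of (?P<interval>\d+(?:\.\d+)?)ms\)"
    r" \w+ usage by thread '(?P<thread>[^']+)'"
)
# e.g. "5/10 snapshots sharing following 15 elements"
SNAPSHOT_RE = re.compile(r"(?P<shared>\d+)/(?P<total>\d+) snapshots sharing following")
# Pool name from thread names shaped like elasticsearch[node][pool][T#n].
POOL_RE = re.compile(r"^elasticsearch\[[^\]]*\]\[(?P<pool>[^\[\]]+)\]")


def parse_hot_threads(text):
    """Parse hot threads text output into one dict per reported thread."""
    threads = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, elided frames, node headers, and the "Hot threads at ..." banner.
        if not line or line == "..." or line.startswith((":::", "Hot threads at")):
            continue
        header = THREAD_RE.search(line)
        if header:
            pool = POOL_RE.match(header["thread"])
            threads.append({
                "cpu_percent": float(header["pct"]),
                "cpu_ms": float(header["used"]),
                "interval_ms": float(header["interval"]),
                "thread": header["thread"],
                "pool": pool["pool"] if pool else None,
                "snapshots": None,  # (shared, total) once the snapshot line is seen
                "frames": [],
            })
        elif threads:
            snap = SNAPSHOT_RE.search(line)
            if snap:
                threads[-1]["snapshots"] = (int(snap["shared"]), int(snap["total"]))
            else:
                threads[-1]["frames"].append(line)  # a stack frame of the current thread
    return threads
```

Frames stay as raw strings, so pattern checks such as "does any frame mention org.apache.lucene.search" remain simple substring tests.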

Common Hot Thread Patterns

Pattern 1: Search Threads Busy

cpu usage by thread 'elasticsearch[...][search][T#...]'
  org.apache.lucene.search.*

Meaning: Active search queries consuming CPU

Investigation:

  • Check the search slow log
  • Review active tasks: GET /_tasks?actions=*search*
  • Look for expensive query patterns (regex, wildcards)

Solutions:

  • Optimize queries
  • Add query timeouts
  • Scale search capacity

Pattern 2: Regex or Wildcard Queries

cpu usage by thread 'elasticsearch[...][search][T#...]'
  java.util.regex.Pattern*
  org.apache.lucene.util.automaton.RegExp*

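Meaning: An expensive regexp or wildcard query is running. Lucene executes these queries as automata (org.apache.lucene.util.automaton); java.util.regex frames more often come from regular expressions in scripts or ingest processors.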

Solutions:

  • Avoid leading wildcards
  • Use simpler patterns
  • Consider ngram indexing
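One way to act on the first suggestion is to reject leading wildcards before a query reaches the cluster. A minimal Python guard, assuming user input is passed straight into a wildcard or regexp query; the function name and the set of rejected prefixes are illustrative choices, not an Elasticsearch rule.

```python
def has_leading_wildcard(pattern: str, kind: str = "wildcard") -> bool:
    """Return True if the pattern has no literal prefix.

    Without a literal prefix, Lucene typically has to examine terms across the
    whole term dictionary of the field instead of starting at a known term.
    """
    p = pattern.strip()
    if kind == "wildcard":
        return p.startswith(("*", "?"))
    if kind == "regexp":
        # Lucene regexps match the whole term, so a leading "." (".*", ".+") means no literal prefix.
        return p.startswith(".")
    raise ValueError(f"unknown query kind: {kind!r}")
```

An application could return a validation error for flagged patterns, or route them to a field indexed for that access pattern (such as the ngram approach above).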

Pattern 3: Index Writer / Merging

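cpu usage by thread 'elasticsearch[...][[index_name][0]: Lucene Merge Thread #...]'
cpu usage by thread 'elasticsearch[...][flush][T#...]' (or [refresh][T#...])
  org.apache.lucene.index.IndexWriter*
  org.apache.lucene.codecs*

Meaning: Segment merging (Lucene merge threads) or flushing and refreshing (flush and refresh thread pools)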

Investigation:

GET /_cat/segments?v
GET /_nodes/stats/indices/merges

Solutions:

  • Increase refresh interval
  • Tune merge policy
  • Schedule force merges during off-peak hours, and only on indices that no longer receive writes
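To see which shards carry the most segments, the _cat/segments output can be tallied per shard. The sketch assumes the request is narrowed to a few columns with GET /_cat/segments?v&h=index,shard,prirep,segment, so each row starts with those four whitespace-separated fields; segments_per_shard is a hypothetical helper.

```python
from collections import Counter


def segments_per_shard(cat_output, has_header=True):
    """Count segments per (index, shard, prirep) from _cat/segments text output."""
    lines = [line for line in cat_output.splitlines() if line.strip()]
    if has_header:
        lines = lines[1:]  # drop the column-name row added by ?v
    counts = Counter()
    for line in lines:
        index, shard, prirep, _segment = line.split()[:4]
        counts[(index, int(shard), prirep)] += 1
    return counts
```

counts.most_common(5) then lists the shards with the most segments, which are the first places to look when merge threads dominate.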

Pattern 4: Garbage Collection

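No dedicated hot thread: the API samples Java threads only, so JVM-internal GC worker threads (such as 'GC Thread#0') never appear in its output.

Meaning: JVM garbage collection consuming CPU. Suspect it when node CPU is high but the reported hot threads account for little of it, then confirm with GC statistics.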

Investigation:

GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.gc

Solutions:

  • Reduce heap pressure
  • Tune GC settings
  • Add more nodes
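GC statistics are cumulative counters, so a single response says little; the change between two samples is more useful. This Python sketch assumes two parsed responses from GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.gc taken a known number of seconds apart. It measures elapsed collection time (collection_time_in_millis), which is a proxy for, not a direct measure of, GC CPU usage; the helper name is hypothetical.

```python
def gc_time_percent(before, after, elapsed_s):
    """Percent of wall-clock time each collector spent collecting between two samples."""
    result = {}
    for node_id, node in after["nodes"].items():
        prev = before["nodes"].get(node_id)
        if prev is None:
            continue  # node joined between the two samples
        prev_collectors = prev["jvm"]["gc"]["collectors"]
        for name, stats in node["jvm"]["gc"]["collectors"].items():
            if name not in prev_collectors:
                continue
            delta_ms = stats["collection_time_in_millis"] - prev_collectors[name]["collection_time_in_millis"]
            if delta_ms < 0:
                continue  # counter went backwards, e.g. the node restarted
            result[(node_id, name)] = 100.0 * delta_ms / (elapsed_s * 1000.0)
    return result
```

A collector that spends a large, growing share of each interval collecting (especially the old generation) points to heap pressure rather than to any particular query.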

Pattern 5: Transport/Network Operations

cpu usage by thread 'elasticsearch[...][transport_worker][T#...]'
  org.elasticsearch.transport*
  io.netty*

Meaning: Network communication overhead

Investigation:

  • Check network latency between nodes
  • Look for oversized requests

Solutions:

  • Reduce bulk request sizes
  • Ensure low-latency network
  • Use dedicated transport network
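Reducing bulk request size usually means splitting one large batch into several bodies under a byte budget. The Python sketch below builds newline-delimited bulk bodies from (action, source) pairs; the 5 MiB default is an assumed starting point to tune for your own cluster, and chunk_bulk is a hypothetical helper rather than a client-library function.

```python
import json


def chunk_bulk(actions, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON _bulk bodies of at most max_bytes each.

    `actions` yields (action_metadata, source) pairs; source is None for deletes.
    A single action larger than max_bytes is still emitted, alone in its own body.
    """
    body, size = [], 0
    for meta, source in actions:
        entry = json.dumps(meta) + "\n"
        if source is not None:
            entry += json.dumps(source) + "\n"
        n = len(entry.encode("utf-8"))
        if body and size + n > max_bytes:
            yield "".join(body)  # flush the current body before it exceeds the budget
            body, size = [], 0
        body.append(entry)
        size += n
    if body:
        yield "".join(body)
```

Each yielded string can be sent as the body of a separate POST /_bulk request with the application/x-ndjson content type.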

Pattern 6: Aggregations

cpu usage by thread 'elasticsearch[...][search][T#...]'
  org.elasticsearch.search.aggregations*
  org.apache.lucene.search.FieldComparator*

Meaning: Aggregation computation (FieldComparator frames usually point to sorting)

Solutions:

  • Reduce aggregation bucket sizes
  • Use the composite aggregation to page through high-cardinality fields
  • Aggregate on fields with doc_values (for example, keyword rather than text fields)
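The composite suggestion works by requesting buckets a page at a time and passing each response's after_key back as the next request's after. A Python sketch of that loop, assuming a caller-supplied search(index, body) function that returns the parsed JSON response (for example, a thin wrapper around your client); the helper name, the "term" source name, and the field names used in testing are hypothetical.

```python
def iter_composite_buckets(search, index, field, page_size=1000):
    """Yield every bucket of a terms source on `field`, one composite page at a time."""
    after = None
    while True:
        composite = {"size": page_size, "sources": [{"term": {"terms": {"field": field}}}]}
        if after is not None:
            composite["after"] = after  # resume after the last bucket of the previous page
        body = {"size": 0, "aggs": {"pages": {"composite": composite}}}
        agg = search(index, body)["aggregations"]["pages"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        if not agg["buckets"] or after is None:
            break
```

Because each request only materializes page_size buckets, memory per request stays bounded no matter how many distinct values the field has.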

Hot Threads Analysis Workflow

Step 1: Capture Hot Threads During Issue

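# Capture hot threads 10 times, 30 seconds apart, while the issue is occurring.
# Requires bash ({1..10} brace expansion); set ES_URL and add curl auth/TLS options as needed.
ES_URL="${ES_URL:-http://localhost:9200}"
for i in {1..10}; do
  # One file per capture, named with the Unix timestamp.
  curl -s "$ES_URL/_nodes/hot_threads?threads=10" > "hot_threads_$(date +%s).txt"
  sleep 30
done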

Step 2: Identify Pattern

Look for:

  • Same thread pool consistently busy
  • Same stack traces repeating
  • Correlation with specific operations

Step 3: Correlate with Other Metrics

GET /_nodes/stats/thread_pool
GET /_tasks?detailed=true
GET /_cat/nodes?v&h=name,cpu,load_1m
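The _cat/nodes request above is a quick way to find which nodes to capture hot threads from. A Python sketch that flags nodes over a CPU threshold, assuming the text output of exactly that request (header row first) and node names without spaces; the 80% default and the helper name are assumptions.

```python
def hot_nodes(cat_output, cpu_threshold=80):
    """Return (name, cpu_percent, load_1m) for nodes at or above cpu_threshold."""
    lines = [line for line in cat_output.splitlines() if line.strip()]
    header = lines[0].split()
    flagged = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        cpu = int(row["cpu"])
        try:
            load = float(row.get("load_1m", ""))
        except ValueError:
            load = None  # load average missing or not numeric on this platform
        if cpu >= cpu_threshold:
            flagged.append((row["name"], cpu, load))
    return flagged
```

Flagged node names can then be passed to the per-node request, GET /_nodes/<name>/hot_threads.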

Step 4: Take Action

Based on the pattern:

  • Search threads: Optimize queries, add capacity
  • Write threads: Tune bulk request sizes and refresh interval, or add indexing capacity
  • Merge, flush, and refresh threads: Check merge policy and refresh interval, reduce segment counts
  • GC pressure: Address memory pressure
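A rough first-pass triage of the thread-pool patterns described earlier can be written as an ordered rule table: the first rule whose thread-name fragment and (optional) frame fragment both match wins. This Python sketch is a hypothetical heuristic, not an Elasticsearch feature; merge threads are matched by the "Lucene Merge Thread" text in their names, and GC is absent because GC threads do not show up as hot threads.

```python
# (thread-name fragment, frame fragment or None, suggested first step), checked in order.
RULES = [
    ("Lucene Merge Thread", None, "merging: check merge policy, refresh interval, segment counts"),
    ("[flush]", None, "flushing: check refresh interval and indexing rate"),
    ("[refresh]", None, "refreshing: consider a longer refresh interval"),
    ("[transport_worker]", None, "transport: check request sizes and network latency"),
    ("[search]", "org.elasticsearch.search.aggregations", "aggregations: reduce bucket sizes, use composite"),
    ("[search]", "org.apache.lucene.util.automaton", "regexp/wildcard queries: avoid leading wildcards"),
    ("[search]", None, "search: review the search slow log and running tasks"),
    ("[write]", None, "indexing: tune bulk size and refresh interval"),
]


def classify(thread_name, frames):
    """Return a suggested first step for one hot thread."""
    for name_part, frame_part, advice in RULES:
        if name_part not in thread_name:
            continue
        # Substring match, because JDK frames carry prefixes like "java.base@11.0.11/".
        if frame_part is None or any(frame_part in f for f in frames):
            return advice
    return "unclassified: compare with other nodes and baseline captures"
```

More specific rules (aggregations, automaton) are listed before the catch-all search rule so they get a chance to match first.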

Comparing Thread Types

CPU Type (Default)

GET /_nodes/hot_threads?type=cpu

Shows threads actively using CPU.

Wait Type

GET /_nodes/hot_threads?type=wait

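Shows time threads spent waiting (the WAITING or TIMED_WAITING states, e.g., in Object.wait() or LockSupport.park()), such as threads waiting on a condition, a queue, or a java.util.concurrent lock.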

Block Type

GET /_nodes/hot_threads?type=block

Shows time threads spent BLOCKED, waiting to enter a synchronized block or method (monitor contention).

Best Practices

Regular Monitoring

  • Capture hot threads during normal operation for baseline
  • Set up automated capture during performance alerts
  • Keep historical data for comparison

Analysis Tips

  • Compare hot threads across nodes
  • Look at consistency of patterns
  • Correlate with application events
  • Focus on highest percentage threads first
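Comparing nodes starts with splitting the cluster-wide output into per-node sections, each of which begins with a "::: {node_name}..." header line. This Python sketch assumes that header layout and reports the busiest thread per node; both helper names are hypothetical.

```python
import re

NODE_HEADER = re.compile(r"^::: \{([^}]*)\}", re.MULTILINE)
THREAD_LINE = re.compile(r"([\d.]+)% .*usage by thread '([^']+)'")


def split_by_node(text):
    """Split multi-node hot threads output into {node_name: section_text}."""
    matches = list(NODE_HEADER.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1)] = text[m.end():end]
    return sections


def top_thread_per_node(text):
    """Return {node_name: (cpu_percent, thread_name)} for the busiest thread on each node."""
    result = {}
    for node, section in split_by_node(text).items():
        hits = [(float(pct), thread) for pct, thread in THREAD_LINE.findall(section)]
        if hits:
            result[node] = max(hits)
    return result
```

A single node whose top thread is far busier than its peers often points to shard or request imbalance rather than a cluster-wide problem.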

Documentation

Keep notes on observed patterns:

  • What workload causes this pattern?
  • What was the resolution?
  • Are there recurring patterns?
Pulse - Elasticsearch Operations Done Right
