Elasticsearch Hot Threads Analysis

The Elasticsearch hot threads API reports, for each node, which threads are consuming the most CPU time and what code they are executing. This is essential for diagnosing performance issues, identifying runaway queries, and understanding cluster resource usage.

Understanding Hot Threads

What Are Hot Threads?

Hot threads are the threads currently consuming the most CPU cycles. Analyzing them helps identify:

Expensive queries
Indexing bottlenecks
Garbage collection issues
Segment merge operations
Network or transport problems

Using the Hot Threads API

Basic Request

GET /_nodes/hot_threads

With Parameters

GET /_nodes/hot_threads?threads=10&interval=500ms&type=cpu

Parameters Explained

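Parameter	Default	Description
`threads`	3	Number of hot threads to return per node
`interval`	500ms	Time between the two samples used to measure thread activity
`type`	`cpu`	Metric to rank threads by: `cpu`, `wait`, or `block`
`snapshots`	10	Number of stack trace samples to take
`ignore_idle_threads`	true	Filter out threads that are known to be idle (for example, waiting on an empty queue)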
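
The parameters in the table can be combined freely. As a minimal sketch, the helper below builds a request URL and leaves out any parameter that is still at its default. The function name `hot_threads_url`, the `base` address, and the `DEFAULTS` dictionary are illustrative assumptions, not part of any Elasticsearch client.

```python
from urllib.parse import urlencode

# Defaults as listed in the parameter table.
DEFAULTS = {
    "threads": 3,
    "interval": "500ms",
    "type": "cpu",
    "snapshots": 10,
    "ignore_idle_threads": True,
}


def hot_threads_url(base="http://localhost:9200", node=None, **params):
    """Build a hot threads URL, omitting parameters left at their defaults.

    `base` and this helper are hypothetical; only the path and the
    parameter names come from the hot threads API itself.
    """
    unknown = set(params) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    if params.get("type", "cpu") not in ("cpu", "wait", "block"):
        raise ValueError("type must be cpu, wait, or block")
    path = f"/_nodes/{node}/hot_threads" if node else "/_nodes/hot_threads"
    # Send only non-default values; booleans are written as true/false.
    query = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
        if value != DEFAULTS[key]
    }
    return base + path + ("?" + urlencode(query) if query else "")
```

For example, `hot_threads_url(threads=10, type="wait")` yields a URL ending in `/_nodes/hot_threads?threads=10&type=wait`.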

Per-Node Request

To analyze specific nodes, put a node name, node ID, or comma-separated list of them in the path:

GET /_nodes/node_name/hot_threads?threads=10

Interpreting Hot Threads Output

Sample Output

::: {node_name}{node_id}{host}{ip}
   Hot threads at 2024-01-15T10:30:00.000Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   33.3% (166.4ms out of 500ms) cpu usage by thread 'elasticsearch[node_name][search][T#1]'
     5/10 snapshots sharing following 15 elements
       java.base@11.0.11/java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3963)
       java.base@11.0.11/java.util.regex.Pattern$Branch.match(Pattern.java:4766)
       org.apache.lucene.util.automaton.RegExp.parseUnionExp(RegExp.java:509)
       ...

   22.1% (110.5ms out of 500ms) cpu usage by thread 'elasticsearch[node_name][write][T#2]'
     7/10 snapshots sharing following 12 elements
       org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(...)
       ...

Understanding the Output

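Thread name: Follows the form elasticsearch[node][pool][T#n], so the second bracket shows the thread pool (search, write, generic, etc.)
CPU percentage: Share of the sampling interval the thread spent executing (166.4ms out of 500ms is 33.3%)
Stack trace: What the thread was doing when it was sampled
Snapshots: How many of the samples shared this stack (5/10 means the thread was in this code path in half of the samples)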
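
To make these fields easier to work with, the plain-text response can be parsed into records. The following is a rough sketch that assumes the line layout shown in the sample output; other Elasticsearch versions may format the header line differently, in which case the regular expressions would need adjusting. The names `HotThread` and `parse_hot_threads` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Matches e.g. "33.3% (166.4ms out of 500ms) cpu usage by thread '...'"
HEADER = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)% \((?P<busy>\S+) out of (?P<interval>\S+)\) "
    r"(?P<kind>\w+) usage by thread '(?P<name>.+)'\s*$"
)
# Thread pool is the bracket group right before the trailing [T#n].
POOL = re.compile(r"\[(?P<pool>[^\[\]]+)\]\[T#\d+\]$")
SNAPSHOTS = re.compile(r"^\s*(?P<seen>\d+)/(?P<total>\d+) snapshots sharing")


@dataclass
class HotThread:
    percent: float
    name: str
    pool: str  # empty when the name has no [pool][T#n] suffix
    snapshots: list = field(default_factory=list)  # (seen, total) tuples
    frames: list = field(default_factory=list)     # stack frames as text


def parse_hot_threads(text):
    """Parse hot threads text into a list of HotThread records."""
    threads = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blanks, elisions, and per-node header lines.
        if not stripped or stripped == "..." or stripped.startswith((":::", "Hot threads at")):
            continue
        header = HEADER.match(line)
        if header:
            pool = POOL.search(header["name"])
            threads.append(
                HotThread(float(header["pct"]), header["name"], pool["pool"] if pool else "")
            )
        elif threads:
            snap = SNAPSHOTS.match(line)
            if snap:
                threads[-1].snapshots.append((int(snap["seen"]), int(snap["total"])))
            else:
                threads[-1].frames.append(stripped)
    return threads
```

Sorting the result by `percent` or grouping it by `pool` gives a quick per-node summary without reading every stack trace.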

Common Hot Thread Patterns

Pattern 1: Search Threads Busy

cpu usage by thread 'elasticsearch[...][search][T#...]'
  org.apache.lucene.search.*

Meaning: Active search queries consuming CPU

Investigation:

Check the search slow logs
Review active tasks: GET /_tasks?actions=*search*&detailed
Look for expensive query patterns (regex, wildcards)

Solutions:

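Optimize the queries identified above
Add query timeouts (a `timeout` on individual search requests, or the `search.default_search_timeout` cluster setting)
Scale search capacity (more replicas or more data nodes)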
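
When reviewing active tasks, it helps to rank them by running time. The sketch below works on the JSON body of a tasks API response that has already been decoded into a Python dictionary (request it with `detailed` so that `description` is populated). The function name `slow_search_tasks` and the `min_seconds` threshold are assumptions chosen for illustration.

```python
def slow_search_tasks(tasks_response, min_seconds=5.0):
    """Return (seconds, node, task_id, description) for long-running search tasks.

    `tasks_response` is the decoded JSON from the tasks API, which nests
    tasks under nodes.<node_id>.tasks.<task_id>. The 5 second default is
    an arbitrary, hypothetical threshold.
    """
    slow = []
    for node_id, node in tasks_response.get("nodes", {}).items():
        for task_id, task in node.get("tasks", {}).items():
            seconds = task.get("running_time_in_nanos", 0) / 1e9
            if "search" in task.get("action", "") and seconds >= min_seconds:
                slow.append(
                    (seconds, node.get("name", node_id), task_id, task.get("description", ""))
                )
    # Longest-running tasks first.
    return sorted(slow, reverse=True)
```

The `description` of the top entries usually names the target indices and the query source, which is enough to match a hot search thread to the request that caused it.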

Pattern 2: Regex or Wildcard Queries

cpu usage by thread 'elasticsearch[...][search][T#...]'
  java.util.regex.Pattern*
  org.apache.lucene.util.automaton.RegExp*

Meaning: Expensive regex or wildcard query running

Solutions:

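Avoid leading wildcards (patterns such as `*foo` cannot use the term index efficiently)
Use simpler patterns, or a prefix query where only the start of the term is known
Consider n-gram indexing or the `wildcard` field type for substring matching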
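
One way to catch leading wildcards before they reach the cluster is to inspect the query body in the application. The following sketch walks a query DSL dictionary and reports `wildcard` and `regexp` clauses whose pattern starts with a wildcard. The helper name `leading_wildcards` and the `PREFIXES` table are hypothetical, and the check is deliberately simple: it does not cover `query_string` or other query types that accept wildcards.

```python
# Pattern prefixes treated as "leading wildcards" (an illustrative choice).
PREFIXES = {
    "wildcard": ("*", "?"),
    "regexp": (".*", ".+"),
}


def leading_wildcards(query):
    """Return (query_type, field, pattern) for clauses with a leading wildcard."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in PREFIXES and isinstance(value, dict):
                    for field_name, spec in value.items():
                        # Clauses may be {"field": "pat"} or {"field": {"value": "pat"}}.
                        if isinstance(spec, dict):
                            pattern = spec.get("value", spec.get("wildcard", ""))
                        else:
                            pattern = spec
                        if isinstance(pattern, str) and pattern.startswith(PREFIXES[key]):
                            found.append((key, field_name, pattern))
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(query)
    return found
```

A request handler could log or reject queries for which this returns a non-empty list, depending on how strict the application needs to be.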

Pattern 3: Index Writer / Merging

cpu usage by thread 'elasticsearch[...][generic][T#...]'
  org.apache.lucene.index.IndexWriter*
  org.apache.lucene.codecs*

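Meaning: Segment merging or flushing. Depending on the Elasticsearch version, merges may instead run on dedicated threads whose names contain `Lucene Merge Thread`, so look at the stack frames as well as the thread name.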

Investigation:

GET /_cat/segments?v
GET /_nodes/stats/indices/merges

Solutions:

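Increase the refresh interval (`index.refresh_interval`) on write-heavy indices so fewer small segments are created
Tune merge settings (for example, `index.merge.scheduler.max_thread_count`)
Schedule force merge during off-peak hours, and only on indices that no longer receive writes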

Pattern 4: Garbage Collection

cpu usage by thread 'GC Thread#...'
  <no stack trace or JVM native code>

Meaning: JVM garbage collection consuming CPU

Investigation:

GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.gc

Solutions:

Reduce heap pressure
Tune GC settings
Add more nodes
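
The GC counters in node stats are cumulative, so the useful number is the change between two samples. As a rough sketch, the function below takes two decoded responses from the JVM stats request shown above, plus the time between them, and estimates the share of wall-clock time each collector was active. The name `gc_overhead` and the explicit `elapsed_ms` argument are assumptions; how closely the reported collection time tracks actual pause time depends on the collector in use.

```python
def gc_overhead(before, after, elapsed_ms):
    """Percent of elapsed time spent in each GC collector, per node.

    `before` and `after` are decoded node stats responses containing
    nodes.<id>.jvm.gc.collectors.<name>.collection_time_in_millis.
    `elapsed_ms` is the wall-clock time between the two samples.
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    result = {}
    for node_id, node in after.get("nodes", {}).items():
        previous = before.get("nodes", {}).get(node_id)
        if previous is None:
            continue  # node joined between samples; no baseline to compare
        old_collectors = previous["jvm"]["gc"]["collectors"]
        result[node_id] = {}
        for name, stats in node["jvm"]["gc"]["collectors"].items():
            old_time = old_collectors.get(name, {}).get("collection_time_in_millis", 0)
            delta = stats["collection_time_in_millis"] - old_time
            result[node_id][name] = round(100.0 * delta / elapsed_ms, 2)
    return result
```

A value that stays high across several intervals points to sustained heap pressure rather than a single large collection.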

Pattern 5: Transport/Network Operations

cpu usage by thread 'elasticsearch[...][transport_worker][T#...]'
  org.elasticsearch.transport*
  io.netty*

Meaning: Network communication overhead

Investigation:

Check network latency between nodes
Look for oversized requests

Solutions:

Reduce bulk request sizes
Ensure low-latency network
Use dedicated transport network

Pattern 6: Aggregations

cpu usage by thread 'elasticsearch[...][search][T#...]'
  org.elasticsearch.search.aggregations*
  org.apache.lucene.search.FieldComparator*

Meaning: Aggregation computation

Solutions:

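Reduce the number of buckets (for example, a smaller `size` on terms aggregations or a coarser histogram interval)
Use the composite aggregation to page through high-cardinality fields
Make sure doc_values are enabled for aggregated fields (they are by default for most field types, but not for `text`)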

Hot Threads Analysis Workflow

Step 1: Capture Hot Threads During Issue

# Capture every 30 seconds during issue
for i in {1..10}; do
  curl -s "localhost:9200/_nodes/hot_threads?threads=10" >> hot_threads_$(date +%s).txt
  sleep 30
done

Step 2: Identify Pattern

Look for:

Same thread pool consistently busy
Same stack traces repeating
Correlation with specific operations
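
Spotting repeats by eye gets tedious once there are more than a few captures. The sketch below counts, across a list of capture texts, how often each thread pool appears and in how many captures each of the stack signatures from the patterns above shows up. The `SIGNATURES` table, its labels, and the function name `summarize` are illustrative assumptions; the matching is a plain substring test and will not be right for every stack.

```python
import re
from collections import Counter

THREAD = re.compile(r"usage by thread '(?P<name>[^']+)'")
POOL = re.compile(r"\[([^\[\]]+)\]\[T#\d+\]$")

# Label -> package prefixes, checked in order (first match wins per line).
SIGNATURES = [
    ("regex/wildcard", ("java.util.regex.Pattern", "org.apache.lucene.util.automaton.RegExp")),
    ("aggregations", ("org.elasticsearch.search.aggregations",)),
    ("merge/flush", ("org.apache.lucene.index.IndexWriter", "org.apache.lucene.codecs")),
    ("transport", ("org.elasticsearch.transport", "io.netty")),
    ("search", ("org.apache.lucene.search",)),
]


def summarize(captures):
    """Return (pool_counts, pattern_counts) for a list of hot threads texts.

    pool_counts counts hot thread entries per pool; pattern_counts counts
    the number of captures in which each signature appeared at least once.
    """
    pools, patterns = Counter(), Counter()
    for text in captures:
        seen = set()
        for line in text.splitlines():
            thread = THREAD.search(line)
            if thread:
                pool = POOL.search(thread["name"])
                # Fall back to the full name for threads outside a pool.
                pools[pool[1] if pool else thread["name"]] += 1
                continue
            for label, prefixes in SIGNATURES:
                if any(prefix in line for prefix in prefixes):
                    seen.add(label)
                    break
        patterns.update(seen)
    return pools, patterns
```

A signature present in most captures, on the same pool, is a much stronger lead than one that shows up once.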

Step 3: Correlate with Other Metrics

GET /_nodes/stats/thread_pool
GET /_tasks?detailed=true
GET /_cat/nodes?v&h=name,cpu,load_1m
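
Thread pool statistics show whether a busy pool is also falling behind. As a small sketch, the function below takes the decoded JSON from the thread pool stats request and lists pools that have queued or rejected work. The name `saturated_pools` is hypothetical. Note that `rejected` is a cumulative counter, so a non-zero value may reflect an earlier incident rather than the current one.

```python
def saturated_pools(stats_response):
    """Return (node, pool, active, queue, rejected) for pools under pressure.

    `stats_response` is the decoded node stats JSON, which nests pool
    statistics under nodes.<node_id>.thread_pool.<pool>.
    """
    rows = []
    for node_id, node in stats_response.get("nodes", {}).items():
        for pool, stats in node.get("thread_pool", {}).items():
            queue = stats.get("queue", 0)
            rejected = stats.get("rejected", 0)
            if queue > 0 or rejected > 0:
                rows.append(
                    (node.get("name", node_id), pool, stats.get("active", 0), queue, rejected)
                )
    # Most rejections first, then longest queue.
    return sorted(rows, key=lambda row: (row[4], row[3]), reverse=True)
```

If the pool that tops this list matches the pool named in the hot threads output, the two signals point at the same bottleneck.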

Step 4: Take Action

Based on the pattern:

Search threads: Optimize queries, add capacity
Write threads: Tune indexing (bulk request size, refresh interval, client concurrency) or add data nodes
Generic threads: Check merge policy, reduce segments
GC threads: Address memory pressure

Comparing Thread Types

CPU Type (Default)

GET /_nodes/hot_threads?type=cpu

Shows threads actively using CPU.

Wait Type

GET /_nodes/hot_threads?type=wait

Shows threads that spent the most time waiting (e.g., for I/O, locks).

Block Type

GET /_nodes/hot_threads?type=block

Shows blocked threads (waiting for monitors/locks).

Best Practices

Regular Monitoring

Capture hot threads during normal operation for baseline
Set up automated capture during performance alerts
Keep historical data for comparison

Analysis Tips

Compare hot threads across nodes
Look at consistency of patterns
Correlate with application events
Focus on highest percentage threads first

Documentation

Keep notes on observed patterns:

What workload causes this pattern?
What was the resolution?
Are there recurring patterns?
