Elasticsearch CPU Throttling in Docker and Kubernetes Containers

Running Elasticsearch inside containers introduces a performance problem that does not exist on bare metal: CPU throttling by the Linux Completely Fair Scheduler (CFS) bandwidth control. A containerized node can have plenty of CPU capacity on average yet experience severe latency spikes because CFS throttles bursts of CPU usage within short time windows.

How CFS Quota and Period Affect the JVM

The Linux CFS bandwidth control uses two parameters to limit container CPU usage: cpu.cfs_quota_us and cpu.cfs_period_us. The period is typically 100ms (100,000 microseconds). The quota defines how many microseconds of CPU time the container can use within each period. A container with a 2-core CPU limit gets a quota of 200,000us per 100ms period.
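The relationship between a core limit and the resulting quota is simple multiplication, as a quick sanity check illustrates (the 100ms period is the kernel default; the 2-core limit is the example value from above):

```shell
PERIOD_US=100000   # CFS period in microseconds (kernel default)
CPU_LIMIT=2        # illustrative container CPU limit, in whole cores

# Quota is cores * period: the CPU time the cgroup may
# consume within each scheduling period before throttling
QUOTA_US=$((CPU_LIMIT * PERIOD_US))

echo "cpu.cfs_quota_us=${QUOTA_US}"
```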

The problem is that quota is consumed across all threads collectively. Elasticsearch runs dozens of threads concurrently - search threads, indexing threads, GC threads, Lucene merge threads, network I/O threads. When a burst of activity causes many threads to run simultaneously, the container can exhaust its quota early in the period. The kernel then throttles all threads in the cgroup for the remainder of that period, even if average CPU usage is well below the limit.

Consider an Elasticsearch node with a 4-core limit running a G1GC cycle. The GC spawns 4 parallel collector threads that pin all cores for 50ms, consuming 200,000us of quota - half the period's budget in one GC cycle. The remaining search, indexing, and merge threads must share what is left. If they need a burst, they get throttled.
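The arithmetic behind this scenario is worth making explicit (all values taken from the example above):

```shell
PERIOD_US=100000   # CFS period
QUOTA_US=400000    # 4-core limit -> 4 * 100000us per period

GC_THREADS=4       # parallel G1 collector threads
GC_BURST_MS=50     # each thread pins a core for 50ms

# CPU time consumed by the GC burst, in microseconds
GC_CONSUMED_US=$((GC_THREADS * GC_BURST_MS * 1000))

echo "GC consumed ${GC_CONSUMED_US}us of ${QUOTA_US}us quota"
echo "left for all other threads this period: $((QUOTA_US - GC_CONSUMED_US))us"
```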

Symptoms of CPU Throttling

CPU throttling in Elasticsearch manifests as intermittent performance degradation rather than consistent slowness. Symptoms include:

Search latency spikes that appear in P99 metrics but not in P50. The median query runs fine because throttling only kicks in during burst periods. Slow log entries appear sporadically without a clear pattern tied to query complexity or data volume.

Thread pool rejections increase, particularly for the search and write thread pools. Threads that are throttled cannot complete work fast enough, causing the queue to fill. The _cat/thread_pool API shows growing rejected counts even though CPU utilization as reported by container metrics looks moderate.

GC pause times increase unpredictably. G1GC targets a 200ms pause time, but if GC threads are throttled mid-cycle, the actual pause stretches to 500ms or longer. The GC log shows pause times well above the target with no corresponding increase in heap pressure.
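A quick way to spot the rejection symptom is to scan _cat/thread_pool output for nonzero rejected counts. The response below is a fabricated sample for illustration; against a live node you would pipe the output of `curl -s 'localhost:9200/_cat/thread_pool/search,write?v&h=node_name,name,active,queue,rejected'` into the same awk filter:

```shell
# Fabricated _cat/thread_pool output (node and counts are illustrative)
SAMPLE='node_name name   active queue rejected
node-1    search 7      42    1305
node-1    write  4      0     0'

# Print only the thread pools that have rejected work (skip the header row)
echo "$SAMPLE" | awk 'NR > 1 && $5 > 0 { print $2, "rejections:", $5 }'
```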

Why Elasticsearch Is Particularly Sensitive

Several characteristics make Elasticsearch more vulnerable to CFS throttling than typical web applications.

Garbage collection is the primary trigger. The JVM GC runs parallel collector threads equal to the number of available processors. These threads run at full speed during stop-the-world pauses, creating exactly the kind of CPU burst that triggers throttling. Unlike an application server where request handling is naturally spread over time, GC concentrates CPU demand into brief, intense periods.

Lucene segment merges are another source of CPU bursts. When Elasticsearch merges index segments, the merge threads perform CPU-intensive compression and sorting work. The default ConcurrentMergeScheduler runs multiple merge threads simultaneously, compounding the burst effect.

Periodic housekeeping - cluster state processing, shard health checks, cache eviction - adds to the burst patterns. These tasks are not individually expensive but their concurrent execution contributes to quota exhaustion.

Monitoring Throttling via cgroup Metrics

You can directly measure throttling inside the container by reading cgroup statistics. For cgroups v1:

cat /sys/fs/cgroup/cpu/cpu.stat

This returns nr_periods, nr_throttled, and throttled_time. The throttled ratio (nr_throttled / nr_periods) is the fraction of scheduling periods in which the container hit its quota; anything above 10-20% warrants investigation. The throttled_time value is the total time spent throttled, in nanoseconds - divide it by elapsed wall time to estimate the effective CPU time lost.

For cgroups v2, the equivalent file is /sys/fs/cgroup/cpu.stat, with the same counters reported as nr_periods, nr_throttled, and throttled_usec (microseconds rather than nanoseconds).
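The ratio calculation can be scripted with awk. The cpu.stat contents below are a fabricated sample using the cgroups v1 field names; on a real container you would substitute `cat /sys/fs/cgroup/cpu/cpu.stat` (v1) or `cat /sys/fs/cgroup/cpu.stat` (v2) for the here-document:

```shell
# Compute the throttled ratio from cpu.stat counters
# (sample values below are fabricated for illustration)
awk '
  $1 == "nr_periods"   { periods = $2 }
  $1 == "nr_throttled" { throttled = $2 }
  END {
    printf "throttled in %.1f%% of %d periods\n", 100 * throttled / periods, periods
  }
' <<'EOF'
nr_periods 5000
nr_throttled 850
throttled_time 12400000000
EOF
```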

Within Elasticsearch, the _nodes/stats/os API reports cgroup metrics including cfs_quota_micros, cfs_period_micros, and the number of throttled periods. Monitoring these through your observability stack provides visibility into throttling without shell access to the container.

Sizing Guidance and Configuration

The most effective mitigation is to avoid fractional CPU limits. Set CPU limits to whole numbers (2, 4, 8) rather than values like 2.5 or 3.7. A fractional limit forces the JVM's detected processor count to be rounded to a whole number, so thread pools end up sized for a different amount of CPU than the quota actually grants.

Set CPU requests equal to CPU limits (Guaranteed QoS class in Kubernetes). When requests and limits differ (Burstable QoS), the node competes with other pods for CPU during contention. Elasticsearch's latency-sensitive workload performs poorly with unpredictable CPU availability.
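Both recommendations together look like the following container spec fragment (container name and sizes are illustrative):

```yaml
# Sketch of a container spec in an Elasticsearch StatefulSet.
# requests == limits gives the pod the Guaranteed QoS class,
# and the whole-number cpu value avoids fractional-quota rounding.
containers:
  - name: elasticsearch
    resources:
      requests:
        cpu: "4"
        memory: 16Gi
      limits:
        cpu: "4"
        memory: 16Gi
```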

If throttling persists despite adequate average CPU, consider increasing the CPU limit or reducing the CFS period. The kubelet's cpuCFSQuotaPeriod parameter (default 100ms) can be reduced to 10ms for finer-grained scheduling that better accommodates bursty workloads, though this affects all pods on the node.
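A kubelet configuration fragment for the shorter period might look like this (note that depending on the Kubernetes version, the CustomCPUCFSQuotaPeriod feature gate may need to be enabled for non-default values):

```yaml
# KubeletConfiguration fragment - applies node-wide, not per pod
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Shorter CFS period: quota is replenished every 10ms instead of 100ms,
# so a burst can lock out the cgroup's threads for at most ~10ms at a time
cpuCFSQuotaPeriod: "10ms"
```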

The JVM flag -XX:+UseContainerSupport, enabled by default since JDK 8u191, tells the JVM to read cgroup limits for processor and memory detection. Elasticsearch uses the detected processor count to set node.processors, which sizes its thread pools. When a CPU limit is set, the JVM derives the count from the CFS quota; when only a request is set, older JDKs fall back to cpu shares, so the request value influences the detected count. With 2 detected processors, the search thread pool gets int((2 * 3) / 2) + 1 = 4 threads.
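The search pool sizing follows Elasticsearch's documented formula, int((allocated_processors * 3) / 2) + 1, which is easy to tabulate for common processor counts:

```shell
# Elasticsearch search thread pool size: int((N * 3) / 2) + 1,
# where N is the detected processor count
for N in 1 2 4 8; do
  echo "processors=$N search_threads=$(( (N * 3) / 2 + 1 ))"
done
```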

You can override automatic detection with -XX:ActiveProcessorCount=N in ES_JAVA_OPTS or by setting node.processors in elasticsearch.yml. This is useful when using cpu shares instead of quota, where the JVM may detect the full host processor count and create oversized thread pools.
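Either override looks like the following (the count of 4 is an illustrative value; pick the number of cores you actually want thread pools sized for):

```shell
# Option 1: force the JVM's detected processor count via the JVM flag
export ES_JAVA_OPTS="-XX:ActiveProcessorCount=4"

# Option 2: set it at the Elasticsearch level instead, in elasticsearch.yml:
#   node.processors: 4
```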
