Elasticsearch nodes rely on the JVM heap for nearly every operation - caching field data, buffering indexing requests, managing segment metadata, running queries. When the operating system swaps any portion of that heap to disk, operations that normally complete in microseconds suddenly take milliseconds or worse. The degradation is not gradual - it is a cliff. A single swap event during a garbage collection cycle can cascade into query timeouts, indexing rejections, and cluster instability.
Why the JVM and Swap Interact So Badly
The JVM garbage collector must periodically scan the entire heap to identify unreachable objects and reclaim memory. During a major GC cycle, the collector touches memory regions across the full heap space. If portions of the heap have been swapped to disk, each page fault forces the kernel to read that page back from swap before the GC thread can continue. A single page read from a spinning disk takes roughly 5-10ms. A modern Elasticsearch node with a 30GB heap contains millions of pages. Even a few hundred swapped pages can turn a 200ms GC pause into a multi-second stall.
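To make that arithmetic concrete, here is a back-of-the-envelope sketch; the page count and per-fault latency are illustrative figures chosen from within the ranges above, not measurements:

```shell
# Hypothetical: 300 heap pages swapped out, ~8 ms per page fault from a spinning disk.
# That alone adds ~2.4 s to a GC cycle that would otherwise finish in ~200 ms.
python3 -c 'pages = 300; ms_per_fault = 8; print(pages * ms_per_fault, "ms of extra GC stall")'
```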
GC pauses are stop-the-world events. While GC threads wait on disk I/O to read swapped pages back into RAM, all application threads are paused. Search requests queue up. Indexing buffers cannot flush. The node appears unresponsive to the master, which may remove it from the cluster if the pause exceeds the fault detection timeout.
Swap also creates a feedback loop. When the OS pages out JVM heap to make room for filesystem cache, the next GC cycle must fault those pages back in. This triggers more swapping, making each subsequent GC cycle slower. Teams have reported search latencies jumping from single-digit milliseconds to 5-10 seconds when swap activity occurs on Elasticsearch nodes.
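To check whether this loop is active on a node, watch the kernel's swap counters; sustained nonzero si/so columns are the telltale sign. A quick diagnostic sketch (the pgrep pattern assumes a standard Elasticsearch launch command):

```shell
# si/so report KiB/s swapped in and out; sustained nonzero values mean active swapping.
vmstat 1 5
# Per-process view: how much of the Elasticsearch process currently sits in swap.
grep VmSwap "/proc/$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch | head -n1)/status"
```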
The bootstrap.memory_lock Setting
Elasticsearch provides the bootstrap.memory_lock setting to prevent heap memory from being swapped. When set to true, Elasticsearch calls mlockall() on startup, which tells the kernel to keep the entire JVM process address space locked in physical RAM. No portion of the heap can be paged out to swap.
To enable it, add this to elasticsearch.yml:
bootstrap.memory_lock: true
This is the Elastic-recommended approach for production deployments. It protects against swap regardless of the OS-level swap configuration. However, mlockall() only locks the JVM process memory. Off-heap allocations used by Lucene for direct byte buffers and memory-mapped files operate outside this lock.
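Whether the lock actually took effect can also be inspected at the OS level; a sketch, assuming the JVM was launched with the standard Elasticsearch bootstrap class:

```shell
# VmLck near the resident JVM size means mlockall() succeeded; "0 kB" means it did not.
grep VmLck "/proc/$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch | head -n1)/status"
```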
Three Approaches to Disabling Swap
There are three methods to protect Elasticsearch from swap, each with different trade-offs.
Disable swap entirely. The most straightforward method. Run swapoff -a to disable all swap devices immediately, then remove or comment out the swap entries in /etc/fstab so the change persists across reboots. This protects every process on the system, not just Elasticsearch, which makes it the cleanest option on dedicated Elasticsearch nodes. The trade-off is that if memory is truly exhausted, the kernel's OOM killer will terminate processes outright rather than letting the system degrade more gradually through swap.
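On a dedicated node the two steps look roughly like this (run as root; the sed expression is a cautious sketch, and it is worth eyeballing /etc/fstab afterward):

```shell
swapoff -a   # turn off all active swap devices immediately
# Comment out uncommented fstab lines whose filesystem type field is "swap",
# keeping a .bak copy, so the machine comes back swap-free after a reboot.
sed -i.bak '/^[^#].*\sswap\s/ s/^/#/' /etc/fstab
```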
Set vm.swappiness to 1. This tells the kernel to avoid swapping except under extreme memory pressure. Set it with sysctl vm.swappiness=1 and persist it in /etc/sysctl.conf. A value of 1 (not 0) is recommended because on some kernel versions (3.5 and later) a value of 0 disables swap-backed reclaim entirely, which can invoke the OOM killer under memory pressure even while swap space sits unused. This approach keeps swap available as a safety net while making the kernel strongly prefer reclaiming filesystem cache instead.
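As a sketch (run as root; some distributions prefer a drop-in file under /etc/sysctl.d/ over editing /etc/sysctl.conf directly):

```shell
sysctl -w vm.swappiness=1                     # takes effect immediately
echo 'vm.swappiness = 1' >> /etc/sysctl.conf  # persists across reboots
```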
Use memory_lock. This is the bootstrap.memory_lock: true approach described above. It only protects the Elasticsearch process, leaving swap available for other processes on shared machines. This is the right choice when Elasticsearch shares a host with other services that benefit from swap.
Verifying Memory Lock and Troubleshooting Failures
After starting Elasticsearch with bootstrap.memory_lock: true, verify it actually took effect:
GET _nodes?filter_path=**.mlockall
The response should show "mlockall": true for each node. If any node reports false, the lock failed: Elasticsearch logs a warning at startup but continues running without the protection.
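With curl against a node's HTTP port (localhost:9200 is an assumption), the check and the shape of a healthy response look like:

```shell
curl -s 'localhost:9200/_nodes?filter_path=**.mlockall&pretty'
# Each node ID should report:
#   "process" : { "mlockall" : true }
```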
The most common failure is insufficient system-level limits. For systemd-managed installations, create /etc/systemd/system/elasticsearch.service.d/override.conf:
[Service]
LimitMEMLOCK=infinity
Then run systemctl daemon-reload and restart Elasticsearch. For non-systemd systems, set the memlock ulimit in /etc/security/limits.conf:
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
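Either way, it is worth confirming the limit actually applies to the user and unit that run Elasticsearch; a sketch, assuming the service user is named elasticsearch:

```shell
# Effective limit for the service user:
sudo -u elasticsearch bash -c 'ulimit -l'             # should print: unlimited
# For systemd installs, confirm the override was picked up after daemon-reload:
systemctl show elasticsearch --property=LimitMEMLOCK  # should print: LimitMEMLOCK=infinity
```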
Another failure mode is a noexec mount on the JNA temporary directory. Elasticsearch calls mlockall() through JNA, which extracts a native library into a temporary directory (under /tmp by default); if that filesystem is mounted noexec, the library cannot load and the lock fails. Fix this by pointing JNA at an exec-mounted directory with ES_JAVA_OPTS="-Djna.tmpdir=/path/to/exec-mounted/dir". In Docker, add --ulimit memlock=-1:-1 to the run command or set memlock: -1 under ulimits in Docker Compose.
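A minimal Docker invocation combining these settings might look like the following; the container name, image tag, heap size, and tmpdir path are all illustrative, and the tmpdir must exist inside the container on an exec-mounted filesystem:

```shell
docker run -d --name es-node \
  --ulimit memlock=-1:-1 \
  -e "bootstrap.memory_lock=true" \
  -e ES_JAVA_OPTS="-Xms4g -Xmx4g -Djna.tmpdir=/usr/share/elasticsearch/tmp" \
  docker.elastic.co/elasticsearch/elasticsearch:8.14.0
```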
Performance Impact When Swap Is Active
The cost of swap activity on Elasticsearch is measurable and severe. GC pause times that normally sit under 200ms with G1GC can spike to 5-15 seconds when the collector must fault in swapped pages. These pauses directly translate to query latency because all application threads are frozen.
Thread pool rejections increase as queued requests back up behind paused threads. The search thread pool can fill within seconds during a long GC stall. Bulk indexing requests time out and must be retried, adding more load to an already struggling node.
Cluster stability suffers too. The default fault detection timeout is 30 seconds. A sustained GC pause exceeding that causes the master to consider the node failed and triggers shard reallocation - an expensive operation that strains the remaining nodes. Even shorter pauses cause the node to fall behind on cluster state updates, leading to stale routing tables and increased retries. The fix is straightforward: disable swap or lock memory before performance problems appear, not after.
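These symptoms are visible in the node stats API before they become outages; rising old-generation collection time alongside nonzero swap usage is the signature worth alerting on. A monitoring sketch (localhost:9200 assumed):

```shell
# Cumulative GC time per collector; sample twice and diff to get a rate.
curl -s 'localhost:9200/_nodes/stats/jvm?filter_path=**.gc.collectors.*.collection_time_in_millis&pretty'
# OS-level swap counters as Elasticsearch reports them:
curl -s 'localhost:9200/_nodes/stats/os?filter_path=**.swap&pretty'
```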