Elasticsearch High Indexing Latency

High indexing latency in Elasticsearch shows up as growing bulk queue depths, increased rejected indexing requests (HTTP 429 / EsRejectedExecutionException), and documents taking longer to become searchable. Before reaching for settings changes, you need to identify which part of the indexing pipeline is actually slow. The fix for merge throttling looks nothing like the fix for an overloaded ingest pipeline, and guessing wastes time.

Diagnosing the Bottleneck

The starting point is the node stats API. Pull indexing metrics from every data node and compare them:

GET /_nodes/stats/indices/indexing

The response contains indexing.index_total, indexing.index_time_in_millis, and indexing.index_current per node. Divide index_time_in_millis by index_total to get average per-document indexing latency. If one node is consistently slower than others, the problem is node-local - disk, CPU, or uneven shard allocation. If all nodes are equally slow, the issue is systemic: settings, mapping, or pipeline overhead.
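The per-node comparison can be scripted. A minimal sketch, assuming the standard response shape of GET /_nodes/stats/indices/indexing — the node names and figures below are made up for illustration, not real cluster output:

```python
# Compare average per-document indexing latency across data nodes.
# `stats` mimics the response shape of GET /_nodes/stats/indices/indexing;
# the node IDs and numbers are illustrative.
stats = {
    "nodes": {
        "node-1": {"name": "es-data-1",
                   "indices": {"indexing": {"index_total": 5_000_000,
                                            "index_time_in_millis": 2_500_000}}},
        "node-2": {"name": "es-data-2",
                   "indices": {"indexing": {"index_total": 5_000_000,
                                            "index_time_in_millis": 9_000_000}}},
    }
}

def avg_latency_ms(node):
    idx = node["indices"]["indexing"]
    return idx["index_time_in_millis"] / max(idx["index_total"], 1)

latencies = {n["name"]: avg_latency_ms(n) for n in stats["nodes"].values()}
# es-data-1 -> 0.50 ms/doc, es-data-2 -> 1.80 ms/doc: node-2 is the outlier
for name, ms in sorted(latencies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ms:.2f} ms/doc")
```

A node whose average sits well above the rest is where to start looking for disk saturation or shard imbalance.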

Check thread pool stats in the same API call or use GET /_cat/thread_pool/write?v&h=node_name,active,queue,rejected. A growing queue means the node cannot keep up. Rejected requests mean it has already fallen behind. Also look at GET /_nodes/stats/indices/merges - if merges.current stays high and merges.total_throttled_time_in_millis is climbing, merge pressure is dragging down indexing throughput.
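Illustrative output from the _cat call (the numbers are made up): the rejected column is cumulative since node start, so a value that keeps climbing between calls is the clearest sign the node has fallen behind.

node_name  active queue rejected
es-data-1       8   145    12032
es-data-2       3     0        0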

Refresh Interval and Segment Pressure

Every refresh creates a new Lucene segment. The default index.refresh_interval is 1 second, which means Elasticsearch produces a new segment every second per shard. Each segment consumes heap for its in-memory structures, and the merge scheduler works continuously to consolidate them. Under heavy write load, this cycle generates constant I/O pressure.

For bulk loading or any scenario where near-real-time search is not needed, disable refresh entirely:

PUT /my_index/_settings
{
  "index.refresh_interval": "-1"
}

Set it back to 1s (or 30s for write-heavy indices) after the bulk load completes. Even for steady-state indexing, increasing the refresh interval to 30s or 60s reduces segment creation rate and gives the merge scheduler room to work. Measure the actual search freshness requirement before defaulting to 1s - most applications do not need sub-second document visibility.
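Restoring the interval after the load uses the same settings endpoint; following it with an explicit refresh makes the loaded documents searchable immediately:

PUT /my_index/_settings
{
  "index.refresh_interval": "30s"
}

POST /my_index/_refresh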

Translog Fsync and Durability

By default, Elasticsearch fsyncs the translog after every indexing request (index.translog.durability: request). This is the safe choice but it means every bulk request triggers a disk sync. On spinning disks or cloud volumes with limited IOPS, this becomes a hard bottleneck.

Switching to async translog reduces fsync frequency to the sync_interval (default 5 seconds):

PUT /my_index/_settings
{
  "index.translog.durability": "async",
  "index.translog.sync_interval": "30s"
}

The tradeoff: if a node crashes, you lose up to sync_interval worth of writes. For initial data loads where the source data is still available for replay, this is acceptable. For production indices receiving live traffic, evaluate whether your replication setup (replica shards on other nodes) provides enough durability to tolerate async translog on the primary.

Bulk Batch Sizing and Concurrency

Sending single-document index requests is the slowest path through Elasticsearch. The bulk API amortizes network round-trips, thread pool scheduling, and translog writes across many documents. But bulk batches that are too large cause their own problems: long GC pauses, oversized thread pool task queues, and out-of-memory conditions.

Start with 1,000-5,000 documents per bulk request, aiming for 5-15 MB of payload per batch. Benchmark on your cluster. Increase batch size until throughput plateaus, then back off slightly. Watch heap usage and GC logs - if you see long GC pauses correlating with bulk requests, your batches are too large.
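Enforcing both caps on the client side is straightforward. A minimal sketch — the helper name iter_bulk_batches is our own, and the payloads follow the NDJSON action/source line pairs the bulk API expects:

```python
import json

def iter_bulk_batches(docs, index, max_docs=5000, max_bytes=10 * 1024 * 1024):
    """Yield NDJSON bulk payloads capped by document count and payload size.

    `docs` is any iterable of JSON-serializable documents; the defaults
    mirror the 1,000-5,000 docs / 5-15 MB starting points discussed above.
    """
    action = json.dumps({"index": {"_index": index}})
    lines, ndocs, nbytes = [], 0, 0
    for doc in docs:
        source = json.dumps(doc)
        entry_bytes = len(action) + len(source) + 2  # +2 for the two newlines
        if lines and (ndocs >= max_docs or nbytes + entry_bytes > max_bytes):
            yield "\n".join(lines) + "\n"
            lines, ndocs, nbytes = [], 0, 0
        lines.extend((action, source))
        ndocs += 1
        nbytes += entry_bytes
    if lines:
        yield "\n".join(lines) + "\n"

# Example: 12,000 small docs split into batches of at most 5,000
batches = list(iter_bulk_batches(({"n": i} for i in range(12_000)), "my_index"))
print(len(batches))  # 3 batches: 5000 + 5000 + 2000 documents
```

Each yielded payload can be sent as the body of a POST /_bulk request. Tune max_docs and max_bytes from your benchmark results rather than treating the defaults as final.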

Concurrency matters as much as batch size. A single thread will never saturate a cluster. Use multiple threads or processes, each sending its own bulk requests. The write thread pool defaults to the number of CPU cores per node, so a 3-node cluster with 8 cores each handles 24 concurrent indexing operations before queuing. Match client-side parallelism to that capacity.
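A sketch of the client-side fan-out, with the HTTP call stubbed out so the structure is clear — send_bulk here is a stand-in for whatever client you use to POST the NDJSON body to /_bulk:

```python
from concurrent.futures import ThreadPoolExecutor

def send_bulk(payload):
    # Stand-in for the real request (e.g. POST /_bulk with an NDJSON body).
    # Here we just count the action lines to simulate an acknowledgement.
    return payload.count('{"index"')

# Match client-side parallelism to cluster write capacity:
# e.g. 3 nodes x 8 cores -> up to 24 in-flight bulk requests.
payloads = ['{"index": {"_index": "my_index"}}\n{"n": %d}\n' % i
            for i in range(100)]
with ThreadPoolExecutor(max_workers=24) as pool:
    indexed = sum(pool.map(send_bulk, payloads))
print(indexed)  # total documents acknowledged by the stub
```

In a real loader, watch for 429 responses and back off: rejections mean you have exceeded the write thread pool's queue, and retrying immediately only makes it worse.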

Replicas, Ingest Pipelines, and Disk I/O

For large initial data loads, setting replicas to zero removes the overhead of replicating every write to replica shards:

PUT /my_index/_settings
{
  "index.number_of_replicas": 0
}

Restore replicas after the load completes. Elasticsearch then builds each replica by copying segment files from the primary, which is faster than replaying every individual write during ingestion.
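Restoring replicas is the same settings call; the count here assumes one replica was the original configuration:

PUT /my_index/_settings
{
  "index.number_of_replicas": 1
}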

Ingest pipelines add processing time to every document. Each processor in the pipeline - grok, geoip, enrich, script - runs synchronously in the indexing path. If your pipeline includes a slow enrichment lookup or a regex-heavy grok pattern, it directly inflates indexing latency. Use the GET /_nodes/stats/ingest API to see per-pipeline and per-processor timing. Consider moving heavy transformations to your ingestion client or a dedicated ingest node so data nodes focus on storage and search.
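To inspect what a pipeline does to a document without indexing anything, the simulate API runs a sample document through it — the pipeline name my-pipeline and the sample document are placeholders. With ?verbose=true the response shows the document after each processor, which helps isolate a misbehaving step; per-processor timing still comes from the ingest stats call above.

POST /_ingest/pipeline/my-pipeline/_simulate?verbose=true
{
  "docs": [
    { "_source": { "message": "203.0.113.7 GET /index.html 200" } }
  ]
}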

Disk I/O is the final common path for all indexing operations. Elasticsearch writes translog entries, builds Lucene segments, and runs background merges - all hitting the same disk subsystem. On cloud instances, check whether your volume's IOPS or throughput limit has been reached. On bare metal, use iostat -x 1 to watch %util and await on the data path devices. SSDs handle the mixed read/write pattern of concurrent indexing and merging far better than spinning disks. If you are on magnetic storage and hitting merge throttling, SSDs often give more throughput than any settings change.
