Every Elasticsearch shard is a Lucene index, and every Lucene index is composed of immutable segments. A search query must visit each segment independently - open its term dictionary, walk its postings lists, load its doc values columns, and merge per-segment results into a single response. More segments means more work per query, more file descriptors held open, and more heap consumed by in-memory structures. When segment counts grow beyond what the merge policy can manage, search latency climbs steadily.
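The per-segment cost can be sketched with a toy model. This is a hypothetical simplification, not Lucene's actual implementation: each "segment" is just a dict mapping terms to (doc_id, score) postings, and the query must visit every segment and merge the per-segment top-k results.

```python
import heapq

# Hypothetical model of per-segment search, NOT the Lucene implementation:
# each segment holds its own postings, so one query does one pass per segment
# and then merges the per-segment top-k hits into a single ranked response.
def search_shard(segments, query_term, k=3):
    per_segment_hits = []
    for seg in segments:                       # cost scales with segment count
        postings = seg.get(query_term, [])     # (doc_id, score) pairs
        per_segment_hits.append(
            heapq.nlargest(k, postings, key=lambda h: h[1])
        )
    # merge-sort the per-segment results into one list (descending score)
    merged = heapq.merge(*per_segment_hits, key=lambda h: -h[1])
    return [doc for doc, _ in merged][:k]

segments = [
    {"error": [(1, 0.9), (2, 0.4)]},
    {"error": [(7, 0.7)]},
    {"error": [(9, 0.5), (11, 0.8)]},
]
print(search_shard(segments, "error"))  # [1, 11, 7] — three segments visited
```

Every additional segment adds another dictionary lookup, another postings walk, and another input to the final merge, which is why segment count maps so directly to query cost.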
How Segments Accumulate
Elasticsearch creates new segments through two operations: refresh and flush. A refresh (which happens every index.refresh_interval, default 1 second) takes the in-memory indexing buffer and writes it as a new segment that becomes searchable. A flush forces a Lucene commit, writing a durable segment to disk and clearing the translog. Under steady write load, a shard can generate a new segment every second.
Deleted or updated documents compound the problem. Lucene segments are immutable, so a delete marks the document in a bitset but does not reclaim space. An update is internally a delete-then-insert. Stale documents linger until a merge combines segments and physically drops deleted entries. Until that merge happens, queries still open the segment, skip the deleted docs, and pay the I/O cost.
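The delete-then-insert lifecycle can be sketched with a minimal model (hypothetical class names; real Lucene segments are far more involved): a deletion only flips a bit in a liveness bitset, and the stale bytes remain until a merge rewrites the segment.

```python
# Sketch of immutable segments with a deletion bitset (hypothetical model):
# a delete flips a bit; the document's bytes stay in the segment file
# until a merge physically drops them.
class Segment:
    def __init__(self, docs):
        self.docs = docs                   # immutable once written
        self.live = [True] * len(docs)     # per-doc liveness bitset

    def delete(self, i):
        self.live[i] = False               # mark only; no space reclaimed

    def scan(self):
        # queries still iterate every slot and skip the dead ones
        return [d for d, alive in zip(self.docs, self.live) if alive]

old = Segment([{"id": 1, "v": 1}, {"id": 2, "v": 1}])
old.delete(0)                              # update = delete the old version...
new = Segment([{"id": 1, "v": 2}])         # ...plus insert into a new segment

print(old.scan() + new.scan())  # only live docs are returned
print(old.docs[0])              # stale doc still occupies space until merge
```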
Monitoring Segment Counts
The _cat/segments API shows segment-level detail per shard:
GET /_cat/segments/my_index?v&h=shard,segment,generation,docs.count,docs.deleted,size
For a broader view across all indices, use the index stats API:
GET /_stats/segments?filter_path=indices.*.primaries.segments.count
A healthy shard typically has 10-50 segments. If you see hundreds per shard - especially many small segments alongside a few large ones - the merge policy is falling behind or has been misconfigured.
Track docs.deleted as a fraction of docs.count. A shard where 30-40% of documents are marked deleted is carrying dead weight that still consumes disk and slows iteration. A merge would reclaim that space, but the merge policy may not select those segments if they exceed max_merged_segment / 2.
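A quick script can turn those two checks into numbers. This is a sketch with assumed helper names; it computes the deleted-doc fraction (treating `docs.count` as live docs, so the denominator is count plus deleted) and filters out segments too large to be merge candidates under the `max_merged_segment / 2` rule.

```python
# Sketch: compute the deleted-doc ratio from _cat/segments fields and flag
# which segment sizes the tiered policy can still select for merging.
# Sizes in bytes; 5 GB is the default max_merged_segment from the text.
MAX_MERGED_SEGMENT = 5 * 1024**3

def deleted_ratio(docs_count, docs_deleted):
    # docs.count reports live docs only, so add deleted back for the total
    total = docs_count + docs_deleted
    return docs_deleted / total if total else 0.0

def merge_candidates(segment_sizes, cap=MAX_MERGED_SEGMENT):
    # segments larger than cap / 2 are excluded from merge selection
    return [s for s in segment_sizes if s <= cap / 2]

gb = 1024**3
print(round(deleted_ratio(6_000_000, 3_000_000), 2))        # 0.33: dead weight
print([s / gb for s in merge_candidates([6 * gb, 3 * gb, 0.5 * gb])])  # [0.5]
```

In the example, the 6 GB and 3 GB segments both exceed the 2.5 GB threshold, so their deleted documents will not be reclaimed by ordinary background merging.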
The Tiered Merge Policy
Elasticsearch uses Lucene's TieredMergePolicy by default. It groups segments into tiers by size and merges the smallest within each tier, subject to two constraints:
index.merge.policy.segments_per_tier (default: 10) - target segment count per tier. Lower values trigger more aggressive merging. Must be >= max_merge_at_once.
index.merge.policy.max_merged_segment (default: 5gb) - maximum size of a merged segment. The policy will not merge segments if the result would exceed this limit.
The max_merged_segment cap is the most common reason for runaway segment counts on large shards. A shard with 200 GB of data and the default 5 GB cap ends up with 40+ segments that will never be merged further. Segments larger than max_merged_segment / 2 (2.5 GB by default) are excluded from merge candidates entirely.
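The arithmetic behind that floor is simple enough to sketch: once every segment has grown past `max_merged_segment / 2`, none are merge candidates, so a shard settles near its total size divided by the cap.

```python
import math

# Back-of-envelope sketch of the steady-state segment floor: once all
# segments exceed max_merged_segment / 2, merging stops, and a shard of
# shard_gb settles near shard_gb / cap_gb permanent segments.
def steady_state_floor(shard_gb, cap_gb):
    return math.ceil(shard_gb / cap_gb)

print(steady_state_floor(200, 5))    # 40 segments with the default 5gb cap
print(steady_state_floor(200, 10))   # 20 after raising max_merged_segment
```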
For large shards, raise max_merged_segment:
PUT /my_index/_settings
{
"index.merge.policy.max_merged_segment": "10gb"
}
Individual merges will take longer, but steady-state segment count drops. The right value depends on your shard sizes and disk throughput. Test in staging before changing production settings.
Force Merge: When and How
Force merge compacts all segments in a shard into a target number (typically 1). It is a heavy I/O operation - the node rewrites the entire shard's data into new segments, temporarily consuming roughly double the disk space.
POST /my_index/_forcemerge?max_num_segments=1
Force merge is safe on read-only indices - time-based indices that have rolled over, completed batch loads, or any index with index.blocks.write: true. On these indices, a single segment gives the best search performance: one term dictionary, one postings list, one doc values column per field. Lucene can also use simpler internal data structures for single-segment indices.
Do not force merge indices that are still receiving writes. New documents will create fresh segments on top of the force-merged one, and the merge policy will struggle to combine a small segment with the massive force-merged segment. You also risk running the node out of disk during the merge. ILM can automate force merge as part of the warm or cold phase transition, tying it to the rollover that makes the index read-only.
Reducing Segment Creation Rate
If the root cause is too many segments being created rather than too few being merged, address the creation side. Increase index.refresh_interval from the default 1s to 30s or 60s on write-heavy indices where near-real-time search is not required. During bulk loads, set it to -1 to disable refresh entirely and trigger a manual refresh after the load completes.
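The effect of a longer refresh interval is easy to quantify: under constant writes, the refresh cadence bounds how many segments a shard can create per hour. A rough sketch:

```python
# Rough arithmetic (sketch): under steady write load, refresh cadence
# caps segment creation, since each refresh writes at most one new segment.
def max_segments_per_hour(refresh_interval_s):
    return 3600 // refresh_interval_s

print(max_segments_per_hour(1))    # 3600 per hour at the 1s default
print(max_segments_per_hour(30))   # 120 after raising refresh_interval
```

Going from 1s to 30s cuts the worst-case segment creation rate by 30x, which is often the single biggest lever on merge pressure.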
Batch your writes. All documents indexed between two refreshes are written out as a single segment at the next refresh, regardless of how many requests delivered them. Small, frequent single-document writes spread across many refresh intervals produce many tiny segments the merge policy must clean up. Consolidate writes into bulk requests of 1,000-5,000 documents so more documents land within the same refresh interval.
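Client-side batching is a few lines in any language. A minimal sketch (the `batched` helper name and batch size of 1,000 are illustrative, taken from the range suggested above):

```python
# Sketch: consolidate a stream of single-document writes into bulk batches.
def batched(docs, size=1000):
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch        # full batch ready for one bulk request
            batch = []
    if batch:
        yield batch            # flush the final partial batch

batches = list(batched(range(2500), size=1000))
print([len(b) for b in batches])   # [1000, 1000, 500]
```

Each yielded batch would become the body of one bulk request instead of a thousand individual index operations.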
Monitor the relationship between segment creation and merge rate using GET /_nodes/stats/indices/merges. If merges.current is consistently near index.merge.scheduler.max_thread_count (default: the lesser of 4 or half the allocated CPU cores, with a minimum of 1), the merge scheduler is at capacity. Reducing segment creation via refresh interval tuning is more effective at that point than trying to speed up merges.
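The default thread-count formula described above can be sketched directly (assuming the max(1, min(4, cores / 2)) form; verify against your Elasticsearch version's documentation):

```python
# Sketch of the default merge scheduler thread count:
# max(1, min(4, allocated_processors // 2)), per the description above.
def default_merge_threads(cpu_cores):
    return max(1, min(4, cpu_cores // 2))

print(default_merge_threads(2))    # 1 — small nodes get a single merge thread
print(default_merge_threads(8))    # 4 — the cap is reached at 8 cores
print(default_merge_threads(16))   # 4 — more cores do not add merge threads
```

Note the cap: beyond 8 cores, adding CPU does not increase merge concurrency by default, which is another reason reducing segment creation beats trying to out-merge the write load.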