The frozen tier lets you search data stored in a snapshot repository without keeping a full local copy on disk. Elasticsearch mounts snapshot indices as partially mounted searchable snapshots, caching only the data regions that get accessed. This trades query latency for storage cost - you can retain months or years of data at a fraction of the disk cost of hot or warm tiers, but searches against uncached data require reads from the snapshot repository.
Fully Mounted vs. Partially Mounted Snapshots
Elasticsearch supports two searchable snapshot mount types, and the distinction drives the behavior of the cold and frozen tiers.
Fully mounted snapshots (storage: full_copy) keep a complete local cache of every shard's data. The snapshot repository acts as a recovery source, but all searches are served from local disk. ILM uses this mode in the cold tier. Search performance is comparable to a regular index. The downside is that you need enough local storage to hold the entire index, which limits cost savings.
Partially mounted snapshots (storage: shared_cache) keep only recently accessed data regions in a shared local cache. When a search hits data not in the cache, Elasticsearch fetches it from the snapshot repository - typically downloading 16MB regions at a time to amortize the cost of future reads in the same area. ILM uses this mode in the frozen tier. Local storage requirements drop dramatically, but search latency depends on cache hit rates and repository read speed.
The naming convention reflects the mount type: ILM prefixes fully mounted indices with restored- and partially mounted indices with partial-. If you see an index named partial-.ds-logs-2024.01.15-000042, it is a frozen-tier searchable snapshot.
Mounting Snapshots Manually
Outside of ILM, you can mount a snapshot index directly:
POST /_snapshot/my_repo/my_snapshot/_mount?wait_for_completion=true&storage=shared_cache
{
  "index": "logs-2024.01.15",
  "renamed_index": "frozen-logs-2024.01.15"
}
The storage query parameter controls the mount type. Use shared_cache for partially mounted (frozen tier behavior) and full_copy for fully mounted (cold tier behavior). If you omit the parameter, the default is full_copy.
The renamed_index field is optional but useful to avoid name collisions with the original index. You can also pass index_settings to override specific settings on mount - for example, adjusting the number of replicas.
The mount type determines the index's _tier_preference setting. A shared_cache mount sets index.routing.allocation.include._tier_preference to data_frozen, so you need at least one node with the data_frozen role or the index will remain unallocated.
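To confirm where a mounted index is allowed to allocate, you can inspect the setting directly (the index name here is illustrative):

GET /frozen-logs-2024.01.15/_settings?filter_path=*.settings.index.routing.allocation.include._tier_preference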
Shared Cache Configuration
Frozen-tier nodes use a shared cache on local disk for all partially mounted indices. The cache is configured per node in elasticsearch.yml:
xpack.searchable.snapshot.shared_cache.size: 500gb
On dedicated frozen nodes (nodes with only the data_frozen role and no other data roles), this defaults to 90% of total disk space or total disk minus 100GB headroom, whichever is greater. On nodes that mix data_frozen with other data roles, there is no default - you must set it explicitly or the node will reject partially mounted shard allocations.
A related setting controls the headroom:
xpack.searchable.snapshot.shared_cache.size.max_headroom: 100gb
This caps how much space is reserved for non-cache use when the cache size is specified as a percentage. If shared_cache.size is set to an absolute byte value, max_headroom has no effect.
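The dedicated-node default described above can be restated as a small calculation. This is a sketch of the sizing rule, not Elasticsearch source code: reserve 10% of the disk as headroom, cap that reservation at 100GB, and give the rest to the cache.

```python
GB = 1024 ** 3

def default_shared_cache_bytes(total_disk_bytes: int) -> int:
    """Effective default cache size on a dedicated frozen node:
    90% of disk, but the reserved headroom is capped at 100GB."""
    headroom = min(total_disk_bytes // 10, 100 * GB)  # 10% reserve, capped
    return total_disk_bytes - headroom

print(default_shared_cache_bytes(1000 * GB) // GB)  # 1TB disk  -> 900 GB cache
print(default_shared_cache_bytes(4000 * GB) // GB)  # 4TB disk  -> 3900 GB cache
```

Note how the headroom cap matters only on large disks: below 1TB, the plain 90% rule and the capped rule give the same answer.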
The cache is divided into fixed-size regions. When the cache fills up, the least recently used regions are evicted to make room for new data. There is no pre-warming of the cache on mount - the cache populates only through actual search activity.
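The region-granular LRU behavior can be sketched as a toy model. This is illustrative only - the class and its bookkeeping are invented for this example, and the real cache tracks far more state - but it shows how reads map to regions and how eviction frees room on a miss.

```python
from collections import OrderedDict

REGION_SIZE = 16 * 1024 * 1024  # 16MB regions, as in the frozen-tier shared cache

class SharedCacheSketch:
    """Toy model: a fixed number of regions with least-recently-used eviction."""

    def __init__(self, cache_bytes: int):
        self.capacity = cache_bytes // REGION_SIZE  # regions that fit in the cache
        self.regions = OrderedDict()                # region key -> present flag
        self.reads = self.writes = self.evictions = 0

    def read(self, file_id: str, offset: int) -> None:
        key = (file_id, offset // REGION_SIZE)  # which region holds this offset
        self.reads += 1
        if key in self.regions:
            self.regions.move_to_end(key)  # cache hit: mark most recently used
            return
        # Cache miss: the whole region would be fetched from the repository.
        if len(self.regions) >= self.capacity:
            self.regions.popitem(last=False)  # evict least recently used region
            self.evictions += 1
        self.regions[key] = True
        self.writes += 1
```

Because eviction happens at region granularity, a single byte of hot data keeps its entire 16MB region resident - which is exactly why sequential access patterns amortize so well.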
Monitoring with Cache Stats APIs
Two APIs give visibility into searchable snapshot performance.
The _searchable_snapshots/stats API returns per-index and per-shard statistics about snapshot repository reads:
GET /frozen-logs-*/_searchable_snapshots/stats
The _searchable_snapshots/cache/stats API reports on the shared cache itself:
GET /_searchable_snapshots/cache/stats
This returns metrics per node, including counts of cache region reads, writes, and evictions. The ratio of reads to writes indicates your effective cache hit rate: every miss triggers a write, so if reads roughly equal writes, nearly every search request is fetching data from the repository and the cache is not helping; if reads far exceed writes, the cache is serving most requests from local disk. High eviction counts relative to writes indicate the cache is too small for the working set.
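That reasoning can be turned into a small helper. This assumes the per-node stats expose reads and writes counters as described above (exact field names in the response may vary by version), and that each miss produces exactly one cache write:

```python
def cache_hit_rate(reads: int, writes: int) -> float:
    """Approximate hit rate from shared cache counters.
    Every miss triggers one cache write, so hits ~= reads - writes."""
    if reads == 0:
        return 0.0
    return (reads - writes) / reads

# Hypothetical counters pulled from GET /_searchable_snapshots/cache/stats:
print(f"{cache_hit_rate(reads=120_000, writes=6_000):.1%}")  # 95.0%
```

A hit rate near zero on a busy node is the signal to compare the cache size against the working set of your typical queries.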
You can also target specific nodes:
GET /_searchable_snapshots/node123/cache/stats
Common Problems and Errors
Excessive repository reads from undersized cache. The most common frozen-tier performance problem. If your search patterns touch a large fraction of the data and the cache cannot hold the working set, every query triggers repository reads. Repository latency adds directly to query latency - S3 reads typically add 50-200ms per fetch. The fix is either increasing xpack.searchable.snapshot.shared_cache.size or reducing the data volume on frozen nodes by adjusting ILM timing.
Repository unavailable errors. Partially mounted indices depend on the snapshot repository being accessible for every cache miss. If the repository (S3, GCS, Azure Blob) becomes unreachable due to network issues, permission changes, or service outages, searches against uncached data fail with repository access errors. Unlike regular indices, frozen-tier data has no local fallback for uncached regions. Monitor repository connectivity and set up alerts on repository access failures.
Slow searches due to cold cache. After mounting an index or after a node restart, the cache is empty. The first searches against a newly mounted frozen index will be slow because every data access is a cache miss. There is no automatic cache warming - the cache fills only through query-driven access. If you know which time ranges or fields will be queried first, you can run targeted searches after mounting to warm the relevant cache regions manually.
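One way to do that warming is a targeted query right after the mount. In this example the index name and the @timestamp field are assumptions about your data - substitute the range you expect to be queried first, and use size 0 since only the data access matters, not the hits:

GET /partial-.ds-logs-2024.01.15-000042/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2024-01-15T00:00:00Z",
        "lt": "2024-01-16T00:00:00Z"
      }
    }
  }
}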
Node rejected shard allocation. If a node's shared_cache.size is set to zero or not configured on a mixed-role node, Elasticsearch will not allocate partially mounted shards to it. The allocation explain API will show the specific reason. Either configure the cache size or add dedicated data_frozen nodes where the cache auto-sizes.
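To see the specific rejection reason, ask the allocation explain API for the unassigned shard (the index name is illustrative):

GET /_cluster/allocation/explain
{
  "index": "partial-.ds-logs-2024.01.15-000042",
  "shard": 0,
  "primary": true
}

The response includes a per-node explanation, which on a misconfigured mixed-role node will point at the missing shared cache configuration.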
Cold cache after shard relocation. When frozen-tier shards relocate between nodes, the cache contents do not transfer with them. The receiving node starts with a cold cache for those shards. This is expected behavior, but it means shard rebalancing on frozen nodes has a larger performance impact than on other tiers.
The frozen tier works best when a small percentage of the total data is accessed regularly. Log analytics is the canonical use case: recent logs get heavy search traffic, but logs older than 30 days are searched rarely and only for specific investigations. Storing those older logs on the frozen tier with a shared cache sized at 10-20% of total data volume keeps per-GB costs close to raw object storage pricing while maintaining the ability to search when needed.

The trade-off breaks down when queries routinely scan large portions of frozen data. Aggregation queries that touch every document across a wide time range will pull most of the data from the repository, negating the cost benefit because you pay for both repository read operations and the latency penalty. For workloads with broad scan patterns, the cold tier with fully mounted snapshots or keeping data on warm nodes may deliver better total cost of ownership once you factor in query execution time and repository egress charges.