A hot-warm-cold architecture optimizes Elasticsearch cost and performance by placing data on hardware tiers that match its age and access pattern.
## Understanding Data Tiers

### Tier Characteristics
| Tier | Hardware | Data Age | Access Pattern | Purpose |
|---|---|---|---|---|
| Hot | NVMe/SSD | Recent | Frequent read/write | Active indexing and search |
| Warm | SSD/HDD | Days-weeks | Occasional read | Historical search |
| Cold | HDD/Object | Weeks-months | Rare read | Compliance, analytics |
| Frozen | Object storage | Months+ | Very rare | Long-term retention |
### Cost Savings
Typical cost reduction: 40-70% compared to all-hot architecture.
## Setting Up Node Tiers

### Node Configuration
**Hot node:**

```yaml
# elasticsearch.yml
node.roles: [data_hot, data_content]
node.attr.data: hot
```

**Warm node:**

```yaml
# elasticsearch.yml
node.roles: [data_warm]
node.attr.data: warm
```

**Cold node:**

```yaml
# elasticsearch.yml
node.roles: [data_cold]
node.attr.data: cold
```

**Frozen node (8.x):**

```yaml
# elasticsearch.yml
node.roles: [data_frozen]
```
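After restarting each node with its tier configuration, the `_cat` APIs confirm that the roles and the custom `data` attribute were picked up:

```
# Node roles (e.g. "h" = data_hot, "w" = data_warm)
GET _cat/nodes?v&h=name,node.role

# Custom attributes, including node.attr.data
GET _cat/nodeattrs?v&h=node,attr,value
```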
### Hardware Recommendations
| Tier | CPU | RAM | Storage | Example Instance |
|---|---|---|---|---|
| Hot | High | 64-128 GB | NVMe SSD | r5.4xlarge, i3.2xlarge |
| Warm | Medium | 32-64 GB | SSD | r5.2xlarge, d2.xlarge |
| Cold | Low | 16-32 GB | HDD | d2.xlarge, i3en.xlarge |
| Frozen | Minimal | 8-16 GB | Object | Small instance + S3 |
## Index Lifecycle Management (ILM)

### Create ILM Policy
```
PUT _ilm/policy/hot_warm_cold_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "allocate": {
            "require": {
              "data": "warm"
            }
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": {
              "data": "cold"
            }
          },
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```
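Once created, the policy can be read back to confirm it registered, and when testing phase transitions it helps to temporarily lower the ILM poll interval (ILM only evaluates policies every 10 minutes by default):

```
GET _ilm/policy/hot_warm_cold_policy

# Speed up ILM checks while testing; revert afterwards
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1m"
  }
}
```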
### Using Data Tiers (Preferred in 8.x)
```
PUT _ilm/policy/data_tiers_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {}
      },
      "frozen": {
        "min_age": "60d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my_repository"
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```
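Note that this policy needs no `allocate` actions: with data tiers, ILM injects an implicit `migrate` action that moves indices between tiers via the built-in `index.routing.allocation.include._tier_preference` setting rather than custom node attributes. You can inspect that setting on any managed index (the index name here is illustrative):

```
GET logs-000001/_settings?filter_path=*.settings.index.routing.allocation.include._tier_preference
```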
### Apply Policy to Index Template
```
PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "hot_warm_cold_policy",
      "index.number_of_shards": 2,
      "index.number_of_replicas": 1
    }
  }
}
```
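With the template in place, indexing a document whose target matches `logs-*` auto-creates a data stream and its first backing index. Data stream documents must include a `@timestamp` field (the stream name and message field below are illustrative):

```
POST logs-app/_doc
{
  "@timestamp": "2024-01-15T10:00:00Z",
  "message": "example log line"
}

# Confirm the data stream and its backing indices
GET _data_stream/logs-app
```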
## Scaling Strategies

### Horizontal Scaling (Add Nodes)

**When to add hot nodes:**
- Indexing latency increasing
- Search latency on recent data degrading
- CPU/memory at capacity
**When to add warm/cold nodes:**
- Storage filling up
- Historical search queries slow
- Data retention needs increasing
### Vertical Scaling (Upgrade Nodes)

**When to upgrade:**
- Single index performance critical
- Network becoming bottleneck
- Easier than managing more nodes
### Scaling Checklist

**Before scaling:**

- [ ] Current resource utilization measured
- [ ] Growth rate calculated
- [ ] Bottleneck identified (CPU/memory/disk/network)
- [ ] ILM policy optimized

**After scaling:**

- [ ] Rebalancing complete
- [ ] Performance improved
- [ ] Monitoring updated
- [ ] Cost impact evaluated
## Data Movement

### Monitor Tier Distribution

```
# Disk usage per node
GET _cat/allocation?v&h=node,disk.used,disk.percent

# Tier attribute assigned to each node
GET _cat/nodeattrs?v&h=node,attr,value

# Index sizes
GET _cat/indices?v&h=index,store.size,pri.store.size
```

(`_cat/allocation` does not expose node attributes as columns; `_cat/nodeattrs` lists them.)
### Check ILM Progress

```
# Lifecycle phase, action, and step for every managed index
GET */_ilm/explain
```

(Note the path order: the index target comes before `_ilm/explain`.)
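If the explain output shows an index stuck in the `ERROR` step (for example, a failed shrink because no node had room for all shards), the failed step can be re-run after fixing the cause:

```
POST my-index/_ilm/retry
```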
### Force Data Movement

```
# Move an index to the warm tier (attribute-based allocation)
PUT /my-index/_settings
{
  "index.routing.allocation.require.data": "warm"
}
```
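If shards do not relocate after the settings change, the cluster allocation-explain API reports which nodes rejected them and why:

```
GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}
```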
## Searchable Snapshots (Frozen Tier)

### Setup Repository

```
PUT _snapshot/my_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-elasticsearch-snapshots",
    "region": "us-east-1"
  }
}
```
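Before relying on the repository for searchable snapshots, verify that every node in the cluster can reach it:

```
POST _snapshot/my_repository/_verify
```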
### Mount Searchable Snapshot

```
POST /_snapshot/my_repository/snapshot_1/_mount?wait_for_completion=true
{
  "index": "my-old-index",
  "renamed_index": "my-old-index-searchable"
}
```
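For the frozen tier specifically, mounting with `storage=shared_cache` creates a partially mounted index that fetches data from object storage on demand (through a local shared cache) instead of copying the full snapshot to disk; the default `storage=full_copy` is what the cold tier uses:

```
POST /_snapshot/my_repository/snapshot_1/_mount?storage=shared_cache
{
  "index": "my-old-index",
  "renamed_index": "my-old-index-frozen"
}
```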
## Storage Comparison
| Tier | Storage Cost | Search Performance |
|---|---|---|
| Hot | $$$ | Fastest |
| Warm | $$ | Fast |
| Cold | $ | Moderate |
| Frozen | ¢ | Slower (fetches from object storage) |
## Architecture Patterns

### Small Deployment (< 500 GB/day)

```
┌─────────────────────────────────────────┐
│ Hot/Warm Combined Nodes │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Node 1 │ │ Node 2 │ │ Node 3 │ │
│ │ SSD │ │ SSD │ │ SSD │ │
│ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────┘
```
### Medium Deployment (500 GB - 5 TB/day)

```
┌─────────────────────────────────────────┐
│ Hot Tier (NVMe) │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Hot 1 │ │ Hot 2 │ │ Hot 3 │ │
│ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ Warm Tier (SSD) │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Warm 1 │ │ Warm 2 │ │ Warm 3 │ │
│ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────┘
```
### Large Deployment (> 5 TB/day)

```
┌───────────────────────────────────────────────────┐
│ Hot Tier (NVMe) │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │Hot 1 │ │Hot 2 │ │Hot 3 │ │Hot 4 │ │Hot N │ │
│ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘ │
└───────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────┐
│ Warm Tier (SSD) │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │Warm 1│ │Warm 2│ │Warm 3│ │Warm 4│ │Warm N│ │
│ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘ │
└───────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────┐
│ Cold/Frozen Tier │
│ ┌──────┐ ┌──────┐ ┌─────────────────────┐ │
│ │Cold 1│ │Cold 2│ │ Object Storage (S3) │ │
│ └──────┘ └──────┘ └─────────────────────┘ │
└───────────────────────────────────────────────────┘
```
## Monitoring Tiered Clusters

### Key Metrics
- Data distribution across tiers
- ILM phase transitions
- Tier-specific latencies
- Storage utilization per tier
### Dashboard Queries

```
# Disk used by index data on each node
GET _cat/allocation?v&h=node,disk.indices

# Tier attribute per node
GET _cat/nodeattrs?v&h=node,attr,value

# Only indices with lifecycle errors
GET */_ilm/explain?only_errors=true
```