A hot-warm-cold architecture optimizes Elasticsearch cost and performance by placing data on hardware tiers that match its age and access pattern.
## Understanding Data Tiers

### Tier Characteristics
| Tier | Hardware | Data Age | Access Pattern | Purpose |
|---|---|---|---|---|
| Hot | NVMe/SSD | Recent | Frequent read/write | Active indexing and search |
| Warm | SSD/HDD | Days-weeks | Occasional read | Historical search |
| Cold | HDD/Object | Weeks-months | Rare read | Compliance, analytics |
| Frozen | Object storage | Months+ | Very rare | Long-term retention |
### Cost Savings
Typical cost reduction: 40-70% compared to all-hot architecture.
## Setting Up Node Tiers

### Node Configuration
**Hot node:**

```yaml
# elasticsearch.yml
node.roles: [data_hot, data_content]
node.attr.data: hot
```

**Warm node:**

```yaml
# elasticsearch.yml
node.roles: [data_warm]
node.attr.data: warm
```

**Cold node:**

```yaml
# elasticsearch.yml
node.roles: [data_cold]
node.attr.data: cold
```

**Frozen node (8.x):**

```yaml
# elasticsearch.yml
node.roles: [data_frozen]
```
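After restarting each node with its tier configuration, the `_cat` APIs confirm that the roles and the custom `data` attribute were picked up:

```
# Node roles (e.g. "h" = data_hot, "w" = data_warm)
GET _cat/nodes?v&h=name,node.role

# Custom attributes, including node.attr.data
GET _cat/nodeattrs?v&h=node,attr,value
```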
### Hardware Recommendations
| Tier | CPU | RAM | Storage | Example Instance |
|---|---|---|---|---|
| Hot | High | 64-128 GB | NVMe SSD | r5.4xlarge, i3.2xlarge |
| Warm | Medium | 32-64 GB | SSD | r5.2xlarge, d2.xlarge |
| Cold | Low | 16-32 GB | HDD | d2.xlarge, i3en.xlarge |
| Frozen | Minimal | 8-16 GB | Object | Small instance + S3 |
## Index Lifecycle Management (ILM)

### Create ILM Policy
```
PUT _ilm/policy/hot_warm_cold_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "allocate": {
            "require": {
              "data": "warm"
            }
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": {
              "data": "cold"
            }
          },
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```
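Once created, the policy can be read back to confirm it registered, and when testing phase transitions it helps to temporarily lower the ILM poll interval (ILM only evaluates policies every 10 minutes by default):

```
GET _ilm/policy/hot_warm_cold_policy

# Speed up ILM checks while testing; revert afterwards
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1m"
  }
}
```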
### Using Data Tiers (Preferred in 8.x)
```
PUT _ilm/policy/data_tiers_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {}
      },
      "frozen": {
        "min_age": "60d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my_repository"
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```
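Note that this policy needs no `allocate` actions: with data tiers, ILM injects an implicit `migrate` action that moves indices between tiers via the built-in `index.routing.allocation.include._tier_preference` setting rather than custom node attributes. You can inspect that setting on any managed index (the index name here is illustrative):

```
GET logs-000001/_settings?filter_path=*.settings.index.routing.allocation.include._tier_preference
```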
### Apply Policy to Index Template
```
PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "hot_warm_cold_policy",
      "index.number_of_shards": 2,
      "index.number_of_replicas": 1
    }
  }
}
```
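With the template in place, indexing a document whose target matches `logs-*` auto-creates a data stream and its first backing index. Data stream documents must include a `@timestamp` field (the stream name and message field below are illustrative):

```
POST logs-app/_doc
{
  "@timestamp": "2024-01-15T10:00:00Z",
  "message": "example log line"
}

# Confirm the data stream and its backing indices
GET _data_stream/logs-app
```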
## Scaling Strategies

### Horizontal Scaling (Add Nodes)

**When to add hot nodes:**
- Indexing latency increasing
- Search latency on recent data degrading
- CPU/memory at capacity
**When to add warm/cold nodes:**
- Storage filling up
- Historical search queries slow
- Data retention needs increasing
### Vertical Scaling (Upgrade Nodes)

**When to upgrade:**
- Single index performance critical
- Network becoming bottleneck
- Easier than managing more nodes
### Scaling Checklist

**Before scaling:**

- [ ] Current resource utilization measured
- [ ] Growth rate calculated
- [ ] Bottleneck identified (CPU/memory/disk/network)
- [ ] ILM policy optimized

**After scaling:**

- [ ] Rebalancing complete
- [ ] Performance improved
- [ ] Monitoring updated
- [ ] Cost impact evaluated
## Data Movement

### Monitor Tier Distribution

```
# Disk usage per node
GET _cat/allocation?v&h=node,disk.used,disk.percent

# Tier attribute assigned to each node
GET _cat/nodeattrs?v&h=node,attr,value

# Index sizes
GET _cat/indices?v&h=index,store.size,pri.store.size
```

(`_cat/allocation` does not expose node attributes as columns; `_cat/nodeattrs` lists them.)
### Check ILM Progress

```
# Lifecycle phase, action, and step for every managed index
GET */_ilm/explain
```

(Note the path order: the index target comes before `_ilm/explain`.)
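If the explain output shows an index stuck in the `ERROR` step (for example, a failed shrink because no node had room for all shards), the failed step can be re-run after fixing the cause:

```
POST my-index/_ilm/retry
```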
### Force Data Movement

```
# Move an index to the warm tier (attribute-based allocation)
PUT /my-index/_settings
{
  "index.routing.allocation.require.data": "warm"
}
```
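If shards do not relocate after the settings change, the cluster allocation-explain API reports which nodes rejected them and why:

```
GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}
```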
## Searchable Snapshots (Frozen Tier)

### Setup Repository

```
PUT _snapshot/my_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-elasticsearch-snapshots",
    "region": "us-east-1"
  }
}
```
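Before relying on the repository for searchable snapshots, verify that every node in the cluster can reach it:

```
POST _snapshot/my_repository/_verify
```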
### Mount Searchable Snapshot

```
POST /_snapshot/my_repository/snapshot_1/_mount?wait_for_completion=true
{
  "index": "my-old-index",
  "renamed_index": "my-old-index-searchable"
}
```
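For the frozen tier specifically, mounting with `storage=shared_cache` creates a partially mounted index that fetches data from object storage on demand (through a local shared cache) instead of copying the full snapshot to disk; the default `storage=full_copy` is what the cold tier uses:

```
POST /_snapshot/my_repository/snapshot_1/_mount?storage=shared_cache
{
  "index": "my-old-index",
  "renamed_index": "my-old-index-frozen"
}
```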
## Storage Comparison
| Tier | Storage Cost | Search Performance |
|---|---|---|
| Hot | $$$ | Fastest |
| Warm | $$ | Fast |
| Cold | $ | Moderate |
| Frozen | ¢ | Slower (fetches from object storage) |
## Architecture Patterns

### Small Deployment (< 500 GB/day)

```
┌─────────────────────────────────────────┐
│ Hot/Warm Combined Nodes │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Node 1 │ │ Node 2 │ │ Node 3 │ │
│ │ SSD │ │ SSD │ │ SSD │ │
│ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────┘
```
### Medium Deployment (500 GB - 5 TB/day)

```
┌─────────────────────────────────────────┐
│ Hot Tier (NVMe) │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Hot 1 │ │ Hot 2 │ │ Hot 3 │ │
│ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ Warm Tier (SSD) │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Warm 1 │ │ Warm 2 │ │ Warm 3 │ │
│ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────┘
```
### Large Deployment (> 5 TB/day)

```
┌───────────────────────────────────────────────────┐
│ Hot Tier (NVMe) │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │Hot 1 │ │Hot 2 │ │Hot 3 │ │Hot 4 │ │Hot N │ │
│ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘ │
└───────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────┐
│ Warm Tier (SSD) │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │Warm 1│ │Warm 2│ │Warm 3│ │Warm 4│ │Warm N│ │
│ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘ │
└───────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────┐
│ Cold/Frozen Tier │
│ ┌──────┐ ┌──────┐ ┌─────────────────────┐ │
│ │Cold 1│ │Cold 2│ │ Object Storage (S3) │ │
│ └──────┘ └──────┘ └─────────────────────┘ │
└───────────────────────────────────────────────────┘
```
## Monitoring Tiered Clusters

### Key Metrics
- Data distribution across tiers
- ILM phase transitions
- Tier-specific latencies
- Storage utilization per tier
### Dashboard Queries

```
# Disk used by index data on each node
GET _cat/allocation?v&h=node,disk.indices

# Tier attribute per node
GET _cat/nodeattrs?v&h=node,attr,value

# Only indices with lifecycle errors
GET */_ilm/explain?only_errors=true
```