Elasticsearch Scaling Nodes with Hot-Warm Architecture

Hot-warm-cold architecture allows you to optimize Elasticsearch costs and performance by placing data on appropriate hardware tiers based on access patterns and age.

Understanding Data Tiers

Tier Characteristics

Tier   | Hardware       | Data Age     | Access Pattern      | Purpose
Hot    | NVMe/SSD       | Recent       | Frequent read/write | Active indexing and search
Warm   | SSD/HDD        | Days-weeks   | Occasional read     | Historical search
Cold   | HDD/Object     | Weeks-months | Rare read           | Compliance, analytics
Frozen | Object storage | Months+      | Very rare           | Long-term retention

Cost Savings

Typical cost reduction: 40-70% compared to an all-hot architecture, depending on how much of the retained data can move off the hot tier.
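To see how a figure in that range can arise, here is a rough worked sketch. The per-GB monthly prices and the 100 GB/day data split are hypothetical placeholders, not quotes from any provider; substitute your own numbers.

```python
# Hypothetical all-in monthly cost per GB stored on each tier (illustrative only).
COST_PER_GB = {"hot": 0.30, "warm": 0.12, "cold": 0.06, "frozen": 0.03}


def monthly_cost(gb_per_tier, cost_per_gb=COST_PER_GB):
    """Total monthly cost for a mapping of tier name -> GB stored."""
    return sum(gb * cost_per_gb[tier] for tier, gb in gb_per_tier.items())


def savings_vs_all_hot(gb_per_tier, cost_per_gb=COST_PER_GB):
    """Fractional saving of the tiered layout relative to keeping everything hot."""
    total_gb = sum(gb_per_tier.values())
    all_hot = total_gb * cost_per_gb["hot"]
    tiered = monthly_cost(gb_per_tier, cost_per_gb)
    return (all_hot - tiered) / all_hot


# Assumed split at 100 GB/day: roughly 7 days hot, 23 days warm, 60 days cold.
layout = {"hot": 700, "warm": 2300, "cold": 6000}
print(f"Saving vs all-hot: {savings_vs_all_hot(layout):.0%}")
```

With these assumed prices the saving lands near the upper end of the range; a shorter retention period or a smaller price gap between tiers pulls it down.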

Setting Up Node Tiers

Node Configuration

Hot Node:

# elasticsearch.yml
node.roles: [data_hot, data_content]
node.attr.data: hot

Warm Node:

# elasticsearch.yml
node.roles: [data_warm]
node.attr.data: warm

Cold Node:

# elasticsearch.yml
node.roles: [data_cold]
node.attr.data: cold

Frozen Node (8.x):

# elasticsearch.yml
node.roles: [data_frozen]

Hardware Recommendations

Tier   | CPU     | RAM       | Storage  | Example Instance
Hot    | High    | 64-128 GB | NVMe SSD | r5.4xlarge, i3.2xlarge
Warm   | Medium  | 32-64 GB  | SSD      | r5.2xlarge, d2.xlarge
Cold   | Low     | 16-32 GB  | HDD      | d2.xlarge, i3en.xlarge
Frozen | Minimal | 8-16 GB   | Object   | Small instance + S3

Index Lifecycle Management (ILM)

Create ILM Policy

PUT _ilm/policy/hot_warm_cold_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "allocate": {
            "require": {
              "data": "warm"
            }
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": {
              "data": "cold"
            }
          },
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
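Before submitting a policy like the one above, it can help to sanity-check that each phase's min_age is not earlier than the phase before it. The following is a minimal sketch; the helper names are hypothetical and it only handles the simple integer-plus-unit durations used in this article.

```python
import re

# Seconds per unit for the duration suffixes handled by this sketch.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}
_PHASE_ORDER = ["hot", "warm", "cold", "frozen", "delete"]


def to_seconds(value):
    """Convert a duration such as '7d' or '0ms' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if not match:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def check_min_age_order(policy):
    """Return messages for phases that start before the preceding phase."""
    phases = policy["policy"]["phases"]
    problems = []
    previous_name, previous_age = None, 0
    for name in _PHASE_ORDER:
        if name not in phases:
            continue
        # A phase without min_age is treated as starting immediately.
        age = to_seconds(phases[name].get("min_age", "0ms"))
        if previous_name is not None and age < previous_age:
            problems.append(f"{name} starts before {previous_name}")
        previous_name, previous_age = name, age
    return problems


# Abbreviated copy of hot_warm_cold_policy: only the min_age values matter here.
policy = {"policy": {"phases": {
    "hot": {"min_age": "0ms"},
    "warm": {"min_age": "7d"},
    "cold": {"min_age": "30d"},
    "delete": {"min_age": "90d"},
}}}
print(check_min_age_order(policy) or "min_age values are in order")
```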

Using Data Tiers (Preferred in 8.x)

PUT _ilm/policy/data_tiers_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {}
      },
      "frozen": {
        "min_age": "60d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my_repository"
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
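As a rough illustration of how the min_age values in data_tiers_policy translate into tier placement, the sketch below maps an index's age to the phase it should have reached. It assumes age is counted in whole days since rollover and ignores the time ILM needs to poll and to finish each action, so real transitions lag slightly behind.

```python
# min_age thresholds in days, copied from data_tiers_policy above.
PHASE_MIN_AGE_DAYS = [
    ("hot", 0),
    ("warm", 7),
    ("cold", 30),
    ("frozen", 60),
    ("delete", 365),
]


def expected_phase(days_since_rollover):
    """Return the last phase whose min_age the index has reached."""
    current = "hot"
    for phase, min_age in PHASE_MIN_AGE_DAYS:
        if days_since_rollover >= min_age:
            current = phase
    return current


for age in (0, 10, 45, 90, 400):
    print(age, expected_phase(age))
```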

Apply Policy to Index Template

PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "hot_warm_cold_policy",
      "index.number_of_shards": 2,
      "index.number_of_replicas": 1
    }
  }
}
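The template's shard count interacts with the rollover conditions in the policy. With 2 primary shards and max_primary_shard_size of 50gb, an index rolls over at roughly 100 GB of primary data, or after 1 day, whichever comes first. A back-of-the-envelope sketch, assuming documents spread evenly across primary shards (the function names are hypothetical):

```python
def rollovers_per_day(daily_primary_gb, primary_shards=2,
                      max_primary_shard_gb=50.0, max_age_days=1.0):
    """Estimate rollovers per day from the size and age conditions."""
    size_limit_gb = primary_shards * max_primary_shard_gb
    by_size = daily_primary_gb / size_limit_gb
    by_age = 1.0 / max_age_days
    # Whichever condition fires more often determines the rollover rate.
    return max(by_size, by_age)


def shards_created_per_day(daily_primary_gb, primary_shards=2, replicas=1):
    """Estimate new shards (primaries plus replicas) created per day."""
    indices = rollovers_per_day(daily_primary_gb, primary_shards)
    return indices * primary_shards * (1 + replicas)


print(rollovers_per_day(40))        # low volume: the 1d age condition dominates
print(rollovers_per_day(250))       # high volume: the size condition dominates
print(shards_created_per_day(250))
```

Multiplying the daily shard count by the retention period gives a first estimate of the total shard count the cluster has to carry, before shrink reduces it in the warm phase.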

Scaling Strategies

Horizontal Scaling (Add Nodes)

When to add hot nodes:

  • Indexing latency increasing
  • Search latency on recent data degrading
  • CPU/memory at capacity

When to add warm/cold nodes:

  • Storage filling up
  • Historical search queries slow
  • Data retention needs increasing

Vertical Scaling (Upgrade Nodes)

When to upgrade:

  • Single index performance critical
  • Network becoming bottleneck
  • Easier than managing more nodes

Scaling Checklist

Before scaling:
□ Current resource utilization measured
□ Growth rate calculated
□ Bottleneck identified (CPU/memory/disk/network)
□ ILM policy optimized

After scaling:
□ Rebalancing complete
□ Performance improved
□ Monitoring updated
□ Cost impact evaluated
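For the "growth rate calculated" item, a simple linear projection is often enough to decide how urgent a storage expansion is. The sketch below assumes constant daily growth and uses 85% as the threshold, which matches Elasticsearch's default low disk watermark; the function name and the example figures are made up.

```python
def days_until_threshold(used_gb, total_gb, daily_growth_gb, threshold=0.85):
    """Days until disk usage reaches threshold * total_gb at constant growth.

    Returns 0.0 if usage is already at or past the threshold and
    float('inf') if usage is not growing.
    """
    limit_gb = total_gb * threshold
    if used_gb >= limit_gb:
        return 0.0
    if daily_growth_gb <= 0:
        return float("inf")
    return (limit_gb - used_gb) / daily_growth_gb


# Example: a warm tier with 10 TB total, 6 TB used, growing by 100 GB/day.
print(days_until_threshold(6000, 10000, 100))
```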

Data Movement

Monitor Tier Distribution

GET /_cat/allocation?v&h=node,disk.indices,disk.used,disk.percent
GET /_cat/nodeattrs?v&h=node,attr,value
GET /_cat/indices?v&h=index,store.size,pri.store.size
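Per-node output is easier to reason about once it is rolled up by tier. The sketch below assumes rows shaped like the JSON output of GET /_cat/nodes?format=json&bytes=b&h=name,node.role,disk.used,disk.total, where node.role is the compact role string (h = hot, w = warm, c = cold, f = frozen). The sample rows are invented for illustration.

```python
from collections import defaultdict

# Letters used in the _cat/nodes node.role column for the data tier roles.
TIER_LETTERS = {"h": "hot", "w": "warm", "c": "cold", "f": "frozen"}


def disk_by_tier(rows):
    """Sum used and total disk bytes per tier from _cat/nodes-style rows."""
    totals = defaultdict(lambda: {"used": 0, "total": 0})
    for row in rows:
        for letter, tier in TIER_LETTERS.items():
            # A node holding several tier roles is counted once per tier.
            if letter in row["node.role"]:
                totals[tier]["used"] += int(row["disk.used"])
                totals[tier]["total"] += int(row["disk.total"])
    summary = {}
    for tier, t in totals.items():
        percent = round(100 * t["used"] / t["total"], 1) if t["total"] else 0.0
        summary[tier] = {"used": t["used"], "total": t["total"], "percent": percent}
    return summary


# Invented sample rows; cat APIs return every value as a string.
rows = [
    {"name": "hot-1", "node.role": "hs", "disk.used": "800", "disk.total": "1000"},
    {"name": "hot-2", "node.role": "hs", "disk.used": "600", "disk.total": "1000"},
    {"name": "warm-1", "node.role": "w", "disk.used": "3000", "disk.total": "4000"},
]
print(disk_by_tier(rows))
```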

Check ILM Progress

GET /*/_ilm/explain
GET /*/_ilm/explain?filter_path=indices.*.phase,indices.*.action,indices.*.step
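The ILM explain response can be long on clusters with many indices. A minimal sketch for condensing it, assuming the response has an "indices" object keyed by index name with "managed", "phase", and "step" fields per index; the sample data is invented.

```python
from collections import Counter


def summarize_ilm(explain_response):
    """Count managed indices per phase and list indices on the ERROR step."""
    per_phase = Counter()
    errors = []
    for name, info in explain_response.get("indices", {}).items():
        if not info.get("managed"):
            continue  # skip indices without a lifecycle policy
        per_phase[info.get("phase", "unknown")] += 1
        if info.get("step") == "ERROR":
            errors.append(name)
    return dict(per_phase), sorted(errors)


# Invented sample shaped like an ILM explain response.
sample = {"indices": {
    "logs-000001": {"managed": True, "phase": "warm", "action": "shrink", "step": "ERROR"},
    "logs-000002": {"managed": True, "phase": "hot", "action": "rollover",
                    "step": "check-rollover-ready"},
    "scratch-index": {"managed": False},
}}
phases, failed = summarize_ilm(sample)
print(phases, failed)
```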

Force Data Movement

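// Move index to warm tier (falls back to hot if no warm node is available)
// On clusters that allocate by custom attribute instead of data tier roles,
// set index.routing.allocation.require.data to "warm"
PUT /my-index/_settings
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}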

Searchable Snapshots (Frozen Tier)

Setup Repository

PUT _snapshot/my_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-elasticsearch-snapshots",
    "region": "us-east-1"
  }
}

Mount Searchable Snapshot

POST /_snapshot/my_repository/snapshot_1/_mount?wait_for_completion=true&storage=shared_cache
{
  "index": "my-old-index",
  "renamed_index": "my-old-index-searchable"
}

Storage Comparison

Tier   | Storage Cost | Search Performance
Hot    | $$$          | Fastest
Warm   | $$           | Fast
Cold   | $            | Moderate
Frozen | ¢            | Slower (fetches from object storage)

Architecture Patterns

Small Deployment (< 500 GB/day)

┌─────────────────────────────────────────┐
│         Hot/Warm Combined Nodes         │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│  │  Node 1 │  │  Node 2 │  │  Node 3 │  │
│  │  SSD    │  │  SSD    │  │  SSD    │  │
│  └─────────┘  └─────────┘  └─────────┘  │
└─────────────────────────────────────────┘

Medium Deployment (500 GB - 5 TB/day)

┌─────────────────────────────────────────┐
│              Hot Tier (NVMe)            │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│  │  Hot 1  │  │  Hot 2  │  │  Hot 3  │  │
│  └─────────┘  └─────────┘  └─────────┘  │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│              Warm Tier (SSD)            │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│  │ Warm 1  │  │ Warm 2  │  │ Warm 3  │  │
│  └─────────┘  └─────────┘  └─────────┘  │
└─────────────────────────────────────────┘

Large Deployment (> 5 TB/day)

┌───────────────────────────────────────────────────┐
│                   Hot Tier (NVMe)                 │
│  ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐ │
│  │Hot 1 │  │Hot 2 │  │Hot 3 │  │Hot 4 │  │Hot N │ │
│  └──────┘  └──────┘  └──────┘  └──────┘  └──────┘ │
└───────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────┐
│                   Warm Tier (SSD)                 │
│  ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐ │
│  │Warm 1│  │Warm 2│  │Warm 3│  │Warm 4│  │Warm N│ │
│  └──────┘  └──────┘  └──────┘  └──────┘  └──────┘ │
└───────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────┐
│                  Cold/Frozen Tier                 │
│  ┌──────┐  ┌──────┐     ┌─────────────────────┐   │
│  │Cold 1│  │Cold 2│     │ Object Storage (S3) │   │
│  └──────┘  └──────┘     └─────────────────────┘   │
└───────────────────────────────────────────────────┘
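The three patterns above reduce to a simple lookup on daily ingest volume. A small sketch, treating 1 TB as 1000 GB; the function name and return strings are illustrative.

```python
def deployment_pattern(daily_ingest_gb):
    """Map daily ingest volume (GB/day) to the deployment patterns above."""
    if daily_ingest_gb < 500:
        return "small: combined hot/warm nodes"
    if daily_ingest_gb <= 5000:
        return "medium: dedicated hot and warm tiers"
    return "large: hot, warm, and cold/frozen tiers"


for volume in (120, 2000, 8000):
    print(volume, "GB/day ->", deployment_pattern(volume))
```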

Monitoring Tiered Clusters

Key Metrics

  • Data distribution across tiers
  • ILM phase transitions
  • Tier-specific latencies
  • Storage utilization per tier

Dashboard Queries

// Data per node (use nodeattrs to map nodes to tiers)
GET /_cat/allocation?v&h=node,disk.indices
GET /_cat/nodeattrs?v&h=node,attr,value

// ILM status
GET /*/_ilm/explain?only_errors=true