Elasticsearch Version Upgrade Instability

Elasticsearch version upgrades can cause temporary or persistent cluster instability. This guide helps identify and resolve common issues that occur during and after upgrades.

Pre-Upgrade Preparation

Verify Upgrade Path

Elasticsearch supports direct upgrades only between specific versions:

From Version    To Version    Method
7.x (latest)    8.x           Rolling upgrade
7.x (older)     8.x           Upgrade to 7.17 first
6.8             7.x           Rolling upgrade
6.x (older)     7.x           Upgrade to 6.8 first
5.x             7.x           Reindex from remote
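
The table above can be encoded as a small lookup so that a runbook or CI job prints the right method for a given node version. This is a minimal sketch: `upgrade_method` is a hypothetical helper name, and it reads "7.x (latest)" as 7.17, which is what the second row of the table implies.

```shell
#!/bin/sh
# Sketch: map a running version and a target major version to the upgrade
# method from the table. upgrade_method is a hypothetical helper, not an
# Elasticsearch tool.
upgrade_method() {
  current="$1"              # full version, e.g. 7.10.2
  target="$2"               # target major version, e.g. 8
  major="${current%%.*}"    # text before the first dot
  rest="${current#*.}"      # text after the first dot
  minor="${rest%%.*}"       # minor version number
  case "$major:$target" in
    7:8)
      if [ "$minor" -ge 17 ]; then echo "Rolling upgrade"
      else echo "Upgrade to 7.17 first"; fi ;;
    6:7)
      if [ "$minor" -ge 8 ]; then echo "Rolling upgrade"
      else echo "Upgrade to 6.8 first"; fi ;;
    5:7) echo "Reindex from remote" ;;
    *)   echo "Not covered by the table; check the official upgrade docs" ;;
  esac
}
```

For example, `upgrade_method 7.10.2 8` prints "Upgrade to 7.17 first".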

Pre-Upgrade Checks

GET /_cat/health?v
GET /_cat/nodes?v
GET /_cluster/settings
GET /_migration/deprecations

Before starting, confirm that:

  • Cluster status is green
  • All nodes run the same version
  • The deprecation API reports no critical issues
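
The first two checks are easy to script as a gate that runs before any node is touched. The sketch below assumes an unauthenticated cluster at `localhost:9200` (override with `ES_URL`); `es_get` and `preflight_ok` are hypothetical helper names. It does not cover the deprecation check, which still needs a manual read of the `/_migration/deprecations` response.

```shell
#!/bin/sh
# Assumption: cluster reachable without authentication at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical wrapper around curl so the checks below stay short.
es_get() {
  curl -s "$ES_URL$1"
}

# Prints "ok" and returns 0 only if the cluster is green and every node
# reports the same version.
preflight_ok() {
  status="$(es_get '/_cat/health?h=status' | tr -d '[:space:]')"
  versions="$(es_get '/_cat/nodes?h=version' | awk 'NF { print $1 }' | sort -u | wc -l | tr -d ' ')"
  if [ "$status" != "green" ]; then
    echo "cluster status is '$status', expected green"
    return 1
  fi
  if [ "$versions" -ne 1 ]; then
    echo "found $versions distinct node versions, expected 1"
    return 1
  fi
  echo "ok"
}
```

A runbook could call `preflight_ok || exit 1` as its first step.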

Common Upgrade Issues

Issue 1: Mixed Version Cluster Instability

Symptoms:

  • Increased latency during rolling upgrade
  • Shard allocation delays
  • Query failures

Causes:

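  • Nodes running different versions
  • Shards whose primary is on an upgraded node cannot allocate replicas to nodes still on the old version
  • Features introduced in the new version stay disabled until every node is upgraded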

Solutions:

  1. Complete the upgrade quickly:

    • Don't leave cluster in mixed-version state
    • Upgrade during low-traffic periods
  2. Restrict shard allocation to primaries before restarting a node:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
  3. Re-enable allocation after the node rejoins:
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
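
When the upgrade is driven from a shell rather than from Kibana Console, the two requests above can be wrapped in one helper. This is a sketch under assumptions: the cluster is reachable without authentication at `localhost:9200` (override with `ES_URL`), and `allocation_body` and `set_allocation` are hypothetical names.

```shell
#!/bin/sh
# Assumption: cluster reachable without authentication at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Builds the same JSON body as the Console requests; rejects other values.
allocation_body() {
  case "$1" in
    primaries|all) ;;
    *) echo "usage: allocation_body primaries|all" >&2; return 1 ;;
  esac
  printf '{"transient":{"cluster.routing.allocation.enable":"%s"}}' "$1"
}

# Sends the body to the cluster settings API.
set_allocation() {
  body="$(allocation_body "$1")" || return 1
  curl -s -X PUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' -d "$body"
}
```

Usage: `set_allocation primaries` before stopping a node, then `set_allocation all` once it has rejoined.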

Issue 2: Node Won't Join After Upgrade

Symptoms:

  • Upgraded node doesn't appear in cluster
  • "incompatible" errors in logs

Diagnosis:

grep -i "incompatible\|version" /var/log/elasticsearch/*.log

Solutions:

  1. Check version compatibility:
GET /_cat/nodes?v&h=name,version
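  2. Verify transport protocol compatibility:

    • Ensure network.host and the discovery settings are consistent with the rest of the cluster
  3. Clear node state if corrupt:

# Last resort: stop the node first. Data on this node will be lost.
# The path depends on path.data and on the Elasticsearch version.
rm -rf /var/lib/elasticsearch/nodes/0/_state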

Issue 3: Shard Recovery Delays

Symptoms:

  • Yellow cluster status persists
  • Slow shard allocation

Causes:

  • Recovery happening across version boundary
  • New segment format compatibility

Solutions:

  1. Raise the recovery limits temporarily, and reset them to null once recovery finishes:
PUT /_cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "500mb",
    "cluster.routing.allocation.node_concurrent_recoveries": 4
  }
}
  2. Monitor recovery progress:
GET /_cat/recovery?v&active_only=true
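
Polling the recovery API by hand gets tedious on large clusters. The following sketch counts active recoveries and waits until none remain. It assumes an unauthenticated cluster at `localhost:9200` (override with `ES_URL`); `es_get`, `active_recoveries`, and `wait_for_recoveries` are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: cluster reachable without authentication at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

es_get() {
  curl -s "$ES_URL$1"
}

# Prints the number of shard recoveries currently in progress.
active_recoveries() {
  es_get '/_cat/recovery?active_only=true&h=index,shard,stage' |
    awk 'NF { n++ } END { print n + 0 }'
}

# Polls until no recoveries are active.
# $1 = maximum number of polls (default 60), $2 = seconds between polls (default 10)
wait_for_recoveries() {
  tries="${1:-60}"
  delay="${2:-10}"
  while [ "$tries" -gt 0 ]; do
    n="$(active_recoveries)"
    if [ "$n" -eq 0 ]; then
      echo "no active recoveries"
      return 0
    fi
    echo "$n active recoveries"
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}
```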

Issue 4: Index Compatibility Issues

Symptoms:

  • Old indices not accessible
  • "too old" or "incompatible" errors

Diagnosis:

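GET /_all/_settings/index.version.created

The value is an internal version ID rather than a dotted version string (for example, 7100299 typically corresponds to 7.10.2).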

Solutions:

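Indices created in a version the new release cannot read must be reindexed on a version that can still open them (before the upgrade, or via reindex from remote):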

// Reindex to new index
POST _reindex
{
  "source": {"index": "old-index"},
  "dest": {"index": "new-index"}
}

// Delete the old index only after verifying the new one
DELETE /old-index
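
Before running the DELETE, it is worth confirming that the reindex copied every document. A minimal sketch, assuming an unauthenticated cluster at `localhost:9200` (override with `ES_URL`) and that the destination index has been refreshed; `es_get`, `doc_count`, and `safe_to_delete` are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: cluster reachable without authentication at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

es_get() {
  curl -s "$ES_URL$1"
}

# Prints the document count of one index.
doc_count() {
  es_get "/_cat/count/$1?h=count" | tr -d '[:space:]'
}

# Returns 0 only when both indices report the same, non-empty count.
safe_to_delete() {
  src="$(doc_count "$1")"
  dst="$(doc_count "$2")"
  if [ -n "$src" ] && [ "$src" = "$dst" ]; then
    echo "counts match ($src docs)"
  else
    echo "count mismatch: $1=$src $2=$dst"
    return 1
  fi
}
```

Usage: `safe_to_delete old-index new-index && curl -X DELETE "$ES_URL/old-index"`. Matching counts are a necessary check, not proof that the mappings or contents are identical.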

Issue 5: Plugin Failures

Symptoms:

  • Node won't start
  • Plugin errors in logs

Diagnosis:

grep -i "plugin" /var/log/elasticsearch/*.log

Solutions:

  1. Remove incompatible plugins:
bin/elasticsearch-plugin remove plugin-name
  2. Install compatible versions:
bin/elasticsearch-plugin install plugin-name
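
With several plugins installed, generating the remove and install commands from the plugin list avoids missing one. The sketch below only prints the commands (a dry run) so they can be reviewed first. `plugin_commands` is a hypothetical helper, and it assumes each plugin can be installed by name, which holds for official plugins but not for third-party ones that need a URL or file path.

```shell
#!/bin/sh
# Reads plugin names from stdin, one per line (the format printed by
# `bin/elasticsearch-plugin list`), and prints the commands to run.
# Dry run only: nothing is removed or installed.
plugin_commands() {
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    echo "bin/elasticsearch-plugin remove $name"
    echo "bin/elasticsearch-plugin install $name"
  done
}
```

Usage: `bin/elasticsearch-plugin list | plugin_commands`, then review the output and pipe it to `sh` if it looks right.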

Issue 6: Performance Regression

Symptoms:

  • Slower queries after upgrade
  • Higher resource usage

Causes:

  • New default settings
  • Different query execution
  • New security overhead

Solutions:

  1. Review changed defaults:

    • Check release notes for default changes
    • Explicitly set critical settings
  2. Profile slow queries:

GET /index-name/_search
{
  "profile": true,
  "query": {...}
}
  3. Compare settings:
GET /_cluster/settings?include_defaults=true
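
Changed defaults are easier to spot if the settings are saved before the upgrade and compared afterwards. One possible approach is to save the output of `/_cluster/settings?include_defaults=true&flat_settings=true&pretty` to a file on each side of the upgrade, then list the lines that are new or different. `changed_settings` is a hypothetical helper; it compares text line by line, so it is a rough filter rather than a JSON-aware diff.

```shell
#!/bin/sh
# Prints lines present in the new settings dump but not in the old one.
# $1 = dump taken before the upgrade, $2 = dump taken after the upgrade.
changed_settings() {
  old="$1"
  new="$2"
  tmp="$(mktemp)"
  # Strip trailing commas so a setting that merely moved to the end of a
  # JSON object is not reported as changed.
  sed 's/,[[:space:]]*$//' "$old" > "$tmp"
  sed 's/,[[:space:]]*$//' "$new" | grep -Fxv -f "$tmp" || true
  rm -f "$tmp"
}
```

Usage: `changed_settings settings-before.json settings-after.json`, where both file names are placeholders for the saved dumps.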

Rolling Upgrade Procedure

Step 1: Disable Shard Allocation

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

Step 2: Stop Indexing (Optional)

For critical systems, pause indexing to ensure consistency.

Step 3: Flush (Optional)

Synced flush was deprecated in 7.6 and removed in 8.0. A regular flush serves the same purpose here:

POST /_flush

Step 4: Stop and Upgrade Node

systemctl stop elasticsearch
# Install new version
rpm -Uvh elasticsearch-8.x.rpm
# or
dpkg -i elasticsearch-8.x.deb

Step 5: Start Node

systemctl start elasticsearch

Step 6: Wait for Node to Join

GET /_cat/nodes?v
GET /_cat/health?v
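
Step 6 can be automated by polling the node list until the restarted node shows up. A minimal sketch, assuming an unauthenticated cluster at `localhost:9200` (override with `ES_URL`) and node names without spaces; `es_get`, `node_present`, and `wait_for_node` are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: cluster reachable without authentication at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

es_get() {
  curl -s "$ES_URL$1"
}

# Succeeds if a node with exactly this name is in the cluster.
node_present() {
  es_get '/_cat/nodes?h=name' | tr -d ' ' | grep -Fxq "$1"
}

# Polls until the node appears.
# $1 = node name, $2 = maximum polls (default 30), $3 = seconds between polls (default 10)
wait_for_node() {
  name="$1"
  tries="${2:-30}"
  delay="${3:-10}"
  while [ "$tries" -gt 0 ]; do
    if node_present "$name"; then
      echo "$name joined"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  echo "$name did not join"
  return 1
}
```

Usage: `wait_for_node es-node-2 || exit 1`, where `es-node-2` is a placeholder node name. Stopping on failure keeps the script from re-enabling allocation while the node is still missing.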

Step 7: Re-enable Allocation

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}

Step 8: Wait for Green

GET /_cluster/health?wait_for_status=green&timeout=5m

Step 9: Repeat for Each Node

Upgrade data nodes first and master-eligible nodes last.

Post-Upgrade Verification

Health Checks

# Cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Node versions
curl -X GET "localhost:9200/_cat/nodes?v&h=name,version"

# Index health (the health parameter takes a single value)
curl -X GET "localhost:9200/_cat/indices?v&health=red"
curl -X GET "localhost:9200/_cat/indices?v&health=yellow"

Performance Baseline

  • Compare query latencies to pre-upgrade
  • Monitor resource usage
  • Check error rates
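
To make the latency comparison concrete, express each measurement as a percentage change against the pre-upgrade baseline. The helper below is a sketch only: `pct_change` is a hypothetical name, and the two inputs are latencies in the same unit (for example, milliseconds) that you have measured yourself.

```shell
#!/bin/sh
# Prints the percentage change from a baseline value to a current value.
# $1 = pre-upgrade latency, $2 = post-upgrade latency (same unit).
pct_change() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    if (before == 0) { print "n/a"; exit }
    printf "%.1f\n", (after - before) / before * 100
  }'
}
```

For example, `pct_change 120 150` prints `25.0`, meaning the query is 25% slower than before the upgrade.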

Rollback Planning

Before Upgrade

  1. Take a snapshot (this assumes a snapshot repository named backup is already registered):
PUT /_snapshot/backup/pre-upgrade
{
  "indices": "*",
  "include_global_state": true
}
  2. Document current configuration
  3. Keep old binaries available
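
A snapshot is only useful for rollback if it completed successfully, so check its status before starting the upgrade. The sketch below assumes an unauthenticated cluster at `localhost:9200` (override with `ES_URL`); `es_get`, `snapshot_status`, and `snapshot_ok` are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: cluster reachable without authentication at ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

es_get() {
  curl -s "$ES_URL$1"
}

# Prints the status of one snapshot (for example SUCCESS, PARTIAL, FAILED).
# $1 = repository name, $2 = snapshot name
snapshot_status() {
  es_get "/_cat/snapshots/$1?h=id,status" |
    awk -v id="$2" '$1 == id { print $2 }'
}

# Succeeds only if the snapshot finished with status SUCCESS.
snapshot_ok() {
  [ "$(snapshot_status "$1" "$2")" = "SUCCESS" ]
}
```

Usage: `snapshot_ok backup pre-upgrade || { echo "snapshot not usable"; exit 1; }`.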

If Rollback Needed

  1. Stop upgraded nodes
  2. Restore the pre-upgrade snapshot into a cluster running the old version (an upgraded node cannot be downgraded in place)
  3. Investigate issues before reattempting