High I/O wait (iowait) occurs when the CPU is idle but waiting for disk operations to complete. In Elasticsearch, this leads to degraded performance, slow queries, and potential cluster instability. This guide provides solutions to reduce iowait.
Understanding IOWait
What is IOWait?
IOWait represents the percentage of time the CPU spends waiting for I/O operations. High iowait indicates:
- Disk is the bottleneck
- The CPU could do more work if the disk were faster
- Operations are queuing for disk access
Healthy vs. Problematic Levels
| IOWait % | Status | Action |
|---|---|---|
| < 5% | Healthy | Normal operation |
| 5-20% | Warning | Monitor and investigate |
| > 20% | Critical | Immediate action needed |
Diagnosing High IOWait
Check Current IOWait
# Quick check
top
# Look for "%wa" in the CPU line
# Detailed view
vmstat 1 10
# Check "wa" column
# Per-CPU breakdown
mpstat -P ALL 1 5
Identify I/O-Heavy Processes
# Show I/O by process
iotop -o
# Show disk utilization
iostat -x 1 5
Elasticsearch-Specific Checks
GET /_nodes/stats/fs
GET /_cat/thread_pool?v&h=node_name,name,active,queue&s=queue:desc
GET /_nodes/hot_threads?type=wait
Causes and Fixes
Fix 1: Upgrade to SSDs
The most impactful change for high iowait:
Before (HDD):
- Random I/O: ~100-200 IOPS
- Latency: 5-15ms
After (SSD):
- Random I/O: 10,000-100,000+ IOPS
- Latency: <1ms
# Verify disk type
cat /sys/block/sda/queue/rotational
# 1 = HDD, 0 = SSD
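To check every block device at once, lsblk's ROTA column mirrors the same rotational flag:
# 1 = rotational (HDD), 0 = non-rotational (SSD)
lsblk -d -o NAME,ROTA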
Fix 2: Reduce Merge Activity
Segment merging causes significant I/O:
PUT /my-index/_settings
{
  "index.merge.scheduler.max_thread_count": 1,
  "index.merge.policy.max_merged_segment": "5gb",
  "index.merge.policy.segments_per_tier": 10
}
Fix 3: Increase Refresh Interval
Refreshing less often creates new segments less frequently, which reduces write I/O:
PUT /my-index/_settings
{
  "index.refresh_interval": "30s"
}
For bulk indexing, disable refresh temporarily:
PUT /my-index/_settings
{
  "index.refresh_interval": "-1"
}
// After bulk indexing
PUT /my-index/_settings
{
  "index.refresh_interval": "1s"
}
Fix 4: Optimize Translog Settings
Asynchronous translog durability reduces fsync frequency at the cost of potentially losing up to sync_interval worth of acknowledged writes if a node crashes:
PUT /my-index/_settings
{
  "index.translog.durability": "async",
  "index.translog.sync_interval": "30s",
  "index.translog.flush_threshold_size": "1gb"
}
Fix 5: Ensure Adequate Filesystem Cache
When the filesystem cache is too small, reads miss the cache and go straight to disk:
# Check cache usage
free -h
# Look at "buff/cache" column
Solution: Keep heap ≤ 50% of RAM to leave memory for OS cache.
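As a rough illustration (the 31 GB figure and the jvm.options.d path assume a 64 GB host with a package install and Elasticsearch 7.7 or later), the heap can be pinned well below half of RAM so the remainder stays available to the page cache:
# Assumption: 64 GB host, config under /etc/elasticsearch.
# A 31 GB heap leaves roughly half of RAM for the filesystem cache.
printf -- '-Xms31g\n-Xmx31g\n' > /etc/elasticsearch/jvm.options.d/heap.options
systemctl restart elasticsearch
# Confirm how much memory remains available to the page cache
free -h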
Fix 6: Limit Recovery Bandwidth
Cap the disk bandwidth that shard recoveries can consume:
PUT /_cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "40mb"
  }
}
Fix 7: Schedule Force Merges
For indices that are no longer being written to, an explicit force merge during off-peak hours reduces segment count without competing with peak traffic:
POST /my-index/_forcemerge?max_num_segments=1
Run this during maintenance windows, not during peak traffic.
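One way to schedule this is a cron entry that calls the force merge API overnight; the index name, host, and lack of authentication below are placeholders for illustration:
# Hypothetical cron entry: force merge a read-only index at 03:00, outside peak hours
0 3 * * * curl -s -X POST "http://localhost:9200/my-index/_forcemerge?max_num_segments=1"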
Fix 8: Use Multiple Data Paths
Distribute I/O across disks (note that multiple data paths are deprecated in recent Elasticsearch versions, so prefer OS-level striping such as RAID 0 where possible):
# elasticsearch.yml
path.data:
- /mnt/disk1/elasticsearch
- /mnt/disk2/elasticsearch
Fix 9: Disable Swap
Swap causes massive I/O issues:
# Disable swap
swapoff -a
# Or set swappiness very low
echo 1 > /proc/sys/vm/swappiness
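The echo above does not survive a reboot; a minimal way to persist the setting (the file name is just a convention) is:
# Persist the low swappiness value across reboots
echo "vm.swappiness = 1" > /etc/sysctl.d/99-elasticsearch.conf
sysctl --system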
Enable memory locking in Elasticsearch:
# elasticsearch.yml
bootstrap.memory_lock: true
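memory_lock only takes effect if the process is allowed to lock RAM; on a systemd-based package install (an assumption here) that usually means raising the memlock limit:
# Assumes a systemd-managed Elasticsearch service
systemctl edit elasticsearch
# Add the following to the override file:
#   [Service]
#   LimitMEMLOCK=infinity
systemctl daemon-reload
systemctl restart elasticsearch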
Fix 10: Filesystem Optimization
# Mount with optimized options
# /etc/fstab
/dev/sda1 /data ext4 defaults,noatime,nodiratime 0 0
# For SSDs, ensure TRIM is enabled
fstrim -v /data
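On most systemd distributions, periodic TRIM can be left to the bundled timer rather than running fstrim by hand:
# Enable weekly TRIM (systemd distributions)
systemctl enable --now fstrim.timer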
Index-Level Optimization
Write-Heavy Indices
PUT /logs-write-heavy
{
  "settings": {
    "index.refresh_interval": "60s",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "60s",
    "index.merge.scheduler.max_thread_count": 1,
    "index.number_of_replicas": 0
  }
}
Note: Increase replicas after bulk loading.
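For example (host and replica count are placeholders; assumes no authentication), replicas can be restored with a single settings update once the bulk load finishes:
# Restore one replica after the bulk load completes
curl -s -X PUT "http://localhost:9200/logs-write-heavy/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.number_of_replicas": 1}'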
Read-Heavy Indices
PUT /search-index
{
  "settings": {
    "index.store.type": "hybridfs",
    "index.queries.cache.enabled": true
  }
}
Monitoring IOWait
Set Up Alerts
Alert when any of the following holds (a minimal check script is sketched after this list):
- IOWait > 15% for 5+ minutes
- Disk utilization > 80%
- I/O latency > 20ms average
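A minimal check along these lines (the threshold and sample length are examples) can be wired into cron or an existing alerting agent:
# Exit non-zero when the 60-second average "wa" (column 16 of vmstat) exceeds 15%
WA=$(vmstat 60 2 | tail -1 | awk '{print $16}')
if [ "$WA" -gt 15 ]; then
  echo "$(date): high iowait ${WA}%" >&2
  exit 1
fi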
Continuous Monitoring
# Log iowait every minute
while true; do
echo "$(date): $(vmstat 1 2 | tail -1 | awk '{print $16}')" >> /var/log/iowait.log
sleep 60
done
Elasticsearch Monitoring
GET /_cat/nodes?v&h=name,disk.used_percent,load_1m,cpu
Prevention Checklist
- SSDs in use (NVMe preferred)
- No swap or swap disabled
- Heap ≤ 50% of RAM
- Filesystem mounted with noatime
- Merge threads limited
- Refresh interval appropriate
- Recovery bandwidth limited
- Monitoring and alerting configured
Quick Fixes During High IOWait
If you are experiencing high iowait right now:
// 1. Reduce indexing rate (inform clients)
// 2. Disable refresh temporarily
PUT /*/_settings
{
  "index.refresh_interval": "-1"
}
// 3. Stop non-essential recoveries
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}
// 4. After stabilization, re-enable
PUT /*/_settings
{
  "index.refresh_interval": "30s"
}
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}