ClickHouse MultiDisk JBOD Balancing: round_robin vs least_used

ClickHouse storage policies allow multiple disks per volume, commonly known as JBOD ("just a bunch of disks"). When parts land in such a volume, ClickHouse must pick one disk per write. Two strategies are available: round_robin (the default) and least_used. Picking the right one, and knowing how to manually rebalance afterwards, prevents the classic problem of one disk filling up while the others sit half empty.

Round Robin (Default)

round_robin cycles through the disks in the configured order. Each new part goes to the next disk. It is most effective when parts created on insert are roughly the same size. If insert sizes vary widely, disk utilization drifts apart over time. Once a disk falls behind, round robin will not catch it up, since each disk receives one part per cycle regardless of its current state.
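The drift described above is easy to reproduce in a toy model. The Python sketch below is not ClickHouse code: the function name round_robin_place and the part sizes are invented for illustration. It hands part i to disk i modulo the number of disks and totals the bytes each disk receives.

```python
def round_robin_place(part_sizes, n_disks):
    """Toy model: assign part i to disk i % n_disks, return bytes used per disk."""
    used = [0] * n_disks
    for i, size in enumerate(part_sizes):
        used[i % n_disks] += size
    return used


# Uniform inserts stay balanced.
print(round_robin_place([100] * 10, 2))     # [500, 500]
# Alternating large and small inserts: disk 0 receives every large part.
print(round_robin_place([100, 10] * 5, 2))  # [500, 50]
```

Real insert patterns are rarely this regular, but the mechanism is the same: placement ignores how full each disk already is, so any skew that appears is never corrected.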

Least Used

least_used selects the disk with the most free space for each write. It is the right choice when:

  • The JBOD volume contains disks of different sizes.
  • Insert sizes vary, leading to skew under round robin.
  • You added a new (empty) disk and want it to catch up to the others.

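The trade-off is that concurrent writes can all target the same "freest" disk for a window of time (governed by least_used_ttl_ms), concentrating I/O on that disk. The effect is mildest on a fresh volume or one that has already been rebalanced, where no single disk stays freest for long.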
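As a rough illustration of the catch-up behavior, the following Python sketch models least_used placement with free space re-evaluated on every write. This is a simplification that ignores the free-space cache. The function name least_used_place and all sizes are hypothetical.

```python
def least_used_place(part_sizes, free_bytes):
    """Toy model: send each part to the disk with the most free space right now."""
    free = list(free_bytes)
    placements = []
    for size in part_sizes:
        # On a tie, max() returns the first disk with the largest free space.
        target = max(range(len(free)), key=lambda i: free[i])
        free[target] -= size
        placements.append(target)
    return placements, free


# Disk 0 is mostly full; disk 1 was just added and is empty.
placements, free = least_used_place([50] * 10, [300, 600])
print(placements)  # [1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
print(free)        # [200, 200]
```

The new disk absorbs every write until it has caught up, after which the two disks alternate. That initial run of writes to a single disk is the concentration effect the trade-off refers to.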

Configuration Example

<clickhouse>
  <storage_configuration>
    <policies>
      <hot>
        <volumes>
          <default>
            <disk>disk1</disk>
            <disk>disk2</disk>
            <load_balancing>least_used</load_balancing>
            <least_used_ttl_ms>60000</least_used_ttl_ms>
          </default>
        </volumes>
      </hot>
    </policies>
  </storage_configuration>
</clickhouse>

Key Settings

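| Setting | Scope | Purpose |
| --- | --- | --- |
| load_balancing | volume | Disk selection strategy: round_robin (default) or least_used |
| least_used_ttl_ms | volume | How long cached free-space information is reused before it is refreshed, in milliseconds (default 60000) |
| keep_free_space_bytes | disk | Absolute amount of free space to keep in reserve on the disk |
| keep_free_space_ratio | disk | Fraction of total disk space to keep in reserve |
| min_bytes_to_rebalance_partition_over_jbod | MergeTree table setting | Size threshold for balancing large new parts across the disks of a JBOD volume |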

least_used_ttl_ms matters because querying free space on every write would be expensive. With the default 60-second TTL, placement within that window relies on the free-space figures cached at its start, so a disk that looked freest then can keep attracting writes until the next refresh. Lowering the TTL improves balance at the cost of more statfs calls. Raising it risks burst-loading one disk.
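The effect of the TTL can be sketched with a deliberately simplified model. The Python below is an assumption-laden toy, not the server's implementation: it assumes the cached snapshot is not adjusted between refreshes, whereas the real server may account for its own reservations in between. The function name place_with_ttl and all numbers are hypothetical.

```python
def place_with_ttl(writes, free_bytes, ttl_ms):
    """writes: list of (timestamp_ms, part_bytes) in time order.

    Toy model: the free-space snapshot is refreshed only once it is at least
    ttl_ms old, and each write goes to the disk that looks freest in it.
    """
    free = list(free_bytes)   # true free space per disk
    snapshot = None           # cached view used for placement decisions
    refreshed_at = None
    placements = []
    for ts, size in writes:
        if snapshot is None or ts - refreshed_at >= ttl_ms:
            snapshot = list(free)
            refreshed_at = ts
        target = max(range(len(snapshot)), key=lambda i: snapshot[i])
        free[target] -= size
        placements.append(target)
    return placements


writes = [(t * 10_000, 100) for t in range(6)]  # one 100-byte part every 10 s
print(place_with_ttl(writes, [1000, 990], ttl_ms=60_000))  # [0, 0, 0, 0, 0, 0]
print(place_with_ttl(writes, [1000, 990], ttl_ms=0))       # [0, 1, 0, 1, 0, 1]
```

Under these assumptions, a 60-second TTL sends the whole burst to one disk, while refreshing on every write alternates between the two at the cost of a free-space lookup each time.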

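min_bytes_to_rebalance_partition_over_jbod is a MergeTree table setting rather than part of the storage policy. It influences where large new parts produced by merges are written, not where inserts land. Of the settings above, it is the only one that shifts data between disks over time without manual intervention.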

Manual Rebalancing

If a volume is already imbalanced, switching to least_used will not move existing parts. To rebalance, identify large parts on the fullest disk and move them to less full disks within the same volume:

SELECT
    database,
    table,
    name,
    disk_name,
    formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE active AND disk_name = 'disk1'
ORDER BY bytes_on_disk DESC
LIMIT 20;

Then move parts to another disk in the same volume:

ALTER TABLE db.table MOVE PART '202401_1_1_0' TO DISK 'disk2';

Repeat until the disks converge. Query the system.disks table to check progress:

SELECT name, formatReadableSize(free_space) AS free,
       formatReadableSize(total_space) AS total
FROM system.disks
ORDER BY name;
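Choosing which parts to move can be scripted. The Python sketch below is one possible greedy approach, not an official tool: it takes the part list and per-disk usage you would read from the two queries above, moves the largest parts first, and skips any part whose move would widen the gap instead of narrowing it. The function name plan_moves, the table name db.events, and the part names and sizes are all hypothetical.

```python
def plan_moves(parts, used, src, dst):
    """Plan MOVE PART statements from the fuller disk (src) to the emptier one (dst).

    parts: list of (table, part_name, size) currently on src.
    used:  dict mapping disk name to used space, in the same unit as size.
    """
    used = dict(used)  # work on a copy so the caller's dict is untouched
    statements = []
    for table, name, size in sorted(parts, key=lambda p: p[2], reverse=True):
        gap = used[src] - used[dst]
        # Moving a part changes the gap by 2 * size, so it only helps if size < gap.
        if 0 < size < gap:
            used[src] -= size
            used[dst] += size
            statements.append(
                f"ALTER TABLE {table} MOVE PART '{name}' TO DISK '{dst}';"
            )
    return statements, used


parts = [
    ("db.events", "202401_1_1_0", 250),
    ("db.events", "202401_2_2_0", 200),
    ("db.events", "202402_3_3_0", 40),
]
statements, projected = plan_moves(parts, {"disk1": 900, "disk2": 300}, "disk1", "disk2")
for stmt in statements:
    print(stmt)
print(projected)  # {'disk1': 610, 'disk2': 590}
```

The planner only prints statements. Review them and run them one at a time, since each move copies the part's data to the target disk.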

Common Pitfalls

  • Switching to least_used on an imbalanced volume and expecting automatic catch-up. Existing parts stay put; only new writes go to the freest disk.
  • Setting least_used_ttl_ms to a very small value, which re-queries free space on nearly every write and lets the disk choice flip from one part to the next.
  • Forgetting keep_free_space_bytes (set per disk). Without a reserve, disks can fill to 100%, at which point merges stall and inserts start to fail.
  • Mixing disks with very different IOPS in the same volume. least_used looks only at free space, so a slow disk with spare capacity keeps attracting writes and becomes the bottleneck.

Frequently Asked Questions

Q: Which load balancing strategy should I use?
A: least_used for heterogeneous disks, varied insert sizes, or after adding a disk. round_robin is fine for fresh, equal-sized disks with uniform insert sizes.

Q: Does ClickHouse rebalance automatically over time?
A: Only partially, through min_bytes_to_rebalance_partition_over_jbod during merges. For meaningful rebalancing, use ALTER TABLE ... MOVE PART.

Q: Can I change load_balancing without restarting?
A: Storage configuration changes are picked up dynamically in recent versions. Restart only if system.storage_policies does not reflect the new value.

Q: Will least_used always target the same disk?
A: For up to least_used_ttl_ms milliseconds at a time, yes. After the TTL expires the choice is recomputed.

Q: Should I use least_used with cloud disks?
A: For local NVMe in cloud VMs, yes. For network-attached storage where all "disks" share a pool, the choice rarely matters.
