The "DB::Exception: Too many unavailable shards" error in ClickHouse is raised when a distributed query finds that the number of unreachable shards exceeds the acceptable threshold. The error code is TOO_MANY_UNAVAILABLE_SHARDS. While skip_unavailable_shards allows ClickHouse to tolerate some shard failures, this error indicates that too many shards are down for the query to produce meaningful results.
Impact
This error blocks the distributed query entirely and signals a serious cluster health problem:
- Queries return no results rather than partial results
- The cluster is in a degraded state where a significant fraction of data is inaccessible
- Applications and dashboards relying on the distributed table will experience outages
- It may indicate a widespread infrastructure failure requiring urgent attention
Common Causes
- Multiple shard nodes are down simultaneously -- Hardware failures, widespread OS issues, or a failed deployment took out several shards at once.
- Network partition affecting multiple shards -- A switch, router, or subnet failure cuts off a group of shard nodes from the initiator.
- Rolling restart executed too aggressively -- Too many shards were restarted simultaneously without waiting for recovery.
- Cloud provider availability zone outage -- If shards are concentrated in one zone and that zone goes down, many shards become unavailable at once.
- Resource exhaustion across multiple nodes -- Out-of-memory conditions, disk full errors, or thread exhaustion on several shard nodes simultaneously.
- Configuration error affecting multiple shards -- A bad config change deployed to many nodes can cause them all to fail to start.
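The causes above can often be distinguished from the initiator itself, because system.clusters keeps per-replica connection-error counters. A sketch (the errors_count and estimated_recovery_time columns exist in recent ClickHouse versions; 'your_cluster' is a placeholder):

```sql
-- Replicas that the initiator has recently failed to reach,
-- with the time until their error count is forgiven
SELECT shard_num, host_name, errors_count, estimated_recovery_time
FROM system.clusters
WHERE cluster = 'your_cluster' AND errors_count > 0
ORDER BY shard_num;
```

If the failing replicas cluster in one subnet or availability zone, suspect a network partition or zone outage; if they are scattered, look at per-node causes such as OOM kills or a bad config rollout.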
Troubleshooting and Resolution Steps
Determine how many shards are unavailable:
```sql
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'your_cluster'
ORDER BY shard_num;
```

Then test connectivity to each shard:
```bash
for host in node1 node2 node3 node4; do
  echo -n "$host: "
  nc -zv $host 9000 2>&1 | tail -1
done
```

Check the status of ClickHouse on each shard node:
```bash
ssh <shard_host> "systemctl status clickhouse-server"
```

Review system logs on failed nodes for common failures:
```bash
ssh <shard_host> "tail -100 /var/log/clickhouse-server/clickhouse-server.err.log"
ssh <shard_host> "dmesg | tail -50"   # Check for OOM kills
```

If nodes are down due to configuration errors, fix and restart:
```bash
# Validate the config first
clickhouse-server --config-file /etc/clickhouse-server/config.xml --check-config

# Then restart
systemctl restart clickhouse-server
```

While restoring shards, temporarily lower query expectations if partial results are acceptable:
```sql
SET skip_unavailable_shards = 1;
-- The query will run on whatever shards are available
SELECT count() FROM your_distributed_table;
```

Note that the results will be partial: data on the skipped shards is silently excluded, so use this only when approximate answers are acceptable, and the query still fails if no shard is reachable at all.
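To see which shards actually answered such a query, you can group by the server that produced each row. A sketch using the built-in hostName() function (your_distributed_table is a placeholder):

```sql
SET skip_unavailable_shards = 1;
-- hostName() is evaluated on each remote shard, so the result
-- lists exactly the shards that participated in the query
SELECT hostName() AS responding_host, count() AS rows_seen
FROM your_distributed_table
GROUP BY responding_host;
```

Comparing the responding hosts against the full shard list from system.clusters tells you how much data the partial result is missing.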
Prioritize bringing shards back online -- focus on one shard at a time, ensuring each is fully operational before moving to the next.
If this is a recurring issue during maintenance, implement a proper rolling restart procedure:
```bash
# Example: restart one shard at a time, waiting for a health check
for shard_host in node1 node2 node3 node4; do
  ssh $shard_host "systemctl restart clickhouse-server"
  # Wait for the shard to be ready
  until ssh $shard_host "clickhouse-client -q 'SELECT 1'" >/dev/null 2>&1; do
    sleep 5
  done
  echo "$shard_host is back online"
done
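For replicated setups, "ready" should mean more than answering SELECT 1. One possible additional gate, assuming ReplicatedMergeTree tables, is to wait until replication lag on the restarted node has drained (system.replicas is a standard ClickHouse system table; the 30-second bound is an arbitrary example):

```bash
# Additional gate inside the loop: wait until the restarted node
# reports at most 30 s of replication delay before moving on
until [ "$(ssh $shard_host "clickhouse-client -q \
    'SELECT max(absolute_delay) <= 30 FROM system.replicas'")" = "1" ]; do
  sleep 5
done
```

This prevents the next restart from starting while the previous shard is still catching up on replicated data.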
Best Practices
- Distribute shards across multiple availability zones or failure domains so that a single infrastructure event cannot take out too many shards.
- Use replication within each shard so that individual node failures do not make the shard unavailable.
- Implement automated health checks and alerting that trigger before the number of unavailable shards reaches a critical threshold.
- Plan maintenance windows carefully and use rolling restarts that keep the majority of shards available at all times.
- Monitor cloud provider status pages and set up alerts for availability zone issues.
- Maintain runbooks for rapid shard recovery to minimize the duration of partial outages.
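As a concrete starting point for the alerting bullet above, a periodic check can count how many shards still have at least one error-free replica and page before the situation becomes critical. A sketch (the minimum-healthy threshold, mail command, and on-call address are placeholders):

```bash
# Cron-style check: alert if fewer than 3 shards are fully healthy
HEALTHY=$(clickhouse-client -q "
  SELECT uniqIf(shard_num, errors_count = 0)
  FROM system.clusters
  WHERE cluster = 'your_cluster'")
[ "$HEALTHY" -ge 3 ] || \
  echo "ALERT: only $HEALTHY healthy shards in your_cluster" | \
  mail -s 'shard availability alert' oncall@example.com
```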
Frequently Asked Questions
Q: How does ClickHouse decide when "too many" shards are unavailable?
A: When skip_unavailable_shards is enabled, ClickHouse will attempt to run the query on available shards. If all shards are unavailable, or if the query cannot proceed with the available subset (for certain query types), this error is raised. The exact threshold depends on the query and cluster configuration.
Q: Is skip_unavailable_shards enough to prevent this error?
A: It helps, but if every single shard is unreachable, the query still cannot proceed. The setting is designed for tolerating partial failures, not total cluster unavailability.
Q: Can I set a minimum number of shards required for a query to succeed?
A: There is no direct setting for this. The behavior is governed by skip_unavailable_shards and the query's requirements. You can implement application-level checks by querying system.clusters and verifying shard availability before issuing the main query.
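Such an application-level pre-check might look like this, run on the initiator before the main query (a sketch; the "require every shard healthy" policy is an example, not a built-in):

```sql
-- Returns 1 only when every shard has at least one replica
-- with no recent connection errors
SELECT uniq(shard_num) = uniqIf(shard_num, errors_count = 0)
    AS all_shards_reachable
FROM system.clusters
WHERE cluster = 'your_cluster';
```

The application issues the expensive distributed query only when this probe returns 1, or falls back to a degraded mode otherwise.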
Q: How do I prevent this during planned maintenance?
A: Use a rolling restart approach where you restart one shard (or one replica per shard) at a time. Wait for each node to fully rejoin the cluster before proceeding to the next. This keeps the majority of shards available throughout the maintenance window.
Q: Does this error affect INSERT operations to distributed tables?
A: Distributed INSERTs can also fail if the target shard is unreachable. However, with asynchronous distributed inserts (the default, controlled by the insert_distributed_sync setting), the data is buffered locally on the initiator and forwarded once the shard comes back online, subject to the Distributed engine's limit on pending data (the bytes_to_throw_insert setting).
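Pending asynchronous inserts can be inspected in system.distribution_queue, which shows what each Distributed table has buffered for shards it cannot currently reach (a standard system table; column names as in recent ClickHouse versions):

```sql
-- Distributed tables with data waiting to be forwarded to shards
SELECT database, table, data_files, data_compressed_bytes, broken_data_files
FROM system.distribution_queue
WHERE data_files > 0;
```

A steadily growing data_files count confirms that inserts are being buffered rather than delivered, and nonzero broken_data_files indicates batches that will never be sent without manual intervention.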