The "DB::Exception: No available replica" error in ClickHouse occurs when the server cannot find any healthy replica to handle a query against a distributed or replicated table. The error code is NO_AVAILABLE_REPLICA. ClickHouse evaluates replica health based on factors like connection state, replication delay, and error history -- and this error means every replica has been deemed unavailable.
Impact
This error blocks the query from executing, which can lead to:
- Failed SELECT queries that require data from the affected replicated table
- Application-level errors and timeouts
- Disrupted analytical workloads and reporting dashboards
- Potential data insertion failures when targeting replicated tables
Common Causes
- All replicas are down or unreachable -- Every ClickHouse instance hosting the replicated table has stopped or cannot be contacted over the network.
- All replicas exceed the staleness threshold -- Combined with load balancing policies that respect
max_replica_delay_for_distributed_queries, every replica may be excluded for being too far behind. - Connection pool exhaustion -- All available connections to replicas are in use and no new connections can be created.
- ZooKeeper session expiration -- Replicas have lost their ZooKeeper sessions, causing them to be marked as inactive.
- Load balancing policy excludes all replicas -- Certain load balancing modes (e.g.,
nearest_hostnameorin_order) combined with health checks can result in no replicas being eligible. - Replicas in read-only mode -- If all replicas have entered read-only mode (e.g., due to disk space issues), they may be considered unavailable for certain operations.
Troubleshooting and Resolution Steps
Check the status of all replicas for the table:
SELECT replica_name, is_leader, can_become_leader, is_readonly, is_session_expired, absolute_delay, queue_size FROM system.replicas WHERE database = 'your_db' AND table = 'your_table';Pay attention to
is_readonlyandis_session_expiredcolumns.Verify ZooKeeper / ClickHouse Keeper connectivity:
SELECT * FROM system.zookeeper WHERE path = '/';If this query fails, the coordination service is unreachable, which will cause replicas to lose their sessions.
Check the current load balancing setting:
SELECT name, value FROM system.settings WHERE name = 'load_balancing';Try switching to a different policy temporarily to see if that resolves the issue:
SET load_balancing = 'random';Look for replicas stuck in read-only mode:
SELECT replica_name, is_readonly, readonly_reason FROM system.replicas WHERE is_readonly = 1;Read-only mode often indicates disk space issues or ZooKeeper connectivity problems.
Review disk space on replica nodes:
SELECT name, path, free_space, total_space FROM system.disks;If free space is critically low, ClickHouse may put the replica into read-only mode.
Restart the replica's ZooKeeper session if it has expired:
SYSTEM RESTART REPLICA your_db.your_table;This forces the replica to re-establish its ZooKeeper session.
If the issue is persistent, consider adjusting failover settings:
SET fallback_to_stale_replicas_for_distributed_queries = 1; SET max_replica_delay_for_distributed_queries = 600;
Best Practices
- Run multiple replicas per table and distribute them across different physical hosts or availability zones.
- Monitor the
system.replicastable continuously for read-only state, session expiration, and replication delay. - Configure appropriate disk space alerts to prevent replicas from entering read-only mode unexpectedly.
- Keep ZooKeeper or ClickHouse Keeper properly sized with enough memory and low-latency storage.
- Use the
randomload balancing policy as a sensible default -- it distributes load evenly and handles failover well. - Set
fallback_to_stale_replicas_for_distributed_queries = 1when freshness is less critical than availability.
Frequently Asked Questions
Q: Is NO_AVAILABLE_REPLICA the same as ALL_REPLICAS_LOST?
A: No. NO_AVAILABLE_REPLICA means no replica was deemed healthy enough to accept the query at the time of evaluation. ALL_REPLICAS_LOST indicates that connections to replicas were established but then lost during query execution. The former is a pre-query check failure; the latter is a mid-query failure.
Q: Can a single-node ClickHouse setup produce this error?
A: It can occur if you are querying a Replicated table engine and the local replica itself is in a degraded state (e.g., ZooKeeper session expired or the node is in read-only mode).
Q: How does ClickHouse determine if a replica is "available"?
A: ClickHouse considers connection state, the load balancing policy, replication delay relative to max_replica_delay_for_distributed_queries, and whether the replica is marked as read-only or session-expired. A replica must pass all of these checks to be considered available.
Q: Does the order of replicas in the configuration matter?
A: It depends on the load balancing policy. With in_order, the first replica is preferred. With random or round_robin, the order in the config has no practical effect. With first_or_random, the first replica is tried first, then others are selected randomly.