ClickHouse DB::Exception: No available replica

Q: Is NO_AVAILABLE_REPLICA the same as ALL_REPLICAS_LOST?

No. NO_AVAILABLE_REPLICA means no replica was deemed healthy enough to accept the query at the time of evaluation. ALL_REPLICAS_LOST indicates that connections to replicas were established but then lost during query execution. The former is a pre-query check failure; the latter is a mid-query failure.

Q: Does the order of replicas in the configuration matter?

It depends on the load balancing policy. With in_order , the first replica is preferred. With random or round_robin , the order in the config has no practical effect. With first_or_random , the first replica is tried first, then others are selected randomly.

The "DB::Exception: No available replica" error in ClickHouse occurs when the server cannot find any healthy replica to handle a query against a distributed or replicated table. The error code is NO_AVAILABLE_REPLICA. ClickHouse evaluates replica health based on factors like connection state, replication delay, and error history -- and this error means every replica has been deemed unavailable.

Impact

This error blocks the query from executing, which can lead to:

Failed SELECT queries that require data from the affected replicated table
Application-level errors and timeouts
Disrupted analytical workloads and reporting dashboards
Potential data insertion failures when targeting replicated tables

Common Causes

All replicas are down or unreachable -- Every ClickHouse instance hosting the replicated table has stopped or cannot be contacted over the network.
All replicas exceed the staleness threshold -- Combined with load balancing policies that respect max_replica_delay_for_distributed_queries, every replica may be excluded for being too far behind.
Connection pool exhaustion -- All available connections to replicas are in use and no new connections can be created.
ZooKeeper session expiration -- Replicas have lost their ZooKeeper sessions, causing them to be marked as inactive.
Load balancing policy excludes all replicas -- Certain load balancing modes (e.g., nearest_hostname or in_order) combined with health checks can result in no replicas being eligible.
Replicas in read-only mode -- If all replicas have entered read-only mode (e.g., due to disk space issues), they may be considered unavailable for certain operations.

Troubleshooting and Resolution Steps

Check the status of all replicas for the table:

SELECT
    replica_name,
    is_leader,
    can_become_leader,
    is_readonly,
    is_session_expired,
    absolute_delay,
    queue_size
FROM system.replicas
WHERE database = 'your_db' AND table = 'your_table';

Pay attention to is_readonly and is_session_expired columns.

Verify ZooKeeper / ClickHouse Keeper connectivity:
```
SELECT * FROM system.zookeeper WHERE path = '/';
```
If this query fails, the coordination service is unreachable, which will cause replicas to lose their sessions.
Check the current load balancing setting:
```
SELECT name, value FROM system.settings WHERE name = 'load_balancing';
```
Try switching to a different policy temporarily to see if that resolves the issue:
```
SET load_balancing = 'random';
```
Look for replicas stuck in read-only mode:
```
SELECT replica_name, is_readonly, readonly_reason
FROM system.replicas
WHERE is_readonly = 1;
```
Read-only mode often indicates disk space issues or ZooKeeper connectivity problems.
Review disk space on replica nodes:
```
SELECT name, path, free_space, total_space
FROM system.disks;
```
If free space is critically low, ClickHouse may put the replica into read-only mode.
Restart the replica's ZooKeeper session if it has expired:
```
SYSTEM RESTART REPLICA your_db.your_table;
```
This forces the replica to re-establish its ZooKeeper session.

If the issue is persistent, consider adjusting failover settings:

SET fallback_to_stale_replicas_for_distributed_queries = 1;
SET max_replica_delay_for_distributed_queries = 600;

Best Practices

Run multiple replicas per table and distribute them across different physical hosts or availability zones.
Monitor the system.replicas table continuously for read-only state, session expiration, and replication delay.
Configure appropriate disk space alerts to prevent replicas from entering read-only mode unexpectedly.
Keep ZooKeeper or ClickHouse Keeper properly sized with enough memory and low-latency storage.
Use the random load balancing policy as a sensible default -- it distributes load evenly and handles failover well.
Set fallback_to_stale_replicas_for_distributed_queries = 1 when freshness is less critical than availability.

Frequently Asked Questions

Q: Is NO_AVAILABLE_REPLICA the same as ALL_REPLICAS_LOST?
A: No. NO_AVAILABLE_REPLICA means no replica was deemed healthy enough to accept the query at the time of evaluation. ALL_REPLICAS_LOST indicates that connections to replicas were established but then lost during query execution. The former is a pre-query check failure; the latter is a mid-query failure.

Q: Can a single-node ClickHouse setup produce this error?
A: It can occur if you are querying a Replicated table engine and the local replica itself is in a degraded state (e.g., ZooKeeper session expired or the node is in read-only mode).

Q: How does ClickHouse determine if a replica is "available"?
A: ClickHouse considers connection state, the load balancing policy, replication delay relative to max_replica_delay_for_distributed_queries, and whether the replica is marked as read-only or session-expired. A replica must pass all of these checks to be considered available.

Q: Does the order of replicas in the configuration matter?
A: It depends on the load balancing policy. With in_order, the first replica is preferred. With random or round_robin, the order in the config has no practical effect. With first_or_random, the first replica is tried first, then others are selected randomly.