NEW

Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

ClickHouse DB::Exception: No available replica

The "DB::Exception: No available replica" error in ClickHouse occurs when the server cannot find any healthy replica to handle a query against a distributed or replicated table. The error code is NO_AVAILABLE_REPLICA. ClickHouse evaluates replica health based on factors like connection state, replication delay, and error history -- and this error means every replica has been deemed unavailable.

Impact

This error blocks the query from executing, which can lead to:

  • Failed SELECT queries that require data from the affected replicated table
  • Application-level errors and timeouts
  • Disrupted analytical workloads and reporting dashboards
  • Potential data insertion failures when targeting replicated tables

Common Causes

  1. All replicas are down or unreachable -- Every ClickHouse instance hosting the replicated table has stopped or cannot be contacted over the network.
  2. All replicas exceed the staleness threshold -- Combined with load balancing policies that respect max_replica_delay_for_distributed_queries, every replica may be excluded for being too far behind.
  3. Connection pool exhaustion -- All available connections to replicas are in use and no new connections can be created.
  4. ZooKeeper session expiration -- Replicas have lost their ZooKeeper sessions, causing them to be marked as inactive.
  5. Load balancing policy excludes all replicas -- Certain load balancing modes (e.g., nearest_hostname or in_order) combined with health checks can result in no replicas being eligible.
  6. Replicas in read-only mode -- If all replicas have entered read-only mode (e.g., due to disk space issues), they may be considered unavailable for certain operations.

Troubleshooting and Resolution Steps

  1. Check the status of all replicas for the table:

    SELECT
        replica_name,
        is_leader,
        can_become_leader,
        is_readonly,
        is_session_expired,
        absolute_delay,
        queue_size
    FROM system.replicas
    WHERE database = 'your_db' AND table = 'your_table';
    

    Pay attention to is_readonly and is_session_expired columns.

  2. Verify ZooKeeper / ClickHouse Keeper connectivity:

    SELECT * FROM system.zookeeper WHERE path = '/';
    

    If this query fails, the coordination service is unreachable, which will cause replicas to lose their sessions.

  3. Check the current load balancing setting:

    SELECT name, value FROM system.settings WHERE name = 'load_balancing';
    

    Try switching to a different policy temporarily to see if that resolves the issue:

    SET load_balancing = 'random';
    
  4. Look for replicas stuck in read-only mode:

    SELECT replica_name, is_readonly, readonly_reason
    FROM system.replicas
    WHERE is_readonly = 1;
    

    Read-only mode often indicates disk space issues or ZooKeeper connectivity problems.

  5. Review disk space on replica nodes:

    SELECT name, path, free_space, total_space
    FROM system.disks;
    

    If free space is critically low, ClickHouse may put the replica into read-only mode.

  6. Restart the replica's ZooKeeper session if it has expired:

    SYSTEM RESTART REPLICA your_db.your_table;
    

    This forces the replica to re-establish its ZooKeeper session.

  7. If the issue is persistent, consider adjusting failover settings:

    SET fallback_to_stale_replicas_for_distributed_queries = 1;
    SET max_replica_delay_for_distributed_queries = 600;
    

Best Practices

  • Run multiple replicas per table and distribute them across different physical hosts or availability zones.
  • Monitor the system.replicas table continuously for read-only state, session expiration, and replication delay.
  • Configure appropriate disk space alerts to prevent replicas from entering read-only mode unexpectedly.
  • Keep ZooKeeper or ClickHouse Keeper properly sized with enough memory and low-latency storage.
  • Use the random load balancing policy as a sensible default -- it distributes load evenly and handles failover well.
  • Set fallback_to_stale_replicas_for_distributed_queries = 1 when freshness is less critical than availability.

Frequently Asked Questions

Q: Is NO_AVAILABLE_REPLICA the same as ALL_REPLICAS_LOST?
A: No. NO_AVAILABLE_REPLICA means no replica was deemed healthy enough to accept the query at the time of evaluation. ALL_REPLICAS_LOST indicates that connections to replicas were established but then lost during query execution. The former is a pre-query check failure; the latter is a mid-query failure.

Q: Can a single-node ClickHouse setup produce this error?
A: It can occur if you are querying a Replicated table engine and the local replica itself is in a degraded state (e.g., ZooKeeper session expired or the node is in read-only mode).

Q: How does ClickHouse determine if a replica is "available"?
A: ClickHouse considers connection state, the load balancing policy, replication delay relative to max_replica_delay_for_distributed_queries, and whether the replica is marked as read-only or session-expired. A replica must pass all of these checks to be considered available.

Q: Does the order of replicas in the configuration matter?
A: It depends on the load balancing policy. With in_order, the first replica is preferred. With random or round_robin, the order in the config has no practical effect. With first_or_random, the first replica is tried first, then others are selected randomly.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.