The "DB::Exception: Shard has no connections" error in ClickHouse occurs when the server cannot find or establish any usable connection to a particular shard in a distributed query. The error code is SHARD_HAS_NO_CONNECTIONS. This differs from a general connection failure in that it specifically indicates the connection pool for a shard is empty -- there are no connections available, whether idle or newly created.
Impact
When a shard has no connections, any distributed query that needs data from that shard will fail. This leads to:
- Failed queries that span the affected shard
- Blocked distributed INSERTs targeting that shard
- Possible data pipeline stalls if the shard handles a critical portion of the data
- Degraded cluster availability until the shard becomes reachable again
Common Causes
- All replicas of the shard are offline -- Every ClickHouse instance serving the shard is stopped, crashed, or unreachable.
- Network partitioning -- The initiator node is isolated from the shard's replicas by a network-level issue.
- Misconfigured shard definition -- The remote_servers configuration references hosts or ports that do not correspond to running ClickHouse instances.
- Connection pool not yet initialized -- In rare cases after a cold start, the pool for a shard may not have been established before the first query arrives.
- Authentication failures on all replicas -- If credentials configured for inter-cluster communication are wrong, no connections can be created.
- Port conflicts or binding issues -- The remote ClickHouse server is listening on a different port than what the cluster configuration specifies.
Troubleshooting and Resolution Steps
Identify which shard is affected by inspecting the cluster configuration:
SELECT cluster, shard_num, replica_num, host_name, port FROM system.clusters WHERE cluster = 'your_cluster';
Cross-reference the shard number from the error message with this output.
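To narrow the output to the shard named in the error message, the lookup can be wrapped in a small shell helper. This is a minimal sketch rather than an official tool: the function name shard_replicas and the CLICKHOUSE_CLIENT variable are illustrative assumptions. The errors_count and estimated_recovery_time columns of system.clusters show how often the initiator recently failed to reach each replica.

```shell
#!/usr/bin/env bash
# Client binary; overridable so the helper can be dry-run (for example with "echo").
# CLICKHOUSE_CLIENT is an assumed variable name, not a ClickHouse convention.
CLICKHOUSE_CLIENT="${CLICKHOUSE_CLIENT:-clickhouse-client}"

# shard_replicas CLUSTER SHARD_NUM
# Lists the replicas of one shard with the initiator's error counters for each.
shard_replicas() {
    local cluster="$1" shard="$2"
    $CLICKHOUSE_CLIENT --query "
        SELECT replica_num, host_name, port, errors_count, estimated_recovery_time
        FROM system.clusters
        WHERE cluster = '${cluster}' AND shard_num = ${shard}
        ORDER BY replica_num
        FORMAT PrettyCompact"
}
```

Run on the initiator node, for example: shard_replicas your_cluster 2. A non-zero errors_count on every replica of the shard is consistent with the shard having no usable connections.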
Test connectivity to each replica of the affected shard:
clickhouse-client --host <replica_host> --port 9000 --query "SELECT 1"
If this fails, the replica is not reachable or not running.
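When a shard has several replicas, the same probe can be looped over all of them. The sketch below assumes a space-separated REPLICAS list of host:port pairs copied from system.clusters; the host names shown are placeholders, and the function names are illustrative. It also assumes the coreutils timeout command is available to bound each attempt.

```shell
#!/usr/bin/env bash
# Placeholder replica list for the affected shard (hypothetical hosts);
# replace with the host_name and port values reported by system.clusters.
REPLICAS="${REPLICAS:-replica1.example.internal:9000 replica2.example.internal:9000}"

# probe_replica HOST PORT
# Returns 0 if "SELECT 1" succeeds over the native protocol within 5 seconds.
probe_replica() {
    timeout 5 clickhouse-client --host "$1" --port "$2" --query "SELECT 1" >/dev/null 2>&1
}

# check_replicas
# Prints one OK/FAIL line per replica; the return status is the number of failures.
check_replicas() {
    local failed=0 r host port
    for r in $REPLICAS; do
        host=${r%:*}
        port=${r##*:}
        if probe_replica "$host" "$port"; then
            echo "OK   ${host}:${port}"
        else
            echo "FAIL ${host}:${port}"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}
```

Calling check_replicas from the initiator node matters, because the error reflects what the initiator can reach, not what is reachable from an operator's workstation. If every line reads FAIL, the shard is effectively offline from that node's point of view.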
Check the ClickHouse process on each replica node:
systemctl status clickhouse-server
ps aux | grep clickhouse
Verify authentication credentials for inter-node communication: Check that the <user> and <password> elements in the <remote_servers> configuration match the user defined on the remote nodes:
<remote_servers>
    <your_cluster>
        <shard>
            <replica>
                <host>node1</host>
                <port>9000</port>
                <user>default</user>
                <password>your_password</password>
            </replica>
        </shard>
    </your_cluster>
</remote_servers>
Review the error log on the initiator node for more detailed connection failure reasons:
grep -i "connection\|SHARD_HAS_NO_CONNECTIONS" /var/log/clickhouse-server/clickhouse-server.log
If the shard is intentionally decommissioned, update the cluster configuration to remove it and reload:
SYSTEM RELOAD CONFIG;
Best Practices
- Always provision at least two replicas per shard to avoid single points of failure.
- Use configuration management (Ansible, Puppet, etc.) to ensure cluster definitions stay consistent across all nodes.
- Monitor connection pool health by tracking distributed query errors in system.errors and setting up alerts.
- Validate cluster configuration changes in a staging environment before applying them to production.
- Document the mapping between shard numbers and physical hosts for faster incident response.
- Test inter-node authentication after credential rotations to catch mismatches before they affect production.
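The system.errors monitoring suggested above could be sketched as a small check suitable for cron or a monitoring agent. Counters in system.errors accumulate from server start, so the sketch compares each reading with the previous one instead of alerting on any non-zero value. The state file path and the function names are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Assumed location for the previously seen counter value (hypothetical path).
STATE_FILE="${STATE_FILE:-/tmp/shard_has_no_connections.count}"

# fetch_error_count
# Prints the cumulative SHARD_HAS_NO_CONNECTIONS counter; sum() yields 0 when
# the error has not occurred since the server started.
fetch_error_count() {
    clickhouse-client --query \
        "SELECT sum(value) FROM system.errors WHERE name = 'SHARD_HAS_NO_CONNECTIONS'"
}

# check_shard_errors
# Returns 1 (and prints ALERT) if the counter rose since the last run,
# 2 if the counter could not be read, 0 otherwise.
check_shard_errors() {
    local current previous
    current=$(fetch_error_count) || return 2
    previous=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
    previous=${previous:-0}
    echo "$current" > "$STATE_FILE"
    if [ "$current" -gt "$previous" ]; then
        echo "ALERT: SHARD_HAS_NO_CONNECTIONS rose from $previous to $current"
        return 1
    fi
    echo "OK: no new SHARD_HAS_NO_CONNECTIONS errors (counter at $current)"
}
```

The counter is local to the node that is queried, so a check like this would need to run against every node that acts as an initiator for distributed queries.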
Frequently Asked Questions
Q: How is SHARD_HAS_NO_CONNECTIONS different from ALL_CONNECTION_TRIES_FAILED?
A: SHARD_HAS_NO_CONNECTIONS means there are no connections available in the pool for the shard at all. ALL_CONNECTION_TRIES_FAILED means ClickHouse actively attempted to connect and each attempt was rejected or timed out. The former can occur when the pool is simply empty, while the latter reflects active connection failures.
Q: Can this error appear for just one shard while others work fine?
A: Absolutely. Each shard has its own set of replicas and connection state. If only one shard's replicas are unreachable, only queries touching that shard will fail.
Q: Will skip_unavailable_shards help with this error?
A: Yes. If you set skip_unavailable_shards = 1, ClickHouse will skip the unreachable shard and return results from the remaining shards. The results will be incomplete, however.
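As an illustration, the setting can be applied to a single query from the command line instead of being enabled globally. This is a sketch under assumptions: the function name and the CLICKHOUSE_CLIENT variable are hypothetical, and your_distributed_table in the usage line stands in for a real Distributed table.

```shell
#!/usr/bin/env bash
# Client binary; overridable so the helper can be dry-run (for example with "echo").
CLICKHOUSE_CLIENT="${CLICKHOUSE_CLIENT:-clickhouse-client}"

# query_available_shards SQL
# Runs SQL with skip_unavailable_shards enabled for this invocation only.
# Rows from unreachable shards are absent from the result without any error.
query_available_shards() {
    $CLICKHOUSE_CLIENT --skip_unavailable_shards=1 --query "$1"
}
```

For example: query_available_shards "SELECT count() FROM your_distributed_table". Limiting the setting to individual queries keeps partial results an explicit choice instead of a cluster-wide default.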
Q: What port should I check when debugging this error?
A: Check the native protocol port, which defaults to 9000 (or 9440 for TLS). The HTTP port (8123) is not used for inter-node distributed queries.