The "DB::Exception: Invalid shard ID" error in ClickHouse occurs when a query or operation references a shard number that does not exist in the cluster configuration. The error code is INVALID_SHARD_ID. Shard IDs in ClickHouse are 1-based integers assigned sequentially according to the order shards appear in the remote_servers configuration. Referencing an ID outside this range triggers the error.
Impact
This error prevents the affected operation from completing:
- Queries that explicitly target a shard by ID will fail
- Custom sharding logic that computes shard IDs incorrectly will produce errors
- Distributed DDL operations targeting invalid shard IDs will not execute
- It may indicate a mismatch between application logic and the actual cluster topology
Common Causes
- Shard ID out of range -- The referenced shard ID is greater than the total number of shards or less than 1.
- Cluster topology changed but application was not updated -- Shards were removed or the cluster was resized, but the application still references old shard IDs.
- Off-by-one error in custom sharding logic -- Application code that computes shard IDs using modulo arithmetic may produce 0 (invalid) instead of starting from 1.
- Mismatch between cluster definitions -- The node executing the query has a different number of shards than expected because its config is outdated.
- Hardcoded shard IDs in queries -- Queries that reference specific shard IDs (e.g., via
_shard_numvirtual column) use a value that does not exist. - Using the wrong cluster name -- The query targets a cluster with fewer shards than the one intended.
Troubleshooting and Resolution Steps
Check how many shards exist in the cluster:
SELECT cluster, max(shard_num) AS total_shards FROM system.clusters WHERE cluster = 'your_cluster' GROUP BY cluster;Valid shard IDs are 1 through the total number of shards.
Identify the invalid shard ID from the error message. The full exception text typically includes the shard ID that was referenced and the valid range.
If the issue is in a query using
_shard_num, correct the filter:-- Wrong: shard 5 doesn't exist in a 4-shard cluster SELECT * FROM distributed_table WHERE _shard_num = 5; -- Correct: use a valid shard number SELECT * FROM distributed_table WHERE _shard_num = 4;Review custom sharding logic in your application. Ensure that shard ID computation produces values in the range [1, N]:
# Wrong: produces 0-based IDs shard_id = hash(key) % num_shards # Can produce 0 # Correct: 1-based IDs shard_id = (hash(key) % num_shards) + 1Verify the cluster configuration matches expectations:
SELECT cluster, shard_num, replica_num, host_name FROM system.clusters WHERE cluster = 'your_cluster' ORDER BY shard_num, replica_num;If the cluster was resized, update all references to the old shard count. Search your application code, migration scripts, and any stored queries for hardcoded shard IDs.
Ensure all nodes have the same cluster definition to prevent different nodes from reporting different shard counts:
# Compare across nodes ssh node1 "clickhouse-client -q \"SELECT max(shard_num) FROM system.clusters WHERE cluster='your_cluster'\"" ssh node2 "clickhouse-client -q \"SELECT max(shard_num) FROM system.clusters WHERE cluster='your_cluster'\""
Best Practices
- Avoid hardcoding shard IDs in application logic. Instead, let ClickHouse handle shard routing through the sharding key in the distributed table definition.
- When using
_shard_numfor diagnostic queries, validate the shard ID againstsystem.clustersfirst. - After cluster resizing operations, audit all code paths that reference shard IDs to ensure they reflect the new topology.
- Use ClickHouse's built-in sharding expressions (e.g.,
rand(),sipHash64(key)) in distributed table definitions rather than implementing custom shard routing. - Keep cluster configurations synchronized across all nodes to prevent discrepancies in shard numbering.
Frequently Asked Questions
Q: Are shard IDs 0-based or 1-based in ClickHouse?
A: Shard IDs in ClickHouse are 1-based. The first shard defined in the configuration has ID 1, the second has ID 2, and so on. Using 0 as a shard ID will trigger the INVALID_SHARD_ID error.
Q: Can shard IDs change if I add or remove shards?
A: Shard IDs are assigned based on the order of appearance in the configuration. Adding a shard at the end will not change existing IDs. However, removing a shard from the middle or reordering the configuration will change the IDs of subsequent shards, which can break existing references.
Q: How does the sharding key in a distributed table relate to shard IDs?
A: The sharding key expression is evaluated and the result is used internally to determine which shard receives the data. ClickHouse handles the mapping between the sharding key value and the actual shard ID transparently. You do not need to compute shard IDs manually when using a properly defined sharding key.
Q: Can I query a specific shard directly without using the shard ID?
A: Yes. You can connect directly to a specific shard node and query its local table. Alternatively, you can use the _shard_num virtual column with a valid shard ID to filter results from a specific shard through the distributed table.