The "DB::Exception: Distributed table too many pending bytes" error in ClickHouse occurs when the volume of data queued for asynchronous delivery to remote shards exceeds the configured limit. The error code is DISTRIBUTED_TOO_MANY_PENDING_BYTES. When you INSERT into a distributed table, ClickHouse writes the data to a local buffer directory and sends it to the target shards asynchronously. If the sending process falls behind, the pending data accumulates until this threshold is reached.
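As a concrete sketch of that insert path (all table, database, and cluster names here are hypothetical), data written to the Distributed wrapper is queued locally and shipped in the background:

```sql
-- Hypothetical schema; the local table must exist on every shard.
CREATE TABLE your_db.events_local
(
    event_time DateTime,
    user_id    UInt64,
    payload    String
)
ENGINE = MergeTree
ORDER BY (user_id, event_time);

-- The Distributed wrapper on the initiator node. An INSERT into it is
-- written to an on-disk queue locally and sent to the shards asynchronously.
CREATE TABLE your_db.events_dist AS your_db.events_local
ENGINE = Distributed('your_cluster', 'your_db', 'events_local', rand());

-- Returns as soon as the data is buffered locally, not when shards receive it.
INSERT INTO your_db.events_dist VALUES (now(), 42, 'example');
```

It is this local queue that the pending-bytes limit guards: when its size crosses the threshold, further INSERTs fail with this error.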
Impact
This error blocks further INSERT operations into the affected distributed table, which can cause:
- Data ingestion pipeline failures and backpressure
- Growing data loss risk if upstream systems discard data they cannot deliver
- Disk space consumption on the initiator node from accumulated pending files
- A clear signal that the cluster's write throughput is insufficient for the current load
Common Causes
- Target shards are unreachable or slow -- Network issues or overloaded shards prevent timely delivery of buffered data.
- High insert throughput exceeding shard capacity -- The rate of incoming data surpasses what the distributed sending mechanism can flush to remote shards.
- Distributed directory monitor is too slow -- The background process that ships pending data is not running frequently enough or is bottlenecked.
- Disk I/O contention on the initiator node -- The node buffering the data has slow disk performance, slowing both writes and reads of pending files.
- The bytes_to_throw_insert limit is set too low -- The configured threshold does not accommodate normal burst patterns.
- Remote shards rejecting inserts -- Tables in read-only mode, schema mismatches, or quota limits on the shard side prevent data from being accepted.
Troubleshooting and Resolution Steps
Check the current pending bytes for distributed tables:
```sql
SELECT database, table, data_compressed_bytes, data_files
FROM system.distribution_queue
ORDER BY data_compressed_bytes DESC;
```

This shows how much data is queued for each distributed table.
Verify connectivity to target shards:
```sql
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'your_cluster';
```

Then test connectivity to each shard:
```bash
clickhouse-client --host <shard_host> --port 9000 --query "SELECT 1"
```

Check for errors in the distribution queue:
```sql
SELECT database, table, error_count, last_exception
FROM system.distribution_queue
WHERE error_count > 0;
```

Increase the pending bytes limit if the current value is too restrictive:
```sql
-- Per-table setting on the Distributed table:
ALTER TABLE your_distributed_table MODIFY SETTING bytes_to_throw_insert = 5000000000; -- 5 GB
```

Speed up the distribution monitor:
```xml
<!-- In a user settings profile (e.g. users.xml) -->
<distributed_directory_monitor_sleep_time_ms>100</distributed_directory_monitor_sleep_time_ms>
<distributed_directory_monitor_max_sleep_time_ms>10000</distributed_directory_monitor_max_sleep_time_ms>
```

Lower sleep times cause the monitor to check for pending data more frequently.
Manually flush pending data:
```sql
SYSTEM FLUSH DISTRIBUTED your_db.your_distributed_table;
```

This forces an immediate attempt to send all pending data.
If shards are rejecting data, investigate the root cause on the shard side -- check for read-only mode, schema mismatches, or full disks:
```sql
-- On the shard node:
SELECT *
FROM system.replicas
WHERE is_readonly = 1;
```
Best Practices
- Monitor system.distribution_queue regularly and alert on rising data_compressed_bytes values before they hit the threshold.
- Size the bytes_to_throw_insert limit to accommodate expected burst patterns while still protecting against unbounded growth.
- Consider inserting directly into local tables on each shard (using a load balancer or client-side routing) for high-throughput workloads, bypassing the distributed table's buffering mechanism entirely.
- Ensure the initiator node has fast disk I/O for the distributed table's data directory, as pending files are written and read from disk.
- Keep shard nodes healthy and responsive -- slow or unavailable shards are the most common root cause of pending data accumulation.
- Use the fsync_after_insert and fsync_directories settings appropriately to balance durability against write performance.
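Several of these practices can be combined directly in the Distributed table definition. A sketch with hypothetical names and illustrative values (tune the thresholds to your own burst patterns):

```sql
CREATE TABLE your_db.events_dist AS your_db.events_local
ENGINE = Distributed('your_cluster', 'your_db', 'events_local', rand())
SETTINGS
    bytes_to_delay_insert = 1000000000,  -- soft limit: start delaying INSERTs above ~1 GB pending
    bytes_to_throw_insert = 5000000000,  -- hard limit: reject INSERTs above ~5 GB pending
    fsync_after_insert    = 0,           -- skip fsync of queued files in favor of write throughput
    fsync_directories     = 0;           -- likewise for the queue directories
```

Setting the soft limit well below the hard limit gives the background sender a chance to catch up before inserts start failing outright.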
Frequently Asked Questions
Q: Where are the pending bytes stored on disk?
A: ClickHouse stores pending data in subdirectories under the distributed table's data path, typically under /var/lib/clickhouse/data/<database>/<distributed_table>/. Each shard has its own subdirectory containing files waiting to be sent.
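To find the exact location rather than guessing from the default layout, system.distribution_queue exposes a data_path column; a quick check (database and table names are hypothetical):

```sql
SELECT database, table, data_path
FROM system.distribution_queue
WHERE database = 'your_db' AND table = 'your_distributed_table';
```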
Q: Will I lose data if I restart ClickHouse while there are pending bytes?
A: No. Pending data is persisted to disk. When ClickHouse restarts, the distribution monitor will resume sending the queued files to their target shards.
Q: What happens if I drop the distributed table while pending data exists?
A: The pending data files will be removed along with the table. Any unsent data will be lost. If you need to preserve the data, flush it first with SYSTEM FLUSH DISTRIBUTED before dropping the table.
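A minimal flush-then-drop sequence, assuming hypothetical names and that all target shards are reachable so the flush can actually complete:

```sql
-- Push all queued data to the shards first; fails if a shard is unreachable.
SYSTEM FLUSH DISTRIBUTED your_db.your_distributed_table;

-- Only then remove the table (and its now-empty queue directories).
DROP TABLE your_db.your_distributed_table;
```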
Q: Can I set different limits for different distributed tables?
A: Yes. The bytes_to_throw_insert and bytes_to_delay_insert settings can be specified per table in the distributed table engine parameters, allowing different thresholds for different tables.
Q: Is there a way to delay inserts instead of rejecting them outright?
A: Yes. The bytes_to_delay_insert setting introduces an artificial delay on inserts when the pending bytes exceed a certain threshold, applying backpressure before hitting the hard rejection limit. This gives the distribution monitor time to catch up.
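A sketch of such a staged configuration on an existing table (values are illustrative; max_delay_to_insert caps how long a single INSERT may be delayed, in seconds):

```sql
ALTER TABLE your_distributed_table
    MODIFY SETTING bytes_to_delay_insert = 1000000000,  -- start delaying above ~1 GB pending
                   bytes_to_throw_insert = 5000000000,  -- reject outright above ~5 GB pending
                   max_delay_to_insert   = 60;          -- delay each INSERT at most 60 seconds
```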