ClickHouse DB::Exception: Cannot read all data from source

The "DB::Exception: Cannot read all data" error in ClickHouse means that a read operation returned fewer bytes than expected. The CANNOT_READ_ALL_DATA error code covers a broad range of situations: reading from local files, remote HTTP endpoints, S3 objects, or inter-node communication. At its core, ClickHouse requested N bytes but received fewer than N before the source signaled completion or failure.
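The short-read condition can be illustrated with a minimal stand-alone sketch (plain Python, no ClickHouse involved; the helper name and the EOFError are illustrative, not ClickHouse internals): a reader asks for exactly N bytes, and if the stream ends early it raises, just as ClickHouse raises CANNOT_READ_ALL_DATA.

```python
import io

def read_exact(stream, n):
    """Read exactly n bytes from stream, or raise if the stream ends early.

    Mirrors the condition behind CANNOT_READ_ALL_DATA: the caller asked
    for n bytes and the source delivered fewer before signaling EOF.
    """
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:  # EOF before n bytes arrived: a short read
            raise EOFError(
                f"Cannot read all data: wanted {n} bytes, got {len(buf)}"
            )
        buf += chunk
    return buf

# A 10-byte "file" whose metadata (hypothetically) claims more:
truncated = io.BytesIO(b"0123456789")
read_exact(truncated, 10)        # succeeds: all 10 bytes are present
# read_exact(io.BytesIO(b"0123456789"), 16) would raise EOFError
```

The same pattern plays out whether the "stream" is a local file, an HTTP body, or a TCP connection to another node: the failure is about byte count, not data format.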

Impact

This error causes the current operation to fail:

  • INSERT operations reading from external sources are aborted, and no data is committed.
  • SELECT queries against remote tables or external files return an error.
  • In distributed queries, a partial read from one shard can cause the entire query to fail.
  • Backup and restore operations may be interrupted if underlying files are inaccessible.

Common Causes

  1. Network interruption -- a TCP connection to a remote source (another ClickHouse node, HTTP endpoint, S3) drops mid-transfer.
  2. Corrupted or truncated file -- a local file on disk is shorter than its metadata claims, due to incomplete writes or disk errors.
  3. Timeout -- a slow source triggers a read timeout before all data has been transferred, particularly common with cross-region or high-latency connections.
  4. Remote server error -- the upstream HTTP server or S3 returns an error mid-stream, terminating the transfer prematurely.
  5. Disk I/O error -- a hardware failure or filesystem issue prevents ClickHouse from reading the full contents of a data file.
  6. Resource limits -- operating system limits (e.g., file descriptor exhaustion, memory pressure) cause read operations to fail partially.

Troubleshooting and Resolution Steps

  1. Check if the error is transient. Retry the operation once. Network hiccups often resolve on their own:

    -- Simply re-run the failing query
    INSERT INTO my_table SELECT * FROM url('https://example.com/data.csv', CSV, 'a Int32, b String');
    
  2. Verify the source file integrity. For local files:

    ls -la /path/to/data.csv
    md5sum /path/to/data.csv
    
  3. Check disk health on the ClickHouse server:

    dmesg | grep -i "error\|fault\|i/o"
    
  4. Increase timeout settings if the source is slow:

    SET receive_timeout = 600;
    SET send_timeout = 600;
    SET http_receive_timeout = 600;
    SET http_send_timeout = 600;
    
  5. Check network connectivity to remote sources:

    curl -v -o /dev/null https://remote-source.example.com/data.csv
    
  6. Examine ClickHouse server logs for more context. The system log often contains the underlying OS-level error:

    SELECT event_time, message
    FROM system.text_log
    WHERE message LIKE '%Cannot read all data%'
    ORDER BY event_time DESC
    LIMIT 10;
    
  7. For S3-related failures, enable retries:

    SET s3_max_unexpected_write_error_retries = 5;
    SET s3_retry_attempts = 10;
    

Best Practices

  • Implement retry logic for all operations that read from external sources, as transient network failures are inevitable.
  • Monitor disk health proactively using SMART monitoring and filesystem checks.
  • Set generous but not infinite timeouts for remote reads, and tune them based on observed latency patterns.
  • Use ClickHouse's built-in S3 retry settings when reading from object storage.
  • For critical data pipelines, stage files locally on the ClickHouse server before importing to eliminate network variables during the actual INSERT.
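The retry-logic recommendation above can be sketched as a generic wrapper with exponential backoff (plain Python; the `with_retries` helper, its parameters, and the commented client call are illustrative assumptions, not a ClickHouse API -- in practice the callable would wrap a driver call such as a clickhouse-connect command):

```python
import time

def with_retries(operation, attempts=3, base_delay=1.0,
                 retryable=(ConnectionError, TimeoutError)):
    """Run operation(), retrying transient failures with exponential backoff.

    operation  -- zero-argument callable that executes the query/INSERT
    attempts   -- total tries before giving up
    retryable  -- exception types treated as transient (caller-defined)
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == attempts:
                raise                          # out of retries: surface error
            time.sleep(base_delay * 2 ** (attempt - 1))

# Hypothetical usage with a real client object:
#   with_retries(lambda: client.command("INSERT INTO my_table SELECT ..."))
```

Keeping the retryable exception set narrow matters: retrying on every exception would also re-run operations that failed for permanent reasons (bad credentials, malformed data), masking the root cause.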

Frequently Asked Questions

Q: Is this error always caused by network issues?
A: No. While network interruptions are the most common cause, disk I/O errors, corrupted files, and resource exhaustion can also trigger CANNOT_READ_ALL_DATA. Check the server logs for the underlying OS-level error to determine the root cause.

Q: Can I make ClickHouse automatically retry failed reads?
A: For S3 and HTTP sources, ClickHouse has retry settings (s3_retry_attempts, http_max_tries). For local file reads, there is no automatic retry -- the read either succeeds or fails.

Q: This error happens intermittently when querying distributed tables. What should I investigate?
A: Intermittent failures on distributed queries typically indicate network instability between cluster nodes. Check network hardware, switches, and firewall rules. Also verify that no nodes are overloaded to the point where they drop connections.

Q: How does this differ from UNEXPECTED_END_OF_FILE?
A: UNEXPECTED_END_OF_FILE is a format-level error -- the parser expected more structured data. CANNOT_READ_ALL_DATA is a transport-level error -- the underlying I/O operation failed to deliver the requested number of bytes, regardless of format.
