ClickHouse DB::Exception: Timeout exceeded

Q: Do timeouts apply to INSERT operations?

Yes. Insert operations can time out due to send_timeout , connection issues, or lock waits if another operation holds an exclusive lock on the target table. Long-running inserts into replicated tables may also time out waiting for ZooKeeper acknowledgment.

The "DB::Exception: Timeout exceeded" error in ClickHouse is a generic timeout indicator that can arise in several contexts: query execution, connection establishment, lock acquisition, or inter-node communication. The TIMEOUT_EXCEEDED error code signals that an operation did not complete within its allotted time, regardless of the specific timeout type.

Impact

The affected operation is terminated, and no result is returned. Depending on the context, this can manifest as failed queries, dropped client connections, stalled replication, or blocked DDL operations. In distributed setups, a timeout on one node can cascade into failures across the entire query if not handled properly.

Common Causes

Query execution exceeding max_execution_time or related time limits
Connection timeouts when ClickHouse cannot reach a remote shard or ZooKeeper within the configured window
Lock wait timeouts when DDL operations or mutations wait for exclusive access to a table
Distributed query timeouts caused by slow or unresponsive shards
ZooKeeper session timeouts in replicated table environments
Network issues causing inter-node communication delays
Heavy server load causing operations to stall beyond their time limits

Troubleshooting and Resolution Steps

Identify the timeout type from the full error message. The context varies significantly:

-- Check timeout-related settings
SELECT name, value FROM system.settings WHERE name LIKE '%timeout%';

For query execution timeouts, increase max_execution_time:
```
SET max_execution_time = 300;
SELECT ...;
```

For connection timeouts, check network connectivity and adjust settings:

-- Key connection timeout settings
SELECT name, value FROM system.settings
WHERE name IN ('connect_timeout', 'receive_timeout', 'send_timeout',
                'connect_timeout_with_failover_ms');

Increase if network latency is high:

SET connect_timeout = 30;
SET receive_timeout = 300;
SET send_timeout = 300;

For distributed query timeouts, check shard health:

-- Check cluster status
SELECT * FROM system.clusters;

-- Check for slow replicas
SELECT * FROM system.replicas WHERE is_session_expired = 1 OR future_parts > 10;

For lock wait timeouts, identify blocking operations:

-- Check active processes that might hold locks
SELECT query_id, query, elapsed FROM system.processes ORDER BY elapsed DESC;

-- Check pending mutations
SELECT * FROM system.mutations WHERE is_done = 0;

For ZooKeeper timeouts, verify ZooKeeper cluster health:

echo ruok | nc zookeeper-host 2181
echo stat | nc zookeeper-host 2181

Monitor network health between ClickHouse nodes and between ClickHouse and ZooKeeper:
```
ping -c 5 remote-clickhouse-node
traceroute remote-clickhouse-node
```

Best Practices

Configure timeout values appropriate for your network topology and workload characteristics.
Use different timeout profiles for interactive queries (short) and batch operations (long).
Monitor ZooKeeper latency and session health proactively, as ZooKeeper issues often manifest as timeouts.
Implement retry logic in client applications for transient timeout errors.
Use max_execution_time_leaf to set per-shard timeouts in distributed queries, preventing a single slow shard from blocking the entire query.
Keep ClickHouse nodes and ZooKeeper in the same network zone to minimize latency.

Frequently Asked Questions

Q: How do I distinguish between different types of timeouts?
A: The full error message and stack trace provide context. Query timeouts typically mention max_execution_time, connection timeouts reference specific host addresses, and lock timeouts mention the table or operation being waited on. Checking system.query_log also helps identify the timeout source.

Q: Can a timeout on one shard cause the entire distributed query to fail?
A: Yes. By default, a distributed query fails if any shard times out. You can use skip_unavailable_shards = 1 to continue with available shards, though results will be incomplete.

Q: Should I set very high timeouts to avoid this error?
A: Not as a general practice. Timeouts exist to prevent runaway operations from consuming resources indefinitely. Instead, investigate why operations are slow and address the root cause, while setting timeouts to reasonable values for your environment.

Q: Do timeouts apply to INSERT operations?
A: Yes. Insert operations can time out due to send_timeout, connection issues, or lock waits if another operation holds an exclusive lock on the target table. Long-running inserts into replicated tables may also time out waiting for ZooKeeper acknowledgment.