NEW

Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

ClickHouse DB::Exception: Timeout exceeded

The "DB::Exception: Timeout exceeded" error in ClickHouse is a generic timeout indicator that can arise in several contexts: query execution, connection establishment, lock acquisition, or inter-node communication. The TIMEOUT_EXCEEDED error code signals that an operation did not complete within its allotted time, regardless of the specific timeout type.

Impact

The affected operation is terminated, and no result is returned. Depending on the context, this can manifest as failed queries, dropped client connections, stalled replication, or blocked DDL operations. In distributed setups, a timeout on one node can cascade into failures across the entire query if not handled properly.

Common Causes

  1. Query execution exceeding max_execution_time or related time limits
  2. Connection timeouts when ClickHouse cannot reach a remote shard or ZooKeeper within the configured window
  3. Lock wait timeouts when DDL operations or mutations wait for exclusive access to a table
  4. Distributed query timeouts caused by slow or unresponsive shards
  5. ZooKeeper session timeouts in replicated table environments
  6. Network issues causing inter-node communication delays
  7. Heavy server load causing operations to stall beyond their time limits

Troubleshooting and Resolution Steps

  1. Identify the timeout type from the full error message. The context varies significantly:

    -- Check timeout-related settings
    SELECT name, value FROM system.settings WHERE name LIKE '%timeout%';
    
  2. For query execution timeouts, increase max_execution_time:

    SET max_execution_time = 300;
    SELECT ...;
    
  3. For connection timeouts, check network connectivity and adjust settings:

    -- Key connection timeout settings
    SELECT name, value FROM system.settings
    WHERE name IN ('connect_timeout', 'receive_timeout', 'send_timeout',
                    'connect_timeout_with_failover_ms');
    

    Increase if network latency is high:

    SET connect_timeout = 30;
    SET receive_timeout = 300;
    SET send_timeout = 300;
    
  4. For distributed query timeouts, check shard health:

    -- Check cluster status
    SELECT * FROM system.clusters;
    
    -- Check for slow replicas
    SELECT * FROM system.replicas WHERE is_session_expired = 1 OR future_parts > 10;
    
  5. For lock wait timeouts, identify blocking operations:

    -- Check active processes that might hold locks
    SELECT query_id, query, elapsed FROM system.processes ORDER BY elapsed DESC;
    
    -- Check pending mutations
    SELECT * FROM system.mutations WHERE is_done = 0;
    
  6. For ZooKeeper timeouts, verify ZooKeeper cluster health:

    echo ruok | nc zookeeper-host 2181
    echo stat | nc zookeeper-host 2181
    
  7. Monitor network health between ClickHouse nodes and between ClickHouse and ZooKeeper:

    ping -c 5 remote-clickhouse-node
    traceroute remote-clickhouse-node
    

Best Practices

  • Configure timeout values appropriate for your network topology and workload characteristics.
  • Use different timeout profiles for interactive queries (short) and batch operations (long).
  • Monitor ZooKeeper latency and session health proactively, as ZooKeeper issues often manifest as timeouts.
  • Implement retry logic in client applications for transient timeout errors.
  • Use max_execution_time_leaf to set per-shard timeouts in distributed queries, preventing a single slow shard from blocking the entire query.
  • Keep ClickHouse nodes and ZooKeeper in the same network zone to minimize latency.

Frequently Asked Questions

Q: How do I distinguish between different types of timeouts?
A: The full error message and stack trace provide context. Query timeouts typically mention max_execution_time, connection timeouts reference specific host addresses, and lock timeouts mention the table or operation being waited on. Checking system.query_log also helps identify the timeout source.

Q: Can a timeout on one shard cause the entire distributed query to fail?
A: Yes. By default, a distributed query fails if any shard times out. You can use skip_unavailable_shards = 1 to continue with available shards, though results will be incomplete.

Q: Should I set very high timeouts to avoid this error?
A: Not as a general practice. Timeouts exist to prevent runaway operations from consuming resources indefinitely. Instead, investigate why operations are slow and address the root cause, while setting timeouts to reasonable values for your environment.

Q: Do timeouts apply to INSERT operations?
A: Yes. Insert operations can time out due to send_timeout, connection issues, or lock waits if another operation holds an exclusive lock on the target table. Long-running inserts into replicated tables may also time out waiting for ZooKeeper acknowledgment.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.