NEW

Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

ClickHouse System Tables Filling Disk: TTL and Cleanup Guide

The system database in ClickHouse mixes two very different things. Virtual tables like system.parts and system.columns are computed on demand and consume no disk. The _log tables (query_log, query_thread_log, trace_log, part_log, metric_log, asynchronous_metric_log, and so on) are real MergeTree tables that grow forever by default. On a busy server, they easily reach hundreds of gigabytes. This guide shows how to control them without losing the operational data you need.

Option 1: Add TTL to Existing Log Tables

The least invasive fix is to set a TTL on each log table so old data drops automatically. This works on a running server with no restart:

ALTER TABLE system.query_log MODIFY TTL event_date + INTERVAL 14 DAY;
ALTER TABLE system.query_thread_log MODIFY TTL event_date + INTERVAL 7 DAY;
ALTER TABLE system.trace_log MODIFY TTL event_date + INTERVAL 14 DAY;
ALTER TABLE system.part_log MODIFY TTL event_date + INTERVAL 90 DAY;
ALTER TABLE system.metric_log MODIFY TTL event_date + INTERVAL 14 DAY;
ALTER TABLE system.asynchronous_metric_log MODIFY TTL event_date + INTERVAL 14 DAY;

Because the log tables are already partitioned by month or day, TTL drops whole partitions cheaply rather than rewriting parts.

Option 2: Configure TTL at Server Startup

If you want new servers to have the right retention from day one, configure it in the log table sections of config.xml or a drop-in:

<clickhouse>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <ttl>event_date + INTERVAL 30 DAY DELETE</ttl>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>
</clickhouse>

You can also override the engine to use explicit daily partitioning. Daily partitions let TTL drop whole days:

<query_log>
    <database>system</database>
    <table>query_log</table>
    <engine>ENGINE = MergeTree
            PARTITION BY event_date
            ORDER BY (event_time)
            TTL event_date + INTERVAL 14 DAY DELETE</engine>
</query_log>

Note: in Kubernetes deployments, avoid the <engine> tag because of possible configuration clashes during chart upgrades. Use the <ttl> form instead.

Option 3: Disable Specific Log Tables

If a log table provides no value for your workload, remove it. Create /etc/clickhouse-server/config.d/z_log_disable.xml (the z_ prefix ensures it loads last, after vendor configs):

<?xml version="1.0"?>
<clickhouse>
    <asynchronous_metric_log remove="1"/>
    <metric_log remove="1"/>
    <trace_log remove="1"/>
    <part_log remove="1"/>
    <text_log remove="1"/>
    <crash_log remove="1"/>
    <session_log remove="1"/>
</clickhouse>

Keep query_log enabled. It is by far the most useful log for debugging slow queries, memory issues, and unexpected workload patterns.

Option 4: Throttle Query Logging Per User

Rather than disable query_log, reduce its volume per user profile:

Setting Effect
log_queries_min_duration_ms Skip queries below the threshold
log_queries_cut_to_length Truncate long query texts
log_queries_probability Sample only a fraction of queries
log_query_threads Disable the per-thread log
log_query_views Disable view-access logging
log_processors_profiles Disable processor-level profiling logs

Set these in users.xml for the relevant profile:

<profiles>
    <default>
        <log_queries>1</log_queries>
        <log_queries_min_duration_ms>100</log_queries_min_duration_ms>
        <log_query_threads>0</log_query_threads>
    </default>
</profiles>

Per-user settings take effect immediately, no restart needed.

Drop Renamed Tables from Upgrades

When ClickHouse upgrades introduce a schema change to a log table, the old table is renamed with a numeric suffix (query_log_1, query_log_2, ...) and a new empty one is created. These accumulate over time. If you do not need the historical data, drop them:

SELECT name, formatReadableSize(total_bytes) AS size
FROM system.tables
WHERE database = 'system' AND match(name, '_[0-9]+$')
ORDER BY total_bytes DESC;

DROP TABLE system.query_log_1 SYNC;

Common Pitfalls

  • Removing query_log entirely. You lose your most useful debugging tool. Use TTL or sampling instead.
  • Setting TTL to a very short interval like 1 day on a system with deep historical investigations needed.
  • Forgetting to drop renamed _log_N tables after upgrades. They keep consuming disk forever.
  • Disabling logs on one node but not the others. Cluster-wide visibility breaks.

Frequently Asked Questions

Q: Which system log table grows the fastest? A: Usually trace_log followed by query_thread_log and asynchronous_metric_log. query_log itself is moderate but dominates if queries are long.

Q: Can I add TTL without restarting ClickHouse? A: Yes, ALTER TABLE ... MODIFY TTL is online. New parts get the TTL; existing parts are evaluated on the next merge.

Q: How long should I keep query_log? A: 14 to 30 days is typical. Long enough to investigate complaints from last week, short enough to bound disk usage.

Q: Will dropping a log table break ClickHouse? A: No. ClickHouse will recreate it on the next flush if the corresponding section in config is still enabled.

Q: Should I move log tables to a slower disk? A: You can, via a storage policy on each log table. It saves fast-disk space at the cost of slower queries against the logs.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.