ClickHouse System Tables Filling Disk: TTL and Cleanup Guide

Q: Which system log table grows the fastest?

Usually trace_log followed by query_thread_log and asynchronous_metric_log . query_log itself is moderate but dominates if queries are long.

Q: Can I add TTL without restarting ClickHouse?

Yes, ALTER TABLE ... MODIFY TTL is online. New parts get the TTL; existing parts are evaluated on the next merge.

The system database in ClickHouse mixes two very different things. Virtual tables like system.parts and system.columns are computed on demand and consume no disk. The _log tables (query_log, query_thread_log, trace_log, part_log, metric_log, asynchronous_metric_log, and so on) are real MergeTree tables that grow forever by default. On a busy server, they easily reach hundreds of gigabytes. This guide shows how to control them without losing the operational data you need.

Recommended approach: apply TTL to every log table (Options 1 and 2 below). TTL caps disk usage while keeping the full history available for as long as you need it. Disabling tables or throttling per-user logging (Options 3 and 4) should be a last resort — see the warnings on each.

Option 1: Add TTL to Existing Log Tables (Recommended)

The least invasive fix, and the one we recommend by default, is to set a TTL on each log table so old data drops automatically. This works on a running server with no restart:

ALTER TABLE system.query_log MODIFY TTL event_date + INTERVAL 14 DAY;
ALTER TABLE system.query_thread_log MODIFY TTL event_date + INTERVAL 7 DAY;
ALTER TABLE system.trace_log MODIFY TTL event_date + INTERVAL 14 DAY;
ALTER TABLE system.part_log MODIFY TTL event_date + INTERVAL 90 DAY;
ALTER TABLE system.metric_log MODIFY TTL event_date + INTERVAL 14 DAY;
ALTER TABLE system.asynchronous_metric_log MODIFY TTL event_date + INTERVAL 14 DAY;

Because the log tables are already partitioned by month or day, TTL drops whole partitions cheaply rather than rewriting parts.

Option 2: Configure TTL at Server Startup (Recommended)

If you want new servers to have the right retention from day one, configure it in the log table sections of config.xml or a drop-in:

<clickhouse>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <ttl>event_date + INTERVAL 30 DAY DELETE</ttl>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>
</clickhouse>

You can also override the engine to use explicit daily partitioning. Daily partitions let TTL drop whole days:

<query_log>
    <database>system</database>
    <table>query_log</table>
    <engine>ENGINE = MergeTree
            PARTITION BY event_date
            ORDER BY (event_time)
            TTL event_date + INTERVAL 14 DAY DELETE</engine>
</query_log>

Note: in Kubernetes deployments, avoid the <engine> tag because of possible configuration clashes during chart upgrades. Use the <ttl> form instead.

Option 3: Disable Specific Log Tables

Prefer TTL (Options 1 and 2) over disabling. Disabling a log table is permanent until you re-enable and restart, and it removes the underlying signal entirely rather than capping its retention.

Heads up if you run Pulse: Pulse relies on these tables to understand cluster behavior, surface regressions, and auto-detect and remediate issues. Disabling metric_log, asynchronous_metric_log, part_log, trace_log, or query_log will reduce or eliminate Pulse's visibility into the corresponding subsystem. Use TTL to bound disk usage instead — Pulse only needs a recent window of history, not unbounded retention.

If a log table genuinely provides no value for your workload and you accept the loss of observability, remove it. Create /etc/clickhouse-server/config.d/z_log_disable.xml (the z_ prefix ensures it loads last, after vendor configs):

<?xml version="1.0"?>
<clickhouse>
    <asynchronous_metric_log remove="1"/>
    <metric_log remove="1"/>
    <trace_log remove="1"/>
    <part_log remove="1"/>
    <text_log remove="1"/>
    <crash_log remove="1"/>
    <session_log remove="1"/>
</clickhouse>

Keep query_log enabled. It is by far the most useful log for debugging slow queries, memory issues, and unexpected workload patterns.

Option 4: Throttle Query Logging Per User

Rather than disable query_log, reduce its volume per user profile. Throttling is less destructive than disabling but still drops signal — sampled or duration-filtered logging hides the queries that fall below the threshold from any downstream consumer, including Pulse's slow-query and regression detection. Use it when TTL alone cannot keep the table bounded.

Setting	Effect
`log_queries_min_duration_ms`	Skip queries below the threshold
`log_queries_cut_to_length`	Truncate long query texts
`log_queries_probability`	Sample only a fraction of queries
`log_query_threads`	Disable the per-thread log
`log_query_views`	Disable view-access logging
`log_processors_profiles`	Disable processor-level profiling logs

Set these in users.xml for the relevant profile:

<profiles>
    <default>
        <log_queries>1</log_queries>
        <log_queries_min_duration_ms>100</log_queries_min_duration_ms>
        <log_query_threads>0</log_query_threads>
    </default>
</profiles>

Per-user settings take effect immediately, no restart needed.

Drop Renamed Tables from Upgrades

When ClickHouse upgrades introduce a schema change to a log table, the old table is renamed with a numeric suffix (query_log_1, query_log_2, ...) and a new empty one is created. These accumulate over time. If you do not need the historical data, drop them:

SELECT name, formatReadableSize(total_bytes) AS size
FROM system.tables
WHERE database = 'system' AND match(name, '_[0-9]+$')
ORDER BY total_bytes DESC;

DROP TABLE system.query_log_1 SYNC;

Common Pitfalls

Removing query_log entirely. You lose your most useful debugging tool and break any observability layer that depends on it (including Pulse). Use TTL or sampling instead.
Setting TTL to a very short interval like 1 day on a system with deep historical investigations needed.
Forgetting to drop renamed _log_N tables after upgrades. They keep consuming disk forever.
Disabling logs on one node but not the others. Cluster-wide visibility breaks.

Frequently Asked Questions

Q: Which system log table grows the fastest? A: Usually trace_log followed by query_thread_log and asynchronous_metric_log. query_log itself is moderate but dominates if queries are long.

Q: Can I add TTL without restarting ClickHouse? A: Yes, ALTER TABLE ... MODIFY TTL is online. New parts get the TTL; existing parts are evaluated on the next merge.

Q: How long should I keep query_log? A: 14 to 30 days is typical. Long enough to investigate complaints from last week, short enough to bound disk usage.

Q: Will dropping a log table break ClickHouse? A: No. ClickHouse will recreate it on the next flush if the corresponding section in config is still enabled.

Q: Should I move log tables to a slower disk? A: You can, via a storage policy on each log table. It saves fast-disk space at the cost of slower queries against the logs.