ClickHouse System Tables Filling Disk: TTL and Cleanup Guide

The system database in ClickHouse mixes two very different things. Virtual tables like system.parts and system.columns are computed on demand and consume no disk. The _log tables (query_log, query_thread_log, trace_log, part_log, metric_log, asynchronous_metric_log, and so on) are real MergeTree tables that grow forever by default. On a busy server, they easily reach hundreds of gigabytes. This guide shows how to control them without losing the operational data you need.

Recommended approach: apply TTL to every log table (Options 1 and 2 below). TTL caps disk usage while keeping the full history available for as long as you need it. Disabling tables or throttling per-user logging (Options 3 and 4) should be a last resort — see the warnings on each.

The least invasive fix, and the one we recommend by default, is to set a TTL on each log table so old data drops automatically. This works on a running server with no restart:

ALTER TABLE system.query_log MODIFY TTL event_date + INTERVAL 14 DAY;
ALTER TABLE system.query_thread_log MODIFY TTL event_date + INTERVAL 7 DAY;
ALTER TABLE system.trace_log MODIFY TTL event_date + INTERVAL 14 DAY;
ALTER TABLE system.part_log MODIFY TTL event_date + INTERVAL 90 DAY;
ALTER TABLE system.metric_log MODIFY TTL event_date + INTERVAL 14 DAY;
ALTER TABLE system.asynchronous_metric_log MODIFY TTL event_date + INTERVAL 14 DAY;

Because the log tables are already partitioned by month or day, TTL drops whole partitions cheaply rather than rewriting parts.

If you want new servers to have the right retention from day one, configure it in the log table sections of config.xml or a drop-in:

<clickhouse>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <ttl>event_date + INTERVAL 30 DAY DELETE</ttl>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>
</clickhouse>

You can also override the engine to use explicit daily partitioning. Daily partitions let TTL drop whole days:

<query_log>
    <database>system</database>
    <table>query_log</table>
    <engine>ENGINE = MergeTree
            PARTITION BY event_date
            ORDER BY (event_time)
            TTL event_date + INTERVAL 14 DAY DELETE</engine>
</query_log>

Note: in Kubernetes deployments, avoid the <engine> tag because of possible configuration clashes during chart upgrades. Use the <ttl> form instead.

Option 3: Disable Specific Log Tables

Prefer TTL (Options 1 and 2) over disabling. Disabling a log table is permanent until you re-enable and restart, and it removes the underlying signal entirely rather than capping its retention.

Heads up if you run Pulse: Pulse relies on these tables to understand cluster behavior, surface regressions, and auto-detect and remediate issues. Disabling metric_log, asynchronous_metric_log, part_log, trace_log, or query_log will reduce or eliminate Pulse's visibility into the corresponding subsystem. Use TTL to bound disk usage instead — Pulse only needs a recent window of history, not unbounded retention.

If a log table genuinely provides no value for your workload and you accept the loss of observability, remove it. Create /etc/clickhouse-server/config.d/z_log_disable.xml (the z_ prefix ensures it loads last, after vendor configs):

<?xml version="1.0"?>
<clickhouse>
    <asynchronous_metric_log remove="1"/>
    <metric_log remove="1"/>
    <trace_log remove="1"/>
    <part_log remove="1"/>
    <text_log remove="1"/>
    <crash_log remove="1"/>
    <session_log remove="1"/>
</clickhouse>

Keep query_log enabled. It is by far the most useful log for debugging slow queries, memory issues, and unexpected workload patterns.

Option 4: Throttle Query Logging Per User

Rather than disable query_log, reduce its volume per user profile. Throttling is less destructive than disabling but still drops signal — sampled or duration-filtered logging hides the queries that fall below the threshold from any downstream consumer, including Pulse's slow-query and regression detection. Use it when TTL alone cannot keep the table bounded.

Setting Effect
log_queries_min_duration_ms Skip queries below the threshold
log_queries_cut_to_length Truncate long query texts
log_queries_probability Sample only a fraction of queries
log_query_threads Disable the per-thread log
log_query_views Disable view-access logging
log_processors_profiles Disable processor-level profiling logs

Set these in users.xml for the relevant profile:

<profiles>
    <default>
        <log_queries>1</log_queries>
        <log_queries_min_duration_ms>100</log_queries_min_duration_ms>
        <log_query_threads>0</log_query_threads>
    </default>
</profiles>

Per-user settings take effect immediately, no restart needed.

Drop Renamed Tables from Upgrades

When ClickHouse upgrades introduce a schema change to a log table, the old table is renamed with a numeric suffix (query_log_1, query_log_2, ...) and a new empty one is created. These accumulate over time. If you do not need the historical data, drop them:

SELECT name, formatReadableSize(total_bytes) AS size
FROM system.tables
WHERE database = 'system' AND match(name, '_[0-9]+$')
ORDER BY total_bytes DESC;

DROP TABLE system.query_log_1 SYNC;

Common Pitfalls

  • Removing query_log entirely. You lose your most useful debugging tool and break any observability layer that depends on it (including Pulse). Use TTL or sampling instead.
  • Setting TTL to a very short interval like 1 day on a system with deep historical investigations needed.
  • Forgetting to drop renamed _log_N tables after upgrades. They keep consuming disk forever.
  • Disabling logs on one node but not the others. Cluster-wide visibility breaks.

Frequently Asked Questions

Q: Which system log table grows the fastest? A: Usually trace_log followed by query_thread_log and asynchronous_metric_log. query_log itself is moderate but dominates if queries are long.

Q: Can I add TTL without restarting ClickHouse? A: Yes, ALTER TABLE ... MODIFY TTL is online. New parts get the TTL; existing parts are evaluated on the next merge.

Q: How long should I keep query_log? A: 14 to 30 days is typical. Long enough to investigate complaints from last week, short enough to bound disk usage.

Q: Will dropping a log table break ClickHouse? A: No. ClickHouse will recreate it on the next flush if the corresponding section in config is still enabled.

Q: Should I move log tables to a slower disk? A: You can, via a storage policy on each log table. It saves fast-disk space at the cost of slower queries against the logs.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.