ClickHouse Database, Table, and Column Size SQL Queries

Q: How do I find the largest table in a ClickHouse server?

Run the table size query and sort by size DESC with LIMIT 10 . The first row is the largest active table on that server.

Q: How do I see size per partition?

Add partition to the GROUP BY and SELECT clauses, then filter on the table of interest. system.parts exposes the partition name directly.

Q: How do I get total cluster size, not per-node?

Wrap the inner SELECT in clusterAllReplicas('cluster_name', ...) . Be aware that ReplicatedMergeTree tables will be counted once per replica, so divide by the replication factor for the logical total.

Capacity planning and storage forecasting in ClickHouse rely on a handful of well-known queries against system.parts, system.parts_columns, and system.projection_parts. These tables expose compressed bytes, uncompressed bytes, row counts, and the number of active parts for every table on the server. Combining them with formatReadableSize gives operators a quick way to identify the largest tables, the columns driving storage cost, and the compression ratios that confirm encoding choices are paying off.

Keep these queries in your DBA toolkit and run them whenever a disk usage alert fires or before you plan a hardware upgrade.

Table Size, Compression, and Part Count

The first query reports compressed and uncompressed size, compression ratio, row count, and part count per table. Filter by database or table with the LIKE patterns.

SELECT
    database,
    table,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / size, 2) AS compr_rate,
    sum(rows) AS rows,
    count() AS part_count
FROM system.parts
WHERE (active = 1) AND (database LIKE '%') AND (table LIKE '%')
GROUP BY database, table
ORDER BY size DESC;

Use this query as the entry point for any storage investigation. The part_count column flags tables with merge backlogs, and compr_rate (uncompressed divided by compressed) shows how effective the codec configuration is.

Column Size and Compression Ratio

When one table dominates storage, drill into its columns. system.parts_columns gives per-column byte counts and the average row size:

SELECT
    database,
    table,
    column,
    formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / size, 2) AS compr_ratio,
    sum(rows) AS rows_cnt,
    round(usize / rows_cnt, 2) AS avg_row_size
FROM system.parts_columns
WHERE (active = 1) AND (database LIKE '%') AND (table LIKE '%')
GROUP BY database, table, column
ORDER BY size DESC;

Columns with poor compression ratios (close to 1.0) are candidates for codec tuning, ordering key adjustments, or different data types. High avg_row_size on a wide String column often points to JSON or free-text payloads that compress better with CODEC(ZSTD) than the default LZ4.

Projection Size

Projections store additional physical layouts of the same data and add to total storage. Query system.projection_parts to see how much space each projection consumes:

SELECT
    database,
    table,
    name,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / size, 2) AS compr_rate,
    sum(rows) AS rows,
    count() AS part_count
FROM system.projection_parts
WHERE (table = 'ptest') AND active
GROUP BY database, table, name
ORDER BY size DESC;

Replace 'ptest' with the table you want to inspect. The name column is the projection name as defined in the table DDL.

Database Totals

Aggregating up to the database level helps with quota planning and chargeback:

SELECT
    database,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    sum(rows) AS rows,
    count(DISTINCT table) AS tables,
    count() AS parts
FROM system.parts
WHERE active
GROUP BY database
ORDER BY sum(data_compressed_bytes) DESC;

Interpreting the Numbers

Metric	Meaning
`compressed`	Bytes on disk for active parts
`uncompressed`	Original byte size before codecs
`compr_rate` / `compr_ratio`	Uncompressed divided by compressed, higher is better
`part_count`	Active data parts, watch for thousands per partition
`avg_row_size`	Uncompressed bytes per row for that column

A healthy ClickHouse table typically shows compr_rate above 3 for analytics workloads. Time-series workloads with sorted timestamps and low-cardinality dimensions often exceed 10.

Common Pitfalls

system.parts and system.parts_columns include both active and inactive parts. Always filter active = 1 or you double-count parts that are about to be removed after a merge.
These queries run on a single node. For a cluster-wide total, wrap them in clusterAllReplicas and aggregate across hosts.
Compressed bytes do not include the marks files or primary index, so on-disk usage from du will be slightly larger than the query reports.
formatReadableSize returns a string, which is convenient for humans but not sortable. Sort by the raw byte total (the inner alias size works because it is the sum(...) value).
Projection size is reported separately from the parent table. Add system.projection_parts to capacity reports if your tables use projections.

Frequently Asked Questions

Q: How do I find the largest table in a ClickHouse server? A: Run the table size query and sort by size DESC with LIMIT 10. The first row is the largest active table on that server.

Q: Why is compressed size larger than expected? A: Possible causes include too many small parts (overhead per part), poor codec choice for the data type, or unsorted data that defeats run-length and delta codecs. Check part_count first.

Q: How do I see size per partition? A: Add partition to the GROUP BY and SELECT clauses, then filter on the table of interest. system.parts exposes the partition name directly.

Q: Do these queries include data still in memory? A: No. Buffer table and pending insert blocks are not counted. They only become visible to system.parts after the next flush to disk.

Q: How do I get total cluster size, not per-node? A: Wrap the inner SELECT in clusterAllReplicas('cluster_name', ...). Be aware that ReplicatedMergeTree tables will be counted once per replica, so divide by the replication factor for the logical total.