ClickHouse sysall: Cluster-Wide System Tables View

Q: Why use `clusterAllReplicas` instead of `cluster`?

cluster queries only one replica per shard. clusterAllReplicas hits every replica, which is what you want when each replica might have different data (queue depth, in-flight queries, disk state).

Q: How do I avoid hardcoding the cluster name?

Use the {cluster} macro substitution. The same DDL then works on dev, staging, and prod.

A common operational gap with ClickHouse clusters is that each node has its own system database. To check disk space, running queries, or merges across the fleet, you have to connect to each node individually. The sysall database pattern solves this by wrapping each interesting system table in a view backed by clusterAllReplicas, giving you a single place to ask cluster-wide questions.

Prerequisites

You need a defined cluster in your config (under <remote_servers>) and a cluster macro on every node:

<?xml version="1.0" ?>
<clickhouse>
    <macros>
        <cluster>production</cluster>
        <replica>node-1</replica>
        <shard>1</shard>
    </macros>
</clickhouse>

The macro lets the views reference {cluster} without hardcoding the cluster name, so the same DDL works on every node.

The clusterAllReplicas Pattern

clusterAllReplicas runs a query against every node listed in the cluster, including all replicas of every shard. It returns the union of rows. The base building block is:

SELECT
    hostName(),
    FQDN(),
    materialize(uptime()) AS uptime
FROM clusterAllReplicas('{cluster}', system.one)
SETTINGS skip_unavailable_shards = 1;

skip_unavailable_shards = 1 keeps the query from failing when a node is down, which is exactly what you want for monitoring.

Bootstrap the sysall Database

Create the database and a view that surfaces the up/down state of every node:

CREATE DATABASE sysall;

CREATE OR REPLACE VIEW sysall.cluster_state AS
SELECT
    shard_num,
    replica_num,
    host_name,
    host_address,
    port,
    errors_count,
    uptime,
    if(uptime > 0, 'UP', 'DOWN') AS node_state
FROM system.clusters
LEFT JOIN
(
    SELECT
        hostName() AS host_name,
        FQDN() AS fqdn,
        materialize(uptime()) AS uptime
    FROM clusterAllReplicas('{cluster}', system.one)
) AS hosts_info USING (host_name)
WHERE cluster = getMacro('cluster')
SETTINGS skip_unavailable_shards = 1;

The view joins the static system.clusters topology against live uptime from each replica. Rows where uptime is null show up as DOWN.

Wrap More System Tables

The same pattern works for any system table. Add an fqdn column so you know which node a row came from:

CREATE OR REPLACE VIEW sysall.disks AS
SELECT FQDN() AS nodeFQDN, *
FROM clusterAllReplicas('{cluster}', system.disks)
SETTINGS skip_unavailable_shards = 1;

CREATE OR REPLACE VIEW sysall.parts AS
SELECT FQDN() AS nodeFQDN, *
FROM clusterAllReplicas('{cluster}', system.parts)
SETTINGS skip_unavailable_shards = 1;

CREATE OR REPLACE VIEW sysall.replication_queue AS
SELECT FQDN() AS nodeFQDN, *
FROM clusterAllReplicas('{cluster}', system.replication_queue)
SETTINGS skip_unavailable_shards = 1;

You can script this for every table in system by reading system.tables WHERE database = 'system' and emitting a CREATE VIEW per row.

Example Queries

Check which nodes are up:

SELECT * FROM sysall.cluster_state;

Disk space across the fleet:

SELECT
    nodeFQDN,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total
FROM sysall.disks
ORDER BY free_space ASC;

Find replicas with growing replication queues:

SELECT nodeFQDN, database, table, count() AS queue_size
FROM sysall.replication_queue
GROUP BY nodeFQDN, database, table
HAVING queue_size > 100;

Common Pitfalls

The cluster macro must be set on every node, otherwise getMacro('cluster') errors out.
Without skip_unavailable_shards = 1, any down node makes the query fail.
Some system tables (like system.numbers) are infinite or expensive. Do not blindly wrap every system table without a LIMIT.
The views run with the privileges of the calling user. Grant SELECT on the underlying system tables on every node, or use a dedicated user with a shared password.

Frequently Asked Questions

Q: Why use clusterAllReplicas instead of cluster? A: cluster queries only one replica per shard. clusterAllReplicas hits every replica, which is what you want when each replica might have different data (queue depth, in-flight queries, disk state).

Q: How do I avoid hardcoding the cluster name? A: Use the {cluster} macro substitution. The same DDL then works on dev, staging, and prod.

Q: Does sysall add load to every node? A: Each query fans out to every replica, so yes. Use it for ad-hoc operations and dashboards with reasonable refresh rates, not for sub-second metrics.

Q: Can I use sysall for INSERT or ALTER? A: No, it is read-only by design. For DDL across a cluster, use ON CLUSTER clauses.

Q: What happens when nodes have different ClickHouse versions? A: Differences in system table schemas can break the views. Keep cluster nodes on the same minor version when possible.