A production ClickHouse cluster consists of multiple ClickHouse servers coordinated through ClickHouse Keeper (or ZooKeeper), with each server holding a shard, a replica, or both. The configuration is mostly mechanical once the topology is decided. The decisions that matter happen before the first XML file is written: how many shards, how many replicas, where the Keeper quorum runs, and what the hardware budget per node is.
Hardware Requirements
A workable minimum for a single ClickHouse node:
| Resource | Minimum | Production target |
|---|---|---|
| CPU cores | 4 (with SSE4.2) | 16+ |
| RAM | 16 GB | 64 GB+ |
| Storage | 1 TB HDD | NVMe SSD or fast EBS |
| Network | 1 Gbit | 10 Gbit+ |
SSE4.2 is required at the CPU instruction set level. RAM below 4 GB is unsupported. In cloud environments, avoid burstable disks (gp2 below a few hundred GB, st1, low-IOPS magnetic). Disk throughput matters more than IOPS for ClickHouse because reads scan large compressed parts. Test the baseline performance of network, RAM, and disk before deployment so issues surface before the cluster is loaded.
In production, ClickHouse Keeper (or ZooKeeper) should run on separate hosts from ClickHouse. Recommended Keeper specs:
- Fast NVMe storage, 128 GB minimum
- Modern multi-core CPU
- 4 GB RAM
- 3000+ baseline IOPS in cloud environments
- Three-node quorum for production (one node is acceptable only in development)
Running Keeper on the same hosts as ClickHouse is possible for small deployments but not recommended at scale. Keeper is latency-sensitive and contends with ClickHouse for disk I/O when colocated.
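For reference, the configuration of a dedicated Keeper host might look like the sketch below. It assumes three Keeper hosts named keeper1 to keeper3.example.com, the default Keeper client port 9181 and Raft port 9234, and illustrative storage paths and timeouts. The standalone clickhouse-keeper package typically reads this from /etc/clickhouse-keeper/keeper_config.xml, but treat that path as an assumption to verify for your packaging.

```xml
<!-- Sketch of one Keeper node's config; paths and timeouts are illustrative assumptions -->
<clickhouse>
    <keeper_server>
        <!-- Port that ClickHouse servers connect to -->
        <tcp_port>9181</tcp_port>
        <!-- Must be unique per Keeper node and match one id in raft_configuration -->
        <server_id>1</server_id>
        <!-- Put both paths on the fast NVMe volume -->
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
        <coordination_settings>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <session_timeout_ms>30000</session_timeout_ms>
        </coordination_settings>
        <!-- Identical on all three Keeper nodes -->
        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>keeper1.example.com</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>2</id>
                <hostname>keeper2.example.com</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>3</id>
                <hostname>keeper3.example.com</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>
```

Only server_id changes from node to node; the raft_configuration block stays the same everywhere.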
Network and Ports
The cluster network should be 10 Gbit or higher. Inter-node traffic during replication, distributed queries, and large fetches can saturate slower networks and is a frequent cause of replication lag.
| Port | Role |
|---|---|
| 8123 | ClickHouse HTTP interface (client) |
| 9000 | ClickHouse native TCP (client and inter-server) |
| 9009 | ClickHouse inter-replica HTTP (data transfer for replication) |
| 9181 | ClickHouse Keeper client port |
| 9234 | ClickHouse Keeper raft port |
| 2181 | ZooKeeper client port |
| 2888 | ZooKeeper peer communication |
| 3888 | ZooKeeper leader election |
Firewall rules need to allow ClickHouse nodes to reach each other on 9000 and 9009, and to reach the Keeper quorum on 9181 (or ZooKeeper on 2181). Keeper or ZooKeeper nodes need full mesh connectivity for their consensus ports.
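The ClickHouse-side ports in the table map to three server settings. The fragment below is a sketch that simply restates the defaults, so a file like this is only needed when a port has to change; if it does, the firewall rules and any replica definitions that reference the port must change with it.

```xml
<!-- Sketch: the server settings behind the ClickHouse ports listed above (values are the defaults) -->
<clickhouse>
    <!-- HTTP interface for clients -->
    <http_port>8123</http_port>
    <!-- Native TCP protocol for clients and distributed queries -->
    <tcp_port>9000</tcp_port>
    <!-- HTTP endpoint replicas use to fetch data parts from each other -->
    <interserver_http_port>9009</interserver_http_port>
</clickhouse>
```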
Spread replicas across failure domains: different racks and power circuits in physical environments, different availability zones in cloud environments. Enable TLS on client and inter-server traffic for non-trusted networks. In trusted networks, plaintext inter-server traffic is acceptable and reduces CPU overhead.
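As a rough sketch of what enabling TLS involves on the server side, the fragment below turns on the secure variants of the client and inter-server ports. The certificate and key paths are hypothetical, and the port numbers are the conventional secure defaults rather than anything required; certificate verification options are left out and need to be chosen for your environment.

```xml
<!-- Sketch: secure ports for client and inter-server traffic; file paths are hypothetical -->
<clickhouse>
    <!-- HTTPS interface for clients -->
    <https_port>8443</https_port>
    <!-- Native protocol over TLS -->
    <tcp_port_secure>9440</tcp_port_secure>
    <!-- Replica-to-replica data transfer over TLS -->
    <interserver_https_port>9010</interserver_https_port>
    <openSSL>
        <server>
            <certificateFile>/etc/clickhouse-server/certs/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/certs/server.key</privateKeyFile>
        </server>
    </openSSL>
</clickhouse>
```

With this in place, replica entries in the cluster definition would point at the secure native port and set `<secure>1</secure>`, and the firewall rules need to allow the secure ports in addition to, or instead of, the plaintext ones.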
Hostnames should be static, DNS-resolvable across the cluster, and follow a schema that encodes datacenter, cluster, shard, and replica (for example ch-prod-dc1-s1r1).
Configuration Files
ClickHouse reads /etc/clickhouse-server/config.xml and merges any files in /etc/clickhouse-server/config.d/. Production configuration should never edit config.xml directly. Put cluster-specific overrides in config.d/ so package upgrades that touch the base config do not collide with your settings.
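As a small illustration of the override pattern, the hypothetical file below sets two values and leaves everything else to the packaged config.xml. The file name and hostname are assumptions; listen_host and interserver_http_host are standard server settings.

```xml
<!-- /etc/clickhouse-server/config.d/network.xml (hypothetical file name) -->
<clickhouse>
    <!-- Accept connections on all IPv4 interfaces instead of localhost only -->
    <listen_host>0.0.0.0</listen_host>
    <!-- Hostname that other replicas use to reach this node when fetching parts -->
    <interserver_http_host>ch-prod-s1r1.example.com</interserver_http_host>
</clickhouse>
```

Because the file lives in config.d/, a package upgrade can replace config.xml without touching these values.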
Keeper or ZooKeeper Config
Create /etc/clickhouse-server/config.d/zookeeper.xml on every ClickHouse node:
<clickhouse>
    <zookeeper>
        <node>
            <host>keeper1.example.com</host>
            <port>9181</port>
        </node>
        <node>
            <host>keeper2.example.com</host>
            <port>9181</port>
        </node>
        <node>
            <host>keeper3.example.com</host>
            <port>9181</port>
        </node>
    </zookeeper>
</clickhouse>
Use port 2181 if connecting to actual ZooKeeper; port 9181 is the ClickHouse Keeper default.
Macros
Each node has unique identifiers in /etc/clickhouse-server/config.d/macros.xml. These get substituted into table definitions later.
<clickhouse>
    <macros>
        <cluster>prod_cluster</cluster>
        <shard>01</shard>
        <replica>ch-prod-s1r1</replica>
    </macros>
</clickhouse>
shard is unique per shard and shared across replicas of the same shard. replica is unique per node. cluster is the same on all nodes that participate in this cluster.
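To make the rule concrete, here is what macros.xml might look like on a node in a different shard, assuming a two-shard, two-replica layout and a node named ch-prod-s2r1 (the name is illustrative and simply follows the same pattern as the example above).

```xml
<!-- macros.xml on ch-prod-s2r1: first replica of the second shard (illustrative node name) -->
<clickhouse>
    <macros>
        <!-- Same on every node in the cluster -->
        <cluster>prod_cluster</cluster>
        <!-- Shared by both replicas of shard 02 -->
        <shard>02</shard>
        <!-- Unique to this node -->
        <replica>ch-prod-s2r1</replica>
    </macros>
</clickhouse>
```

On the second replica of the first shard, shard would stay 01 and only replica would change, for example to ch-prod-s1r2.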
Cluster Topology
Create /etc/clickhouse-server/config.d/cluster.xml on every node:
<clickhouse>
    <remote_servers>
        <prod_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ch-prod-s1r1.example.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>ch-prod-s1r2.example.com</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ch-prod-s2r1.example.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>ch-prod-s2r2.example.com</host>
                    <port>9000</port>
                </replica>
            </shard>
        </prod_cluster>
    </remote_servers>
</clickhouse>
internal_replication=true means an insert through the Distributed table engine goes to exactly one replica per shard, and ReplicatedMergeTree replicates it to the others. Without this setting (the default is false), the Distributed engine writes to every replica directly, which duplicates writes that ReplicatedMergeTree would otherwise replicate on its own.
Many deployments also define helper cluster topologies:
- all_replicated: one shard containing every node as a replica, useful for cluster-wide DDL
- all_sharded: every node as a separate shard with no replication, used for fan-out queries
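A sketch of the all_replicated helper for the four example nodes is shown below. It assumes the same hostnames as the prod_cluster definition and can sit in its own config.d/ file, since ClickHouse merges it into remote_servers alongside the main cluster.

```xml
<!-- Sketch: helper cluster with every node as a replica of a single shard -->
<clickhouse>
    <remote_servers>
        <all_replicated>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ch-prod-s1r1.example.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>ch-prod-s1r2.example.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>ch-prod-s2r1.example.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>ch-prod-s2r2.example.com</host>
                    <port>9000</port>
                </replica>
            </shard>
        </all_replicated>
    </remote_servers>
</clickhouse>
```

An all_sharded definition would follow the same shape with four shard elements, each containing a single replica.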
Replicated Tables
With macros and remote_servers in place, table creation uses substitutions:
CREATE TABLE events_local ON CLUSTER '{cluster}'
(
    event_time DateTime,
    user_id UInt64,
    event_type LowCardinality(String),
    payload String
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{database}/{table}/{shard}',
    '{replica}'
)
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, event_time, user_id);
ON CLUSTER '{cluster}' runs the DDL on every node. The first argument to ReplicatedMergeTree is the path in Keeper that identifies this table; replicas of the same shard share that path. The second argument is the replica name, which must be unique within the path.
Distributed Table
A Distributed table sits on top of the local replicated tables and routes inserts and queries across shards:
CREATE TABLE events ON CLUSTER '{cluster}'
AS events_local
ENGINE = Distributed('{cluster}', currentDatabase(), events_local, sipHash64(user_id));
The fourth argument is the sharding key. sipHash64(user_id) distributes inserts evenly across shards while keeping a given user's events on a single shard, which helps query locality.
Substitutions and Secrets
For values shared across configs, ClickHouse supports substitutions via /etc/metrika.xml or other included files. This is the right place for cluster topology and Keeper node lists that all nodes share, to avoid duplicating definitions across the config.d/ files.
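A sketch of how this might look: the shared definition lives in the include file under an element name of your choosing, and a config.d/ file pulls it in with the incl attribute. The element name keeper_nodes is hypothetical. The include file could contain:

```xml
<!-- /etc/metrika.xml: shared include file; keeper_nodes is a hypothetical element name -->
<clickhouse>
    <keeper_nodes>
        <node>
            <host>keeper1.example.com</host>
            <port>9181</port>
        </node>
        <node>
            <host>keeper2.example.com</host>
            <port>9181</port>
        </node>
        <node>
            <host>keeper3.example.com</host>
            <port>9181</port>
        </node>
    </keeper_nodes>
</clickhouse>
```

The per-node file then shrinks to a reference:

```xml
<!-- /etc/clickhouse-server/config.d/zookeeper.xml: content is filled in from the include file -->
<clickhouse>
    <zookeeper incl="keeper_nodes"/>
</clickhouse>
```

The same approach would work for remote_servers, so the topology is defined once and distributed as a single file.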
Credentials and TLS keys do not belong in plaintext XML. Use environment variable substitutions (from_env="...") or external secret managers integrated through your configuration management tool.
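For example, the credentials replicas use to authenticate to each other could be pulled from the environment as sketched below. The user name and the variable name CH_INTERSERVER_PASSWORD are hypothetical; the variable has to be present in the environment of the clickhouse-server process, for instance through the service unit.

```xml
<!-- Sketch: password read from an environment variable; names are hypothetical -->
<clickhouse>
    <interserver_http_credentials>
        <user>interserver</user>
        <!-- Value is taken from the server process environment at config load time -->
        <password from_env="CH_INTERSERVER_PASSWORD"/>
    </interserver_http_credentials>
</clickhouse>
```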
Configuration Management
XML files on every node are an invitation to drift. Manage them through Ansible, Puppet, Chef, or the ClickHouse Operator on Kubernetes. The Operator generates these configs from a higher-level ClickHouseInstallation resource and handles rolling restarts when configs change.
Common Pitfalls
- Running ZooKeeper or Keeper on the same disks as ClickHouse data. Keeper is latency-sensitive and gets starved.
- Single Keeper node in production. A failure takes the whole replication system offline.
- Setting internal_replication=false on a cluster using ReplicatedMergeTree. Inserts double up and Keeper bookkeeping becomes inconsistent.
- Editing config.xml directly instead of using config.d/. Package upgrades will overwrite or conflict.
- Replicas in the same rack or availability zone. A single failure takes out the whole shard.
- Skipping the 10 Gbit network and discovering replication lag under load.
Frequently Asked Questions
Q: How many ZooKeeper or Keeper nodes do I need? A: Three for production. One is acceptable only for development. Five only at very large scale where the additional fault tolerance is required.
Q: Should I use ClickHouse Keeper or ZooKeeper? A: Keeper for new deployments. It is protocol-compatible for ClickHouse use and is maintained alongside ClickHouse itself.
Q: Can I run Keeper on the same nodes as ClickHouse? A: For small clusters, yes, on separate disks. For production, dedicated Keeper hosts are recommended because Keeper latency affects every replicated write.
Q: What does internal_replication=true mean? A: Inserts through the Distributed table go to one replica per shard, and ReplicatedMergeTree handles copying to other replicas. Without it, the Distributed engine writes to every replica directly, duplicating writes.
Q: Do I need 10 Gbit networking? A: For production clusters of any meaningful size, yes. Replication, distributed query result forwarding, and large fetches saturate 1 Gbit links quickly.