Load Balancing ClickHouse: Client-Side, chproxy, HAProxy, and Nginx

Load balancing ClickHouse is different from load balancing a web service. The native protocol on port 9000 is long-lived and stateful, so generic TCP load balancers do not redistribute queries across servers the way an HTTP load balancer does. The HTTP protocol on port 8123 is more flexible, and a ClickHouse-aware proxy such as chproxy handles routing better than a generic Layer 7 proxy. The simplest approach for many deployments is no load balancer at all, with the client library picking a backend.

Client-Side Load Balancing First

The simplest setup uses client-side load balancing. Many ClickHouse client libraries, including the JDBC driver, accept a list of endpoints and choose among them with built-in failover, retries, and round-robin or random selection. Support and option names vary by driver, so check the one you use. No intermediate proxy is required.

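Python clickhouse-driver (native protocol; clickhouse-connect's get_client takes a single host, so it cannot be given a list directly):

from clickhouse_driver import Client

client = Client(
    host="ch-1.example.com",
    port=9000,
    user="default",
    password=...,
    # Fallback nodes tried when the primary host is unreachable.
    alt_hosts="ch-2.example.com:9000,ch-3.example.com:9000",
)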

JDBC connection string:

jdbc:clickhouse://ch-1.example.com:8123,ch-2.example.com:8123,ch-3.example.com:8123/default

When the client itself owns the endpoint list, there is no extra hop, no proxy to keep running, and no chance of the proxy becoming the bottleneck. This works well when applications are deployed alongside ClickHouse and can be configured with the cluster's node list.

The trade-off is operational: every application needs the list, and adding or removing a node requires redeploying or reconfiguring the clients. For deployments with many client applications, a proxy in front simplifies the topology.
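
For drivers that take only a single host, the same idea can be implemented in application code. The following is a minimal sketch, not any driver's built-in behavior: EndpointPool and the action callback are hypothetical names, and the callback stands in for whatever call actually runs a query against one endpoint.

```python
import itertools
from typing import Callable, Optional, Sequence


class EndpointPool:
    """Round-robin over a fixed endpoint list, failing over on errors."""

    def __init__(self, endpoints: Sequence[str]) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._cycle = itertools.cycle(self._endpoints)

    def run(self, action: Callable[[str], str]) -> str:
        """Try each endpoint at most once, starting with the next in rotation."""
        last_error: Optional[BaseException] = None
        for _ in range(len(self._endpoints)):
            endpoint = next(self._cycle)
            try:
                return action(endpoint)
            except OSError as exc:  # refused connection, timeout, DNS failure
                last_error = exc
        raise ConnectionError("all endpoints failed") from last_error
```

Each call to run starts at the next endpoint in the rotation, so healthy nodes share the load and a failed node costs one extra attempt. Only safe-to-retry statements should go through a wrapper like this, since a failed INSERT may have been partially applied.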

Native Protocol (Port 9000): TCP-Only Load Balancing

The native binary protocol on port 9000 is what clickhouse-client and most native drivers speak. It is more efficient than HTTP for large transfers but has one important property for load balancing: connections are long-lived and stateful, and general-purpose proxies such as HAProxy and Nginx do not parse its frame structure.

This means HAProxy and Nginx running in TCP (stream) mode can balance only at the connection level. They pick a backend when the client connects and hold that connection to a single server until the client disconnects or hits an idle timeout. Multiple queries on the same connection go to the same backend.

Three workarounds exist when you need per-query distribution over the native protocol:

Close connections client-side after each query. Some drivers support this directly. The cost is connection establishment overhead per query.
Close idle connections server-side. Set idle_connection_timeout to a low value so the server tears down idle connections aggressively. The proxy then re-balances when the client reconnects, which only helps if the client reconnects transparently.
Front the cluster with a ClickHouse Distributed table. A dedicated ClickHouse node (or any node) hosts a Distributed table that routes incoming queries across shards. The proxy points clients to this node and ClickHouse itself handles distribution.

A minimal HAProxy config for native protocol with connection-level balancing:

frontend ch_native
    bind *:9000
    mode tcp
    default_backend ch_cluster

backend ch_cluster
    mode tcp
    balance leastconn
    option tcp-check
    server ch-1 ch-1.example.com:9000 check
    server ch-2 ch-2.example.com:9000 check
    server ch-3 ch-3.example.com:9000 check

balance leastconn distributes new connections to the backend with the fewest active connections. This is the most useful TCP balancing mode for long-lived ClickHouse connections.
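
To see why connection-level balancing is not query-level balancing, consider a simplified model of leastconn. This is an illustrative sketch, not HAProxy's actual algorithm: assign_connections is a hypothetical helper, every connection is assumed to stay open, and ties go to the first backend listed.

```python
from typing import Dict, List, Tuple


def assign_connections(
    backends: List[str], queries_per_connection: List[int]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Place each new connection on the backend with the fewest connections.

    Returns (connections per backend, queries per backend).
    """
    connections = {name: 0 for name in backends}
    queries = {name: 0 for name in backends}
    for query_count in queries_per_connection:
        # The balancer sees only the connection, never the queries inside it.
        backend = min(connections, key=connections.get)
        connections[backend] += 1
        queries[backend] += query_count
    return connections, queries


connections, queries = assign_connections(
    ["ch-1", "ch-2", "ch-3"], [100, 1, 1]
)
print(connections)  # one connection per backend
print(queries)      # all 100 queries of the busy client land on ch-1
```

The connections are spread evenly, yet one backend serves almost all of the queries because the busy client's connection never moves.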

HTTP Protocol (Port 8123): More Options

The HTTP interface on port 8123 is stateless from the proxy's point of view. Each query is a separate HTTP request, and any Layer 7 proxy can distribute them. This is the easier protocol to load balance.

A minimal Nginx config in front of three ClickHouse nodes:

upstream clickhouse {
    server ch-1.example.com:8123;
    server ch-2.example.com:8123;
    server ch-3.example.com:8123;
}

server {
    listen 8123;
    location / {
        proxy_pass http://clickhouse;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 3600s;
    }
    location /ping {
        proxy_pass http://clickhouse/ping;
    }
}

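ClickHouse's /ping endpoint returns Ok. with HTTP 200 when the server is healthy and is the right target for upstream health checks. Note that open-source Nginx only checks upstreams passively (max_fails and fail_timeout on the server lines); active probing of /ping needs the NGINX Plus health_check directive or an external checker. The proxy_read_timeout matters because ClickHouse queries can take minutes: it limits the gap between two successive reads from the backend, so the Nginx default of 60 seconds cuts off any query that sends nothing for a minute.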

HAProxy in HTTP mode works equivalently:

frontend ch_http
    bind *:8123
    mode http
    default_backend ch_cluster_http

backend ch_cluster_http
    mode http
    balance roundrobin
    option httpchk GET /ping
    timeout server 3600s
    server ch-1 ch-1.example.com:8123 check
    server ch-2 ch-2.example.com:8123 check
    server ch-3 ch-3.example.com:8123 check
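
From the client's side, a query through either proxy is an ordinary HTTP POST with the SQL text in the body. The sketch below uses only the Python standard library. The load balancer URL is a placeholder, build_query_request and run_query are hypothetical helper names, and the credentials are passed in the X-ClickHouse-User and X-ClickHouse-Key headers that the ClickHouse HTTP interface accepts.

```python
import urllib.request


def build_query_request(
    base_url: str, sql: str, user: str, password: str
) -> urllib.request.Request:
    """Build a POST request that carries the SQL text in the body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/",
        data=sql.encode("utf-8"),
        method="POST",
        headers={
            "X-ClickHouse-User": user,
            "X-ClickHouse-Key": password,
        },
    )


def run_query(request: urllib.request.Request, timeout_s: float = 3600.0) -> str:
    """Send the request; the socket timeout matches the proxy's 3600s setting."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8")


# Example (placeholder host, not executed here):
# request = build_query_request("http://lb.example.com:8123", "SELECT 1", "default", "")
# print(run_query(request))
```

Keeping the client timeout at least as long as the proxy timeout avoids the client giving up on a query that the proxy would still have let finish.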

chproxy: A ClickHouse-Aware Proxy

chproxy is an HTTP proxy specifically built for ClickHouse. It implements features that generic proxies cannot, because it knows what a ClickHouse query is:

Per-user query limits and concurrency caps
Caching of identical queries
Routing rules based on the user, the cluster, or query type
Killing of queries that exceed configured time limits
Forwarding to different clusters or replicas based on user

If you have multiple workloads (interactive dashboards, batch ETL, ad-hoc analysts) sharing a single ClickHouse cluster, chproxy makes it tractable to isolate them with per-user quotas and routing. For a single workload with one or two applications, chproxy is overkill compared to client-side balancing or a basic Nginx config.

A minimal chproxy configuration:

server:
  http:
    listen_addr: ":8124"

users:
  - name: "default"
    password: ""
    to_cluster: "prod"
    to_user: "default"
    max_concurrent_queries: 4
    max_execution_time: 30s

clusters:
  - name: "prod"
    nodes: ["ch-1.example.com:8123", "ch-2.example.com:8123", "ch-3.example.com:8123"]
    users:
      - name: "default"
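
To make the max_concurrent_queries setting concrete, here is a conceptual sketch of a per-user concurrency cap. It is not chproxy's implementation: UserConcurrencyLimit is a hypothetical class, and what chproxy does with a request that arrives over the limit (queue or reject) depends on its configuration and is not modelled here.

```python
import threading
from typing import Dict


class UserConcurrencyLimit:
    """Track in-flight queries per user and refuse those over the cap."""

    def __init__(self, limits: Dict[str, int]) -> None:
        # One bounded semaphore per user; its size is that user's cap.
        self._slots = {
            user: threading.BoundedSemaphore(cap) for user, cap in limits.items()
        }

    def try_acquire(self, user: str) -> bool:
        """Reserve a slot without blocking; False means the user is at the cap."""
        return self._slots[user].acquire(blocking=False)

    def release(self, user: str) -> None:
        """Free a slot once the query has finished."""
        self._slots[user].release()


limit = UserConcurrencyLimit({"default": 4})
admitted = [limit.try_acquire("default") for _ in range(5)]
print(admitted)  # the fifth concurrent query is refused
```

Because the cap is tracked per user, a burst from one workload cannot consume the slots that belong to another.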

AWS NLB and ALB

On AWS, the Network Load Balancer (NLB) is the appropriate Layer 4 option for the native protocol and also works for HTTP. The Application Load Balancer (ALB) is HTTP-only and works for port 8123.

Both support health checks on the /ping endpoint for HTTP backends. For NLB with the native protocol, /ping is not served on port 9000, so either configure a TCP health check on port 9000 or point the target group's health check at HTTP /ping on port 8123.

Health Checks

Whichever load balancer you use, point health checks at HTTP /ping. It returns Ok. with status 200 quickly, requires no authentication, and is cheap. Do not use a SELECT query as a health check: it consumes a connection slot, adds constant load if it reads system tables, and is sensitive to load on the server, so a busy but healthy node can be marked down.

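For the native protocol on port 9000, a TCP connection check is the simplest option, but it only proves that the port accepts connections. If the load balancer can probe a different port from the one it forwards to (for example, the port parameter on an HAProxy server line), an HTTP /ping check against port 8123 on the same node is a stronger signal.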
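
For an external checker or a quick manual test, the /ping probe takes a few lines of standard-library Python. This is a minimal sketch: is_healthy is a hypothetical helper name, and the two-second timeout is an arbitrary choice.

```python
import http.client
import urllib.request


def is_healthy(base_url: str, timeout_s: float = 2.0) -> bool:
    """Return True when GET /ping answers 200 with the body 'Ok.'."""
    url = base_url.rstrip("/") + "/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            # ClickHouse ends the body with a newline, so strip before comparing.
            return response.status == 200 and response.read().strip() == b"Ok."
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS errors, and HTTP error codes.
        return False


# Example (placeholder host, not executed here):
# print(is_healthy("http://ch-1.example.com:8123"))
```

Any failure, including a non-200 status, is treated as unhealthy, which is the conservative choice for a load balancer probe.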

When You Need a Load Balancer

Multiple client applications that should not all know the full node list
Per-user or per-workload quotas (chproxy)
Centralized TLS termination
Logging and metrics on every query in one place
Public exposure of ClickHouse behind a controlled endpoint

If none of these apply, client-side balancing is simpler and faster.

Common Pitfalls

Putting HAProxy or Nginx in front of port 9000 and expecting per-query distribution across servers. It does not work that way; the connection sticks to one server.
Setting the proxy's read timeout too low and cutting off long ClickHouse queries.
Using a SELECT query as a load balancer health check, generating constant load on system tables.
Forgetting that the proxy is now a single point of failure unless it is run in HA itself.
Running chproxy when client-side balancing would meet the requirements with less operational surface.

Frequently Asked Questions

Q: Can HAProxy load balance the ClickHouse native protocol per query? A: No. HAProxy operates at the TCP level for the native protocol and balances per connection, not per query. The connection sticks to one server until it closes.

Q: What is the simplest way to load balance ClickHouse? A: Client-side balancing in the driver. Many ClickHouse client libraries support multiple endpoints with failover, though support varies by driver.

Q: When should I use chproxy? A: When you need per-user query limits, caching, query killing, or routing rules across a shared cluster. For a single workload, client-side balancing is simpler.

Q: Should I use ALB or NLB for ClickHouse on AWS? A: NLB for native protocol on 9000, either NLB or ALB for HTTP on 8123. ALB gives more HTTP features; NLB is simpler and lower latency.

Q: What endpoint should the load balancer health check? A: HTTP /ping on port 8123. It returns 200 with Ok. when the server is healthy, requires no auth, and does not load system tables.