The clickhouse/clickhouse-server image on Docker Hub is the canonical starting point for running ClickHouse locally. (There is also a Docker Official Image available as simply clickhouse, maintained by the ClickHouse team via Docker's official images program.) It ships with a working default configuration, handles first-run setup automatically, and gets a single node accepting queries in seconds. Getting it running is straightforward. Getting it right - with persistent data, custom configuration, and a clear understanding of where Docker stops being adequate - takes a bit more thought.
Starting a Container
The minimum viable command to run ClickHouse:
docker run -d \
--name clickhouse \
--ulimit nofile=262144:262144 \
-p 8123:8123 \
-p 9000:9000 \
clickhouse/clickhouse-server:25.3
The --ulimit nofile=262144:262144 flag is not optional noise. ClickHouse opens large numbers of file descriptors concurrently during query execution - data part files, WAL segments, network connections. Without raising the limit, you risk Too many open files errors under realistic workloads; how quickly you hit them depends on the Docker daemon's default nofile limit, which on many hosts is far lower. The value 262144 is what the official image documentation specifies, and it should be treated as a hard minimum.
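To confirm the limit actually took effect, check it from inside the running container (using the container name from the command above):

# Print the soft file descriptor limit inside the container
docker exec clickhouse bash -c 'ulimit -n'
# Expected: 262144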
Port 8123 is the HTTP interface; port 9000 is the native binary protocol (used by clickhouse-client and most native drivers). The HTTP interface accepts queries as POST bodies and returns results as plain text, TSV, JSON, or several other formats - useful for quick ad-hoc queries with curl. The native protocol is more efficient for bulk transfers and is what client libraries use by default. Expose both unless you have a specific reason not to.
By default, the container creates a default user with no password. As of 24.3+, that user also has network access disabled by default — it can only be reached locally inside the container. To enable open network access for local development (insecure, not for production), set CLICKHOUSE_SKIP_USER_SETUP=1, or configure CLICKHOUSE_USER and CLICKHOUSE_PASSWORD to create a named user with network access. Do not expose ports 9000 or 8123 on a public network without credentials.
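For local development where you do want to connect from the host, the same run command with the user-related environment variables looks like this (user, password, and database names are placeholders):

docker run -d \
--name clickhouse \
--ulimit nofile=262144:262144 \
-p 8123:8123 \
-p 9000:9000 \
-e CLICKHOUSE_DB=mydb \
-e CLICKHOUSE_USER=myuser \
-e CLICKHOUSE_PASSWORD=mysecretpassword \
clickhouse/clickhouse-server:25.3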
Docker Compose for Local Development
Running a ClickHouse container in isolation is fine for a quick query test. For anything involving application development, you want Compose so the database and application start and stop together:
services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.3
    container_name: clickhouse
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    ports:
      - "8123:8123"
      - "9000:9000"
    volumes:
      - clickhouse-data:/var/lib/clickhouse
      - clickhouse-logs:/var/log/clickhouse-server
      - ./config/clickhouse/config.d:/etc/clickhouse-server/config.d:ro
      - ./config/clickhouse/users.d:/etc/clickhouse-server/users.d:ro
    environment:
      CLICKHOUSE_DB: mydb
      CLICKHOUSE_USER: myuser
      CLICKHOUSE_PASSWORD: mysecretpassword
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8123/ping"]
      interval: 5s
      timeout: 5s
      retries: 10
      start_period: 15s

volumes:
  clickhouse-data:
  clickhouse-logs:
Two volume mounts matter here: /var/lib/clickhouse is where all table data, parts, and metadata live. /var/log/clickhouse-server holds server logs. Both need to survive container recreation. When you mount a host directory (bind mount) rather than a named volume, ClickHouse's file ownership requirements become relevant - the container runs as the clickhouse user (uid 101), and a root-owned host directory will cause startup to fail. Named volumes avoid this entirely by letting Docker manage ownership.
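If you do opt for bind mounts, pre-creating the directories with matching ownership avoids the startup failure. A minimal sketch; the ./clickhouse-data and ./clickhouse-logs paths are examples rather than part of the Compose file above, and it assumes the image uses gid 101 alongside uid 101:

# Prepare host directories before first start (only needed for bind mounts)
mkdir -p ./clickhouse-data ./clickhouse-logs
sudo chown -R 101:101 ./clickhouse-data ./clickhouse-logs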
The CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1 variable enables SQL-driven user management (the GRANT, CREATE USER, and CREATE ROLE syntax). Without it, user management falls back to XML configuration files only.
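With access management enabled, the user created from the environment variables can manage other users directly in SQL. A minimal sketch, where analyst and its password are placeholders:

# Create an additional user and grant it read access via SQL
docker exec -it clickhouse clickhouse-client --user myuser --password mysecretpassword \
--query "CREATE USER analyst IDENTIFIED BY 'analystpassword'"
docker exec -it clickhouse clickhouse-client --user myuser --password mysecretpassword \
--query "GRANT SELECT ON mydb.* TO analyst"

Users created this way are stored under /var/lib/clickhouse/access/ by default rather than in the XML files, so they persist through the data volume.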
Configuration Overrides with config.d and users.d
ClickHouse reads its base configuration from /etc/clickhouse-server/config.xml, then merges any XML or YAML files it finds in /etc/clickhouse-server/config.d/. Files in config.d/ do not replace the base config - they are deep-merged on top of it. This means you can override a single nested setting without copying the entire 800-line config.xml.
A common override is adjusting memory limits. Server-level limits go in config.d/; per-query/user limits are user-level settings and must go in users.d/ under a <profiles> block:
<!-- config/clickhouse/config.d/memory.xml (server-level) -->
<clickhouse>
    <max_server_memory_usage_to_ram_ratio>0.8</max_server_memory_usage_to_ram_ratio>
</clickhouse>

<!-- config/clickhouse/users.d/memory.xml (user/query-level) -->
<clickhouse>
    <profiles>
        <default>
            <max_memory_usage>10000000000</max_memory_usage>
        </default>
    </profiles>
</clickhouse>
User-level settings go in users.d/ and follow the same merge behavior. To create a read-only user without touching the default users.xml:
<!-- config/clickhouse/users.d/readonly.xml -->
<clickhouse>
    <users>
        <readonly_user>
            <password>readonlypassword</password>
            <profile>readonly</profile>
            <quota>default</quota>
            <networks>
                <ip>::/0</ip>
            </networks>
        </readonly_user>
    </users>
</clickhouse>
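A quick way to verify the restriction, assuming the readonly profile that ships in the stock users.xml is still present:

# Confirm the user runs with readonly=1; writes and DDL from it will be rejected
docker exec -it clickhouse clickhouse-client \
--user readonly_user --password readonlypassword \
--query "SELECT getSetting('readonly')"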
ClickHouse processes files in lexicographic order within each directory. For scalar settings, the alphabetically later file's value takes precedence. For complex elements, files are deep-merged recursively by default — to fully replace a subtree rather than merge it, add replace="1" to the element in the overriding file. Name files with numeric prefixes (01-memory.xml, 02-network.xml) when load order matters.
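ClickHouse also writes the result of the merge to a preprocessed copy, which is the quickest way to confirm an override file was actually picked up (the path assumes the default data directory):

# Inspect the fully merged server configuration
docker exec clickhouse grep max_server_memory_usage_to_ram_ratio \
/var/lib/clickhouse/preprocessed_configs/config.xml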
Connecting to the Running Container
The two most direct connection methods are docker exec and the HTTP interface. For quick queries during development:
# Drop into an interactive clickhouse-client session
docker exec -it clickhouse clickhouse-client --user myuser --password mysecretpassword
# Run a one-off query
docker exec -it clickhouse clickhouse-client \
--user myuser \
--password mysecretpassword \
--query "SELECT version()"
The HTTP interface at port 8123 is convenient for scripts and debugging because it requires no client binary:
# Basic query via curl
curl -u myuser:mysecretpassword \
'http://localhost:8123/?query=SELECT+version()'
# POST format for longer queries
curl -u myuser:mysecretpassword \
'http://localhost:8123/' \
--data-binary "SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size FROM system.parts GROUP BY table ORDER BY sum(bytes_on_disk) DESC"
The /ping endpoint at port 8123 returns Ok. with a 200 status when the server is healthy, which is why the Compose healthcheck uses it. It does not require authentication, making it appropriate for load balancer health probes.
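The same endpoint is handy for wait-for-ready loops in scripts, for example in CI before running migrations. A minimal sketch:

# Poll /ping for up to 30 seconds before giving up
for i in $(seq 1 30); do
  curl -sf http://localhost:8123/ping && break
  sleep 1
done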
External GUI clients — DBeaver, DataGrip, Tabix — connect primarily over the HTTP interface on port 8123 using JDBC or HTTP drivers. Direct native-protocol connections on port 9000 are used by clickhouse-client and native language drivers (Python clickhouse-driver, Go driver, etc.).
Production Considerations
A single ClickHouse container covers local development and exploratory analysis. It does not cover production. The gaps are architectural, not cosmetic.
Replication requires ClickHouse Keeper. ClickHouse's ReplicatedMergeTree engine coordinates replication through a distributed coordination service. Historically that was ZooKeeper; the current recommendation is ClickHouse Keeper (a native reimplementation with compatible protocol). A single Docker container has neither. You can run data in a non-replicated MergeTree, but then a disk or host failure means data loss. Introducing replication in Docker Compose is possible - you would need at least two ClickHouse containers and a separate ClickHouse Keeper container or a three-node Keeper ensemble - but this is operationally fragile compared to purpose-built tooling.
Volume durability is your responsibility. Docker named volumes live at /var/lib/docker/volumes/ on the host. If the host node fails or is reprovisioned, the volume disappears with it. For production workloads this means you either need a shared or replicated storage backend (NFS, Ceph, EBS with snapshot automation) or replication at the ClickHouse layer to keep data on more than one host. Neither is built into the Docker setup.
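For development data you do care about, the simplest safeguard is a cold copy of the data volume taken while the server is stopped. A minimal sketch; note that Compose prefixes named volumes with the project name, so check docker volume ls for the actual volume name:

# Stop the server, archive the data volume, start it again
docker compose stop clickhouse
docker run --rm \
-v clickhouse-data:/var/lib/clickhouse:ro \
-v "$(pwd)/backups:/backups" \
alpine tar czf /backups/clickhouse-data.tar.gz -C /var/lib/clickhouse .
docker compose start clickhouse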
Schema changes and upgrades need coordination. ClickHouse does not guarantee on-disk format compatibility across major versions, and ALTER TABLE operations on replicated tables require all replicas to be running. Upgrading a containerized setup without a controlled draining and failover strategy risks leaving parts in an inconsistent state. On bare metal or VMs, this is a solved operational procedure. In Docker it requires more careful scripting.
For teams that need ClickHouse in production, the two credible paths are bare metal or VMs with configuration management (Ansible, Chef), or Kubernetes with the ClickHouse Operator. The Operator manages StatefulSets, persistent volumes, ClickHouse Keeper clusters, and rolling upgrades as Kubernetes custom resources. It abstracts most of the coordination that would otherwise require manual intervention. Managed services like ClickHouse Cloud remove the infrastructure burden entirely.
The Docker image is the right tool for development, CI pipelines, and integration testing - contexts where you want a fresh ClickHouse instance quickly and do not need the data to survive. The further you move toward production requirements, the more the surrounding infrastructure determines reliability, not the container itself.