ClickHouse on Kubernetes: cgroups, CPU Limits, and max_threads

ClickHouse on Kubernetes will sometimes run queries 10 to 100 times slower than the same workload on a bare-metal host with an identical CPU count. The usual cause is max_threads being autodetected as 1 because the container's cgroup reports almost no CPU shares. The trigger is a gap in the pod configuration rather than a fault in query execution, and the fix that works across ClickHouse versions is to set CPU requests and limits explicitly.

What Changed in 22.2

Starting in ClickHouse 22.2, the server respects cgroup CPU limits when autodetecting max_threads. On a bare-metal host with 16 cores, max_threads = auto(16). Inside a container, ClickHouse reads the cgroup CPU constraints and derives a smaller number.

The intent is sensible: a containerized ClickHouse with a 4-core limit should use 4 threads, not 16. The problem appears when the pod ends up with no effective CPU request, because the cgroup then reports values that look like a 1-CPU container.

The Failure Mode

On AWS EKS, GKE, and most managed Kubernetes distributions, a pod without resources.requests.cpu and resources.limits.cpu produces the following cgroup state inside the container (shown for a cgroups v1 node):

cpu.cfs_quota_us = -1
cpu.shares = 2

cpu.cfs_quota_us = -1 means "no hard quota", which on its own does not constrain threads. The issue is cpu.shares = 2. ClickHouse's older autodetection logic computed available CPUs as cpu.shares / 1024 (where 1024 is the per-CPU share constant), rounded up. The math:

ceil(2 / 1024) = 1

max_threads is then auto(1). Every query runs single-threaded. CPU usage stays near a single core. p99 latency rises by orders of magnitude. The pod's CPU utilization graphs look fine because the pod genuinely is using only one core. The bug here is in the autodetection logic, not the cgroup state, and it was fixed in later releases.
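The arithmetic above can be sketched in a few lines of Python. This only mirrors the formula as described in this article, ceil(cpu.shares / 1024); it is not ClickHouse's actual source, and the function name cpus_from_shares is a hypothetical one chosen for illustration.

```python
import math

PER_CPU_SHARES = 1024  # share weight the article treats as "one CPU"


def cpus_from_shares(cpu_shares: int) -> int:
    """Estimate a CPU count the way the text describes: ceil(shares / 1024)."""
    # Hypothetical helper: illustrates the formula, not ClickHouse internals.
    return max(1, math.ceil(cpu_shares / PER_CPU_SHARES))


# cpu.shares for: no request, a 0.5 CPU request, an 8 CPU request.
for shares in (2, 512, 8192):
    print(shares, "->", cpus_from_shares(shares))
```

Any share value at or below 1024 collapses to a single thread, which is why a pod with no CPU request and a pod with a small fractional request behave the same under this formula.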

The Fix: Set Requests and Limits

The reliable solution across ClickHouse versions is to set CPU requests and limits explicitly on the pod. The container's cgroup then reflects a real CPU budget.

A pattern that works on EKS, GKE, and AKS:

resources:
  requests:
    cpu: "8"
    memory: "60Gi"
  limits:
    cpu: "16"
    memory: "60Gi"

With requests.cpu = 8, the cgroup's cpu.shares is 8 * 1024 = 8192. ClickHouse computes max_threads = auto(8). With limits.cpu = 16, the cgroup's cpu.cfs_quota_us allows bursting up to 16 cores when the host has capacity. On a 16-core node, this gives ClickHouse 8 threads as a guaranteed baseline and lets it burst higher.
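A minimal sketch of how those cgroup values follow from the pod spec on cgroups v1. It assumes the kubelet's usual conversion (1024 shares per requested CPU with a floor of 2, and the default 100 ms CFS period); the function and constant names are hypothetical, and a cluster with a customized CFS period would produce different quota values.

```python
from typing import Optional

MIN_SHARES = 2            # floor used when the CPU request is zero or unset
SHARES_PER_CPU = 1024     # shares granted per requested CPU
CFS_PERIOD_US = 100_000   # assumed default CFS period (100 ms)


def cpu_shares(request_millicpu: int) -> int:
    """cpu.shares derived from requests.cpu, given in millicores."""
    return max(MIN_SHARES, request_millicpu * SHARES_PER_CPU // 1000)


def cfs_quota_us(limit_millicpu: Optional[int]) -> int:
    """cpu.cfs_quota_us derived from limits.cpu; -1 means no hard quota."""
    if not limit_millicpu:
        return -1
    return limit_millicpu * CFS_PERIOD_US // 1000


# The example above: requests.cpu = 8, limits.cpu = 16.
print(cpu_shares(8000), cfs_quota_us(16000))
# A pod with neither request nor limit.
print(cpu_shares(0), cfs_quota_us(None))
```

The second call reproduces the failure mode: cpu.shares = 2 and cpu.cfs_quota_us = -1.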

A common pattern is to set requests at half the node's cores and limits at the full node. This produces reasonable thread counts for max_threads autodetection while still allowing burst. Set requests = limits if you want a fully predictable, non-bursting allocation, which is appropriate for production analytical workloads where queueing is preferable to noisy-neighbor latency variance.

Memory Limits in cgroups

CPU is the more common pitfall but memory has a parallel issue. ClickHouse can read the cgroup memory limit and adjust max_server_memory_usage accordingly on newer versions. On older versions, or when max_server_memory_usage_to_ram_ratio was tuned for a bare-metal host with full RAM access, ClickHouse may try to use more memory than the cgroup allows, and the kernel will OOM-kill the container.

The fix is the same shape as CPU: set memory requests and limits, and cap server memory, either with an absolute max_server_memory_usage or with max_server_memory_usage_to_ram_ratio, so that ClickHouse leaves headroom inside the limit. A reasonable starting point:

<clickhouse>
  <max_server_memory_usage_to_ram_ratio>0.85</max_server_memory_usage_to_ram_ratio>
</clickhouse>

Combined with a pod limit of 60Gi, and on a version that treats the cgroup limit as its available RAM, this caps ClickHouse at roughly 51Gi (0.85 × 60) and leaves room for the kernel, page cache, and other in-container processes.
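The headroom calculation is simple enough to check by hand, but a short sketch makes it easy to try other limits and ratios. It assumes ClickHouse applies the ratio to the cgroup memory limit rather than to host RAM; server_memory_cap is a hypothetical helper, not a ClickHouse API.

```python
GIB = 1024 ** 3


def server_memory_cap(limit_bytes: int, ratio: float) -> int:
    """Bytes ClickHouse may use if the ratio is applied to the pod limit."""
    # Assumption: the cgroup limit is what ClickHouse sees as total RAM.
    return int(limit_bytes * ratio)


limit = 60 * GIB
cap = server_memory_cap(limit, 0.85)
print(f"cap ~ {cap / GIB:.1f} GiB, headroom ~ {(limit - cap) / GIB:.1f} GiB")
```

If the server instead sees the full host RAM, the same ratio yields a cap above the pod limit, which is the OOM-kill scenario described earlier in this section.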

cgroups v1 vs v2

Kubernetes nodes increasingly run cgroups v2 (kernel 5.x defaults, recent OS releases). ClickHouse autodetection handles both, but the file paths differ:

cgroups v1: /sys/fs/cgroup/cpu/cpu.shares and cpu.cfs_quota_us
cgroups v2: /sys/fs/cgroup/cpu.max (combined quota/period) and cpu.weight

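If you are debugging a slow pod, check which version the node uses (stat -fc %T /sys/fs/cgroup/ prints cgroup2fs on v2 and tmpfs on v1). On v2, the relevant file is cpu.max, which holds the quota followed by the period, both in microseconds (for example 800000 100000 for 8 CPUs over a 100 ms period). The quota reads max when no limit is set.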
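As a quick way to interpret cpu.max, the following sketch converts its contents into a CPU count. The function name cpus_from_cpu_max is hypothetical; the parsing only assumes the two-field "quota period" layout described above.

```python
from typing import Optional


def cpus_from_cpu_max(text: str) -> Optional[float]:
    """Convert the contents of a cgroups v2 cpu.max file into a CPU count.

    Returns None when the quota field is "max", meaning no hard limit.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


print(cpus_from_cpu_max("800000 100000"))  # 8 CPUs over a 100 ms period
print(cpus_from_cpu_max("max 100000"))     # no limit set
```

Inside a pod on a v2 node, the input would come from reading /sys/fs/cgroup/cpu.max.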

Diagnosing the Problem

Inside a misconfigured pod, a single query gives the problem away:

SELECT name, value FROM system.settings WHERE name = 'max_threads';

If the value reads 'auto(1)' and the node has more than one CPU, the cgroup is the cause. Also check:

SELECT * FROM system.asynchronous_metrics
WHERE metric ILIKE '%cpu%' OR metric ILIKE '%cgroup%';

CGroupMaxCPU (in newer versions) reports the CPU count ClickHouse sees from the cgroup.
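If you run this check across many pods, the comparison can be scripted. The sketch below takes the max_threads value as a string (however you fetched it) and the node's CPU count, and flags the single-thread case. Both function names are hypothetical, and the parser assumes the auto(N) format shown above, with or without surrounding quotes.

```python
import re
from typing import Optional


def auto_thread_count(value: str) -> Optional[int]:
    """Extract N from an autodetected setting value such as 'auto(8)'."""
    match = re.fullmatch(r"auto\((\d+)\)", value.strip().strip("'"))
    return int(match.group(1)) if match else None


def looks_throttled(value: str, node_cpus: int) -> bool:
    """True when max_threads was autodetected as 1 on a multi-core node."""
    return auto_thread_count(value) == 1 and node_cpus > 1


print(looks_throttled("'auto(1)'", 16))  # the failure mode described here
print(looks_throttled("'auto(8)'", 16))  # a healthy pod
```

A manually set max_threads (a plain number rather than auto(N)) is deliberately not flagged, since it says nothing about what the cgroup reports.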

Common Pitfalls

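Setting only limits.cpu without requests.cpu. Kubernetes normally copies the limit into the request, but a namespace default (for example a LimitRange defaultRequest) can still leave a low cpu.shares value, so check the cgroup rather than assuming.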
Setting requests.cpu = 0.5 for a heavy ClickHouse pod. This maps to cpu.shares = 512, and ceil(512 / 1024) = 1 leaves the pod with a single thread.
Forgetting that cpu.shares is a relative scheduling weight, not a hard quota. The autodetection in older ClickHouse divided it by PER_CPU_SHARES = 1024 to derive a thread count, which is brittle.
Setting max_threads manually to a high value while leaving cgroup constraints in place. Threads exist but contend on the limited CPU quota, causing thrash without throughput.
Running an older ClickHouse with the autodetection bug and assuming Kubernetes is at fault. Upgrade and verify with system.settings.

Frequently Asked Questions

Q: Why is ClickHouse using only one CPU in my Kubernetes pod? A: Almost always because requests.cpu is unset, so the cgroup reports cpu.shares = 2, and ClickHouse autodetects max_threads = auto(1). Set CPU requests and limits explicitly.

Q: Should I set CPU limits or only requests? A: Set both. Requests guarantee scheduling and drive max_threads autodetection. Limits bound the worst-case CPU consumption and are required for predictable latency.

Q: Can I override max_threads instead of fixing the cgroup? A: You can, but setting max_threads = N while leaving cgroup constraints unchanged will cause N threads to fight over a small CPU quota. Fix the cgroup.

Q: What CPU requests and limits should I start with? A: A reasonable pattern is requests.cpu at half the node's cores and limits.cpu at the full node. For strict isolation, set requests = limits for both CPU and memory so the pod is Guaranteed QoS.

Q: Is this fixed in newer ClickHouse versions? A: The autodetection logic was improved in recent releases, which handle cgroup CPU detection more robustly, but setting CPU requests explicitly is still the recommended practice.
