Kafka Throughput per Partition: How Much Can One Partition Handle?

A common Kafka sizing question: how much can a single partition handle? The honest answer is "it depends on hardware, message size, replication factor, and what you're optimizing for - but here are the numbers we actually see in production." This guide gives you both the rough rules and the levers that move them.

Quick Numbers

For a single partition on modern commodity hardware with replication factor 3:

Workload shape	Realistic throughput per partition
Small messages (<1 KB), high message rate	10-30 MB/s, ~10,000-50,000 msg/s
Medium messages (1-10 KB), batched producers	20-50 MB/s
Large messages (>100 KB), low message rate	50-100 MB/s (network-limited)
Compacted topics under heavy compaction	5-15 MB/s (compaction overhead)

These are sustained numbers, not peak. Peak burst throughput can be 2-3x higher for short windows.

If you remember nothing else: assume ~25 MB/s per partition as a planning default, then validate with a load test. The variance is huge - I've seen 5 MB/s and 100 MB/s in the same cluster depending on what's actually happening.

What Limits Partition Throughput

All writes to a partition go to one append-only log (stored as a sequence of segment files) on a single broker, the leader. That gives you four bottlenecks, in roughly the order they bite:

1. Producer batching and compression

A producer sending un-batched, uncompressed messages tops out at the broker's per-request handling rate - typically 5,000-15,000 requests/sec depending on hardware. That's fine for big messages, painful for small ones.

Batching solves this. With linger.ms=10 and batch.size=64KB, a producer batches small messages into bigger requests, multiplying effective throughput by 10-100x. Compression (lz4 or zstd) shrinks the wire and disk footprint, often doubling effective throughput on text-heavy payloads.

This is the single biggest lever. A misconfigured producer can make a partition look slow when the broker has plenty of capacity.
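To see why batching matters so much, here is a rough sketch of the request rate a producer needs to sustain a given byte rate. The function name and the simplifying assumptions (one full batch per request, no protocol overhead, MB meaning 1,000,000 bytes) are illustrative, not part of any Kafka API.

```python
def produce_requests_per_sec(target_mb_per_s, message_bytes, batch_bytes=None):
    """Estimate produce requests/s needed to sustain a byte rate on one partition.

    Assumption (illustrative only): each request carries exactly one full
    batch, or a single message when batching is off; overhead is ignored.
    """
    bytes_per_s = target_mb_per_s * 1_000_000
    if batch_bytes is None:
        payload = message_bytes
    else:
        # A batch can never carry less than one message.
        payload = max(batch_bytes, message_bytes)
    return bytes_per_s / payload


# 25 MB/s of 1,000-byte messages, without and with 64 KB batches.
unbatched = produce_requests_per_sec(25, 1000)
batched = produce_requests_per_sec(25, 1000, batch_bytes=64 * 1024)
print(f"unbatched: {unbatched:,.0f} req/s, batched: {batched:,.0f} req/s")
```

Under these assumptions the unbatched producer needs 25,000 requests/s, well above the 5,000-15,000 range quoted above, while the batched one needs fewer than 400. Real batches are often only partly full when linger.ms expires, so treat the batched figure as a best case.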

2. Network bandwidth and replication

Every write to a partition is replicated to N-1 followers. With replication factor 3, every produced byte costs 3 bytes of disk across the cluster and 2 bytes of outbound network on the leader (one copy per follower). On a 10 Gbps network shared with consumer fan-out, you can saturate the NIC before you saturate the disk.

This is why "100 MB/s per partition" only works in benchmarks - real workloads have many partitions per broker, and the broker's total throughput is bounded by NIC and disk, not by any single partition.
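The replication arithmetic can be sketched as a small helper. It assumes, for simplicity, that every leader for the topic sits on one broker and that each consumer group reads every byte once from the leader. The function and its field names are hypothetical.

```python
def replication_traffic(produce_mb_per_s, replication_factor, consumer_groups=0):
    """Break down the MB/s a produce stream costs on the leader and on disk.

    Assumptions (illustrative only): all leaders live on one broker and
    every consumer group reads each byte once from the leader.
    """
    followers = replication_factor - 1
    return {
        "leader_ingress": produce_mb_per_s,
        "leader_replication_egress": produce_mb_per_s * followers,
        "leader_consumer_egress": produce_mb_per_s * consumer_groups,
        "cluster_disk_write": produce_mb_per_s * replication_factor,
    }


NIC_MB_PER_S = 10_000 / 8  # 10 Gbps expressed in MB/s

t = replication_traffic(250, replication_factor=3, consumer_groups=3)
egress = t["leader_replication_egress"] + t["leader_consumer_egress"]
print(f"leader egress: {egress} MB/s of {NIC_MB_PER_S:.0f} MB/s available")
```

With 250 MB/s produced, replication factor 3, and three consumer groups, leader egress comes to 1,250 MB/s, which is the full capacity of a 10 Gbps NIC. Spreading leaders across brokers divides that load, which is why leader balance matters as much as the partition count.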

3. Leader broker's disk

The leader writes to its local log. Sequential writes on modern NVMe disks are 1-3 GB/s, so a single partition isn't disk-limited until it's pushing hundreds of MB/s. With many partitions on the same broker, however, you're competing for that bandwidth, and writes start to look more random.

Disk IOPS matters less than you'd think for produce because writes are sequential, but it matters a lot for log segment rolling, compaction, and consumer reads from older data.

4. Page cache pressure

Kafka relies heavily on the OS page cache. Consumers reading recent data hit cache; consumers reading older data hit disk. A broker with many active partitions and consumers reading from disk (because they're lagging) will see throughput collapse as the page cache thrashes.
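A crude way to reason about this is to estimate how many seconds of recent writes fit in the page cache. The sketch below is a back-of-the-envelope model, not a description of how the kernel actually evicts pages. It assumes the whole cache holds the most recently written log data, and the names are hypothetical.

```python
def cache_window_seconds(page_cache_gb, broker_write_mb_per_s):
    """Rough seconds of recent log data the page cache can hold.

    Assumptions (illustrative only): the whole cache holds the most
    recently written log data, and the write rate counts every replica
    the broker hosts, leaders and followers alike.
    """
    return page_cache_gb * 1000 / broker_write_mb_per_s


def likely_reads_from_disk(consumer_lag_seconds, page_cache_gb, broker_write_mb_per_s):
    """True when a consumer is further behind than the cache window."""
    return consumer_lag_seconds > cache_window_seconds(page_cache_gb, broker_write_mb_per_s)


window = cache_window_seconds(page_cache_gb=32, broker_write_mb_per_s=400)
print(f"cache covers roughly the last {window:.0f} s of writes")
```

With 32 GB of cache and 400 MB/s of total writes on the broker, the window is about 80 seconds. A consumer lagging by several minutes would be reading mostly from disk under this model.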

The Math: Planning Capacity

To pick a partition count for a topic:

partitions = max(target_throughput / per_partition_throughput,
                 target_consumer_parallelism)

Worked example. You expect 500 MB/s peak write to a topic, and your downstream service can run up to 40 parallel consumers.

partitions = max(500 MB/s / 25 MB/s, 40)
           = max(20, 40)
           = 40

Add a margin (50-100%) for traffic growth, and round up. So 60-80 partitions.
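The same calculation as a small function, in case you want to run it over several topics. The function name and the way the margin is applied (multiplying the base count, then rounding up) are one reasonable reading of the steps above.

```python
import math


def plan_partitions(target_mb_per_s, per_partition_mb_per_s,
                    consumer_parallelism, margin=0.0):
    """Partition count from throughput and consumer parallelism, plus margin.

    margin is a fraction: 0.5 adds 50% headroom, 1.0 adds 100%.
    """
    by_throughput = math.ceil(target_mb_per_s / per_partition_mb_per_s)
    base = max(by_throughput, consumer_parallelism)
    return math.ceil(base * (1 + margin))


# The worked example: 500 MB/s peak, 25 MB/s per partition, 40 consumers.
print(plan_partitions(500, 25, 40))              # base count
print(plan_partitions(500, 25, 40, margin=0.5))  # with 50% headroom
print(plan_partitions(500, 25, 40, margin=1.0))  # with 100% headroom
```

This reproduces the numbers above: 40 without margin, and 60 to 80 with 50-100% headroom.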

The reverse direction also matters: more partitions don't always mean more throughput. Past a certain point you get more overhead (replication, controller load, file descriptors) than gain.

How to Measure Real Throughput

Numbers from a blog post are not your numbers. Test with kafka-producer-perf-test.sh:

kafka-producer-perf-test.sh \
  --topic test-perf \
  --num-records 10000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props \
      bootstrap.servers=localhost:9092 \
      acks=all \
      compression.type=lz4 \
      linger.ms=10 \
      batch.size=65536

--throughput -1 disables throttling, so the producer sends as fast as it can. The output gives you records/s, MB/s, and latency percentiles. Run it against a topic that has the partition count you're planning.

For consumer side:

kafka-consumer-perf-test.sh \
  --topic test-perf \
  --bootstrap-server localhost:9092 \
  --messages 10000000

Match the partition count, message size, and acks settings to your real workload. Otherwise the numbers are meaningless.
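The perf tools report throughput for the whole topic, so to compare against a per-partition planning number you need to divide by the partition count. Here is a minimal sketch, assuming records are spread evenly across partitions and that MB means 1024 * 1024 bytes. The sample figures are made up for illustration.

```python
def per_partition_mb_per_s(records_per_sec, record_bytes, partitions):
    """Convert a topic-wide record rate into MB/s per partition.

    Assumptions (illustrative only): records are spread evenly across
    partitions, and one MB is 1024 * 1024 bytes.
    """
    total_mb_per_s = records_per_sec * record_bytes / (1024 * 1024)
    return total_mb_per_s / partitions


# Hypothetical result: 240,000 records/s of 1,024-byte records on 12 partitions.
rate = per_partition_mb_per_s(240_000, 1024, 12)
print(f"{rate:.1f} MB/s per partition")
```

If keys are skewed, the busiest partition will run hotter than this average, so treat the result as a lower bound on what the hottest partition has to carry.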

Settings That Move the Number

Producer side:

Setting	Effect on per-partition throughput
`batch.size`	Bigger batches = higher throughput, higher latency. 64 KB-256 KB is typical.
`linger.ms`	Wait time for batching. 5-50 ms trades latency for throughput.
`compression.type`	`lz4`/`zstd` typically doubles effective throughput on text.
`acks`	`acks=1` is faster than `acks=all` but loses durability on leader failure.
`max.in.flight.requests.per.connection`	Higher = more requests pipelined per broker connection. Keep it at 5 or below with idempotence enabled to preserve ordering.

Broker side:

Setting	Effect
`num.replica.fetchers`	More threads to fetch from leaders, helps replication catch up under load.
`num.network.threads`, `num.io.threads`	Capacity to handle requests.
`log.segment.bytes`	Larger segments = fewer rolls = less write amplification under compaction.

Common Mistakes

Sizing partitions by storage, not throughput. A topic with 10 TB of retention doesn't need more partitions than a 10 MB topic if the write rate is the same.
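Adding partitions to existing topics for throughput. It works, but it changes the key-to-partition mapping, so new messages for an existing key can land on a different partition and per-key ordering breaks. Plan up front.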
Ignoring replication cost. "10 partitions at 25 MB/s = 250 MB/s in" adds another 500 MB/s of replication traffic on the network at replication factor 3.
Benchmarking with one consumer. Within a consumer group, each partition is read by only one consumer, so a single partition's consume throughput is bounded too - the slower of producer and consumer sets the rate.
Assuming the bottleneck is Kafka. Often it's the consumer (slow downstream, bad batch processing, GC pauses). Measure end-to-end.

Monitoring Partition Throughput

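Metrics to watch:

BytesInPerSec / BytesOutPerSec - actual throughput. The broker reports these per topic, not per partition, so divide by the partition count or track log end offsets for a per-partition view.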
UnderReplicatedPartitions - if replication can't keep up with writes, throughput is capped at follower fetch rate.
IsrShrinksPerSec / IsrExpandsPerSec - frequent ISR changes mean replicas are bouncing in and out of sync.
RequestQueueTimeMs, LocalTimeMs - per-stage latency in the broker request pipeline. Long queue time = broker saturated.
Producer record-send-rate, batch-size-avg - confirms the producer is actually batching efficiently.
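One way to approximate a single partition's write rate is to sample its log end offset twice and convert the difference into a rate. The sketch below assumes offsets advance by one per record (transaction markers also consume offsets, so this overcounts slightly on transactional topics) and that you know the average record size. The function name is hypothetical.

```python
def partition_rate(offset_start, offset_end, seconds, avg_record_bytes):
    """Estimate (msg/s, MB/s) for one partition from two log end offset samples.

    Assumptions (illustrative only): one offset per record, a known
    average record size, and MB meaning 1,000,000 bytes.
    """
    msgs_per_s = (offset_end - offset_start) / seconds
    mb_per_s = msgs_per_s * avg_record_bytes / 1_000_000
    return msgs_per_s, mb_per_s


# Hypothetical samples taken 60 seconds apart, 800-byte average records.
msgs, mb = partition_rate(1_200_000, 3_000_000, 60, 800)
print(f"{msgs:,.0f} msg/s, {mb:.1f} MB/s")
```

Running this across every partition of a topic and comparing the results is a quick way to spot a hot partition caused by key skew.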

Pulse tracks per-topic and per-partition throughput, replication lag, and consumer fan-out across your Kafka clusters, and surfaces hot partitions or replication bottlenecks long before they show up as production incidents. Start a free trial to see your cluster's real headroom.

Frequently Asked Questions

Q: How many MB/s can a Kafka partition handle?
A: A realistic planning number is 25 MB/s sustained per partition with replication factor 3 and typical settings. Optimized benchmarks reach 100 MB/s and higher, but real workloads with many partitions per broker, fan-out consumers, and replication overhead usually stabilize in the 10-50 MB/s range.

Q: Does adding more partitions always increase throughput?
A: Up to a point. More partitions add parallelism, but they also add replication threads, file descriptors, controller load, and metadata overhead. Past a few thousand partitions per broker, you spend more time managing partitions than serving data. KRaft pushed that ceiling much higher but didn't eliminate it.

Q: What's the maximum messages per second a partition can handle?
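A: With small messages (1 KB) and good batching, 50,000-100,000 msg/s per partition is achievable on commodity hardware. At 1 KB that works out to 50-100 MB/s, so treat it as a tuned-benchmark ceiling rather than a sustained planning number. With un-batched producers, you may cap at a few thousand. Message rate is rarely the actual bottleneck - byte rate is.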

Q: How does replication factor affect partition throughput?
A: Each additional replica adds replication traffic (the leader has to send the data once per follower) and lengthens the acks=all round-trip. Going from RF=2 to RF=3 typically reduces achievable produce throughput by 10-20% and consumes proportionally more network bandwidth on the leader.

Q: How does compression affect throughput?
A: Compression (lz4, zstd) trades CPU for network and disk bandwidth. On compressible data (JSON, text), it commonly doubles effective throughput and is essentially free. On already-compressed data (images, video), it costs CPU with no benefit.

Q: Should I prefer many small partitions or few large ones?
A: Many small partitions give you more consumer parallelism and faster failover, at the cost of metadata overhead. Fewer large partitions are simpler and have less overhead but bottleneck consumer parallelism. The sweet spot for most workloads is "enough partitions to support 1-2x your peak consumer count, no more."