A Kafka commit log is the on-disk, append-only, ordered sequence of records that backs every partition in Apache Kafka. Each partition is its own log; records are written sequentially, assigned a monotonically increasing offset, and never modified once written. The commit log is the substrate that gives Kafka its high write throughput, replayability, and durability guarantees - and the term "commit log" is structural, not a name for a single shared file.
How the Kafka Commit Log Works
Each partition's commit log is split into segment files on the broker's disk. A segment is the actual .log file containing record batches in producer-write order, paired with two index files (.index for offset-to-byte lookups and .timeindex for timestamp-to-offset lookups). Kafka rolls a new segment when the current one reaches segment.bytes (default 1 GiB) or when segment.ms elapses (default 7 days).
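The rolling rule can be sketched in a few lines. This is a simplified model, not broker code: the function name `should_roll` and its parameters are hypothetical, the segment age is passed in directly rather than derived from record timestamps, and the size check assumes a roll happens when the incoming batch would push the segment past the limit.

```python
SEGMENT_BYTES_DEFAULT = 1024 ** 3              # segment.bytes default: 1 GiB
SEGMENT_MS_DEFAULT = 7 * 24 * 60 * 60 * 1000   # segment.ms default: 7 days


def should_roll(active_size_bytes, incoming_batch_bytes, segment_age_ms,
                segment_bytes=SEGMENT_BYTES_DEFAULT,
                segment_ms=SEGMENT_MS_DEFAULT):
    """Return True if the active segment should be closed before appending."""
    # Assumption: roll when the batch would not fit under segment.bytes.
    would_overflow = active_size_bytes + incoming_batch_bytes > segment_bytes
    # Assumption: age is supplied by the caller as a single number.
    too_old = segment_age_ms >= segment_ms
    return would_overflow or too_old


print(should_roll(500, 100, 0))                       # False
print(should_roll(1024 ** 3 - 50, 100, 0))            # True
print(should_roll(500, 100, SEGMENT_MS_DEFAULT))      # True
```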
Writes are append-only: a producer sends a record batch, the partition leader appends it to the active segment, the OS page cache absorbs the write, and follower replicas fetch and append the same bytes. Reads are sequential: a consumer requests records starting from an offset, Kafka uses the .index file to seek to the right byte position in the right segment, and then streams data using zero-copy sendfile(2) directly from page cache to socket. That's the mechanism behind Kafka's per-broker throughput in the hundreds of MB/s.
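The index lookup on the read path might look roughly like the sketch below. It is an illustration only: `index_lookup` is a hypothetical helper, the entries use absolute offsets for readability, and the sample index values are made up. Because the index is sparse, the lookup returns the nearest earlier byte position, and the reader scans forward from there.

```python
from bisect import bisect_right


def index_lookup(index_entries, target_offset):
    """Return the byte position to start scanning from for target_offset.

    index_entries: sorted list of (offset, byte_position) pairs. The index is
    sparse, so it holds an entry only every so often, not one per record.
    """
    offsets = [offset for offset, _pos in index_entries]
    # Find the last entry whose offset is <= target_offset.
    i = bisect_right(offsets, target_offset) - 1
    if i < 0:
        return 0  # target precedes the first entry: scan from the file start
    return index_entries[i][1]


sparse_index = [(0, 0), (120, 4096), (245, 8192)]  # made-up sample entries
print(index_lookup(sparse_index, 200))  # 4096: scan forward from offset 120
```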
Partition: orders-0
└── /var/kafka-logs/orders-0/
    ├── 00000000000000000000.log        # segment, bytes 0..1GiB
    ├── 00000000000000000000.index      # sparse offset index
    ├── 00000000000000000000.timeindex  # sparse timestamp index
    ├── 00000000000241038472.log        # next segment, active for writes
    ├── 00000000000241038472.index
    └── 00000000000241038472.timeindex
The file names are the base offset of the first record in that segment, zero-padded to 20 digits. That's how a broker can binary-search to the right segment for any requested offset in O(log n) time without scanning files.
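A minimal sketch of that lookup, assuming the segment base offsets are already held in a sorted list. The helper names `segment_file_name` and `find_segment` are hypothetical; the broker's own data structure may differ, but the search idea is the same: pick the largest base offset that does not exceed the requested offset.

```python
from bisect import bisect_right


def segment_file_name(base_offset):
    """Format a base offset as a segment file name: 20 digits, zero-padded."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that would hold target_offset.

    base_offsets must be sorted ascending. The owning segment is the one
    with the largest base offset that is <= target_offset.
    """
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset is below the first segment's base offset")
    return base_offsets[i]


segments = [0, 241038472]  # base offsets from the directory listing above
print(segment_file_name(find_segment(segments, 1500)))       # 00000000000000000000.log
print(segment_file_name(find_segment(segments, 241038472)))  # 00000000000241038472.log
```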
Kafka Commit Log Configuration
The settings that drive log behavior:
| Setting | Scope | Default | What it controls |
|---|---|---|---|
| segment.bytes | topic / broker | 1073741824 (1 GiB) | Max size of one segment file before rolling |
| segment.ms | topic / broker | 604800000 (7 days) | Max age before rolling, even if under segment.bytes |
| retention.ms | topic | 604800000 (7 days) | How long to keep closed segments |
| retention.bytes | topic | -1 (unlimited) | Total partition size cap |
| cleanup.policy | topic | delete | delete (drop old segments) or compact (keep latest per key), can combine |
| min.cleanable.dirty.ratio | topic | 0.5 | Fraction dirty before log compaction triggers |
| log.flush.interval.messages | broker | Long.MAX_VALUE | Kafka defers fsync to the OS; rarely changed |
| compression.type | topic | producer | Codec applied per batch on disk |
Two policies share the same log:
- cleanup.policy=delete drops entire segments once they exceed retention.ms or retention.bytes. Kafka never deletes individual records - it deletes whole closed segments. That's why your topic can sit slightly over its retention size for hours.
- cleanup.policy=compact runs a background log cleaner that rewrites segments keeping only the latest record per key. Tombstone records (null value) signal deletion of a key. Compacted logs power state-store and CDC patterns.
- cleanup.policy=compact,delete combines both: compact by key, then drop anything older than retention.ms.
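The compact policy can be modeled with a toy in-memory log. This is a sketch under simplifying assumptions, not the log cleaner's implementation: records are plain `(offset, key, value)` tuples, `compact` and `drop_tombstones` are hypothetical names, and segment boundaries and timing are ignored. Note that surviving records keep their original offsets.

```python
def compact(records):
    """Keep only the latest record per key, preserving offset order.

    records: list of (offset, key, value) tuples in ascending offset order.
    A value of None is a tombstone.
    """
    latest_offset = {}
    for offset, key, _value in records:
        latest_offset[key] = offset  # later offsets overwrite earlier ones
    return [r for r in records if latest_offset[r[1]] == r[0]]


def drop_tombstones(records):
    """Model the later pass that removes tombstones once they have expired."""
    return [r for r in records if r[2] is not None]


log = [
    (0, "user-1", "a"),
    (1, "user-2", "b"),
    (2, "user-1", "c"),
    (3, "user-2", None),  # tombstone: delete user-2
]
compacted = compact(log)
print(compacted)                   # [(2, 'user-1', 'c'), (3, 'user-2', None)]
print(drop_tombstones(compacted))  # [(2, 'user-1', 'c')]
```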
How the Commit Log Delivers Durability
A record is considered "committed" only after enough in-sync replicas have appended it to their local log. This is controlled by two settings working together:
- Producer-side: acks=all (the producer waits for the leader to confirm all in-sync replicas have appended).
- Topic-side: min.insync.replicas=2 (or more) defines what "enough" means.
If only acks=all is set without min.insync.replicas >= 2, a partition can degrade to a single in-sync replica and Kafka will still ack writes - which means a single broker loss can lose data. The commit log itself is fsync-deferred by default: Kafka trusts the OS page cache and depends on replication, not local fsync, for durability. Tuning log.flush.interval.messages to force fsync per record is almost always a mistake on modern hardware.
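The interaction between the two settings can be sketched as a small decision function. This models the rule described above rather than broker code; `leader_accepts_write` and its parameters are hypothetical names.

```python
def leader_accepts_write(acks, isr_size, min_insync_replicas):
    """Model whether the partition leader accepts a produce request.

    With acks=all, the write is rejected when the in-sync replica set is
    smaller than min.insync.replicas. For other acks values this model
    applies no such check.
    """
    if acks == "all":
        return isr_size >= min_insync_replicas
    return True


# ISR has shrunk to the leader alone.
print(leader_accepts_write("all", isr_size=1, min_insync_replicas=1))  # True
print(leader_accepts_write("all", isr_size=1, min_insync_replicas=2))  # False
```

In the first call the write is acknowledged even though only one copy exists, which is the data-loss window described above. In the second, the producer gets an error instead of a false sense of durability.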
Common Mistakes with the Commit Log
- Treating the log as a database. A commit log is sequential storage, not a B-tree. Finding a record by key means scanning segments from the start, which is fast as a streaming read but terrible for point lookups. Use compacted topics if you need latest-by-key, and a real KV store if you need random lookups.
- Setting retention.ms too low for compaction. Compacted topics still respect retention.ms if cleanup.policy=compact,delete is used. Setting retention.ms=3600000 (1 hour) on a compacted state topic deletes records the compactor was about to keep.
- Confusing segment rolling with retention. The active segment is never deleted, even if it contains records older than retention.ms. If a partition is idle, its records can outlive the retention window until enough new writes force a roll.
- Sizing segment.bytes too large. Bigger segments mean retention is coarse - data only ages out when whole segments are dropped. For low-throughput topics with strict retention, smaller segments (e.g. 100 MiB) age out more responsively.
- Disabling replication. replication.factor=1 puts the commit log on one broker. A disk failure is permanent data loss.
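The retention pitfalls in this list follow from deletion working on whole closed segments, oldest first. The sketch below models that behavior as described in this article. It is not broker code: `segments_to_delete` and the dict fields are hypothetical, and the size rule assumes a segment is dropped only if the partition would still be at or above retention.bytes afterwards.

```python
def segments_to_delete(segments, now_ms, retention_ms, retention_bytes=-1):
    """Pick whole closed segments to delete, oldest first.

    segments: list of dicts with 'base_offset', 'size_bytes', and
    'last_timestamp_ms', sorted by base_offset. The last entry is treated
    as the active segment and is never selected.
    """
    closed = segments[:-1]
    total = sum(s["size_bytes"] for s in segments)
    doomed = []
    for seg in closed:
        expired = now_ms - seg["last_timestamp_ms"] > retention_ms
        # Assumption: only drop for size if the log stays >= the cap.
        over_size = (retention_bytes >= 0
                     and total - seg["size_bytes"] >= retention_bytes)
        if not (expired or over_size):
            break  # stop at the first segment that must be kept
        doomed.append(seg["base_offset"])
        total -= seg["size_bytes"]
    return doomed


idle_partition = [
    {"base_offset": 0, "size_bytes": 10, "last_timestamp_ms": 0},  # active
]
print(segments_to_delete(idle_partition, now_ms=10**9, retention_ms=1))  # []
```

The idle partition keeps its only segment no matter how old the records are, because that segment is still the active one.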
Monitoring the Commit Log
Operationally watch:
- LogEndOffset and LogStartOffset per partition - the gap is the number of offsets retained on disk, which equals the record count unless compaction or transaction markers have left holes.
- Size per log - validates retention is actually freeing space; runaway growth means the log cleaner is failing or retention is misconfigured.
- LogCleanerManager errors - a failed log cleaner stalls compaction silently and bloats compacted topics indefinitely.
- Under-replicated partitions - a partition with fewer in-sync replicas than replication.factor is one broker failure away from durability loss.
- Disk I/O wait - the commit log's throughput collapses if the underlying disk is saturated.
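Two of these checks reduce to simple arithmetic on values you already collect. The sketch below assumes the four inputs have been scraped from broker metrics or cluster metadata by some other means; `partition_health` is a hypothetical helper, not part of any Kafka client.

```python
def partition_health(log_start_offset, log_end_offset,
                     isr_count, replication_factor):
    """Summarize offset span and replication state for one partition."""
    return {
        # Span of retained offsets; an upper bound on records on disk.
        "offset_span": log_end_offset - log_start_offset,
        # True when fewer replicas are in sync than the topic asks for.
        "under_replicated": isr_count < replication_factor,
    }


print(partition_health(100, 350, isr_count=2, replication_factor=3))
# {'offset_span': 250, 'under_replicated': True}
```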
Pulse monitors Kafka commit-log health across brokers: it surfaces stalled log cleaners, partitions drifting out of ISR, segment rolling anomalies, and disk pressure with root-cause context instead of raw metric dumps. Pulse covers Kafka, Elasticsearch, OpenSearch, and ClickHouse with agentic SRE diagnostics.
Frequently Asked Questions
Q: How is a Kafka commit log different from a database write-ahead log?
A: A database WAL is internal: it exists to recover the table on crash and is usually pruned aggressively. A Kafka commit log is the primary storage - consumers read it directly and may replay from days ago. It's also distributed and replicated across brokers, where a WAL is per-database-instance.
Q: Can I update or delete a record in a Kafka commit log?
A: No, individual records are immutable. To "update" a value, write a new record with the same key to a compacted topic and the log cleaner will eventually drop the older record. To "delete", write a tombstone (null value) for that key. Whole segments age out via retention.ms or retention.bytes.
Q: How does Kafka manage the size of the commit log on disk?
A: Each topic has retention.ms (default 7 days) and retention.bytes (default unlimited). Closed segments older than retention.ms or pushing the partition over retention.bytes are deleted on a periodic check (log.retention.check.interval.ms, default 5 minutes). The active segment is never deleted, regardless of age.
Q: What is the default Kafka log segment size?
A: segment.bytes defaults to 1 GiB (1073741824 bytes). A new segment is rolled when the active segment hits that size or when segment.ms (default 7 days) elapses, whichever comes first.
Q: How does log compaction work?
A: A background log cleaner thread per broker scans the "dirty" portion of compacted topics, builds an in-memory map of key -> latest offset, and rewrites segments keeping only the latest record for each key. Tombstone records (key with null value) mark a key for deletion; after delete.retention.ms (default 24 hours) the tombstone itself is removed.
Q: Does Kafka fsync every write to the commit log?
A: No, by default Kafka leaves fsync to the OS. Durability is provided by replication, not by forcing each write to physical disk. Forcing fsync per record (log.flush.interval.messages=1) destroys throughput and is almost never the right answer - increase min.insync.replicas instead.
Q: Where is the Kafka commit log stored on disk?
A: Under each broker's log.dirs (configurable, often /var/lib/kafka/data or /var/kafka-logs). Each partition gets its own subdirectory named <topic>-<partition> containing the segment files. Multiple log.dirs entries spread partitions across disks for parallel I/O.
Related Reading
- Kafka Topic: the logical container around a set of partition logs
- Kafka Partition: the unit each commit log instance backs
- Kafka Broker: the server that stores the log on its local disk
- Consumer Offset: how consumers track their position in the log
- Kafka Producer: how records get appended to the log
- Schema Registry: keeping the records in the log decodable as schemas evolve
- Apache Kafka Glossary: all Kafka terms in one place