What is a Commit Log in Apache Kafka? Kafka's Storage Model Explained

A Kafka commit log is the on-disk, append-only, ordered sequence of records that backs every partition in Apache Kafka. Each partition is its own log; records are written sequentially, assigned a monotonically increasing offset, and never modified once written. The commit log is the substrate that gives Kafka its high write throughput, replayability, and durability guarantees - and the term "commit log" is structural, not a name for a single shared file.

How the Kafka Commit Log Works

Each partition's commit log is split into segment files on the broker's disk. A segment is the actual .log file containing record batches in producer-write order, paired with two index files (.index for offset-to-byte lookups and .timeindex for timestamp-to-offset lookups). Kafka rolls a new segment when the current one reaches segment.bytes (default 1 GiB) or when segment.ms elapses (default 7 days).
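
The rolling rule above can be sketched as a small predicate. This is a simplified model rather than the broker's own code: the function name should_roll and its parameters are hypothetical, and a real broker also accounts for details such as full index files and roll jitter.

```python
SEGMENT_BYTES = 1024**3                  # segment.bytes default: 1 GiB
SEGMENT_MS = 7 * 24 * 60 * 60 * 1000     # segment.ms default: 7 days


def should_roll(active_size_bytes, incoming_batch_bytes, segment_age_ms,
                segment_bytes=SEGMENT_BYTES, segment_ms=SEGMENT_MS):
    """Return True if a new segment should be started before appending.

    Hypothetical helper: rolls when the batch would push the active segment
    past segment.bytes, or when the segment is older than segment.ms.
    """
    would_overflow = active_size_bytes + incoming_batch_bytes > segment_bytes
    too_old = segment_age_ms > segment_ms
    return would_overflow or too_old
```

Whichever limit is hit first wins, so a low-traffic partition typically rolls on age and a busy one on size.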

Writes are append-only: a producer sends a record batch, the partition leader appends it to the active segment, the OS page cache absorbs the write, and follower replicas fetch and append the same bytes. Reads are sequential: a consumer requests records starting from an offset, Kafka locates the segment that holds it, uses that segment's .index file to seek close to the right byte position, and then streams data with zero-copy sendfile(2) directly from page cache to socket (a path that applies when the connection is not TLS-encrypted). That's the mechanism behind Kafka's per-broker throughput in the hundreds of MB/s.
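
The index seek in the read path can be illustrated with a minimal sketch. It assumes a sparse index held as sorted (offset, byte_position) pairs; the sample entries and the function name lookup_start_position are made up for illustration and do not reflect Kafka's on-disk index encoding.

```python
from bisect import bisect_right

# Hypothetical sparse index for one segment: one entry every few KiB of log
# data, not one entry per record.
index = [(0, 0), (50, 4096), (101, 8192), (153, 12288)]


def lookup_start_position(index, target_offset):
    """Return the byte position from which to scan forward for target_offset.

    Finds the last index entry whose offset is <= target_offset.
    """
    offsets = [offset for offset, _ in index]
    i = bisect_right(offsets, target_offset) - 1
    if i < 0:
        return 0  # target precedes the first entry: scan from segment start
    return index[i][1]


print(lookup_start_position(index, 120))  # nearest entry at or below 120 is (101, 8192)
```

Because the index is sparse, the lookup only gets close: the reader then scans forward from the returned position until it reaches the batch containing the requested offset.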

Partition: orders-0
└── /var/kafka-logs/orders-0/
    ├── 00000000000000000000.log        # closed segment, offsets 0..241038471
    ├── 00000000000000000000.index      # sparse offset index
    ├── 00000000000000000000.timeindex  # sparse timestamp index
    ├── 00000000000241038472.log        # next segment, active for writes
    ├── 00000000000241038472.index
    └── 00000000000241038472.timeindex

The file names are the base offset of the first record in that segment, zero-padded to 20 digits. That's how a broker can binary-search to the right segment for any requested offset in O(log n) time without scanning files.
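
A short sketch of that naming and lookup scheme, using the two base offsets from the listing above. The helper names segment_file_name and find_segment are hypothetical, and a sorted Python list stands in for whatever structure the broker keeps in memory.

```python
from bisect import bisect_right


def segment_file_name(base_offset):
    """Segment files are named after their base offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that should hold target_offset.

    base_offsets must be sorted ascending; the search is O(log n).
    """
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset is below the first segment's base offset")
    return base_offsets[i]


base_offsets = [0, 241038472]
print(segment_file_name(find_segment(base_offsets, 241038471)))  # 00000000000000000000.log
print(segment_file_name(find_segment(base_offsets, 300000000)))  # 00000000000241038472.log
```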

Kafka Commit Log Configuration

The settings that drive log behavior:

Setting                     | Scope          | Default            | What it controls
segment.bytes               | topic / broker | 1073741824 (1 GiB) | Max size of one segment file before rolling
segment.ms                  | topic / broker | 604800000 (7 days) | Max age before rolling, even if under segment.bytes
retention.ms                | topic          | 604800000 (7 days) | How long to keep closed segments
retention.bytes             | topic          | -1 (unlimited)     | Total partition size cap
cleanup.policy              | topic          | delete             | delete (drop old segments) or compact (keep latest per key), can combine
min.cleanable.dirty.ratio   | topic          | 0.5                | Fraction dirty before log compaction triggers
log.flush.interval.messages | broker         | Long.MAX_VALUE     | Kafka defers fsync to the OS; rarely changed
compression.type            | topic          | producer           | Codec applied per batch on disk

Two cleanup policies operate on the same log, alone or combined:

  • cleanup.policy=delete drops entire segments once they exceed retention.ms or retention.bytes. Kafka never deletes individual records - it deletes whole closed segments. That's why your topic can sit slightly over its retention size for hours.
  • cleanup.policy=compact runs a background log cleaner that rewrites segments keeping only the latest record per key. Tombstone records (null value) signal deletion of a key. Compacted logs power state-store and CDC patterns.
  • cleanup.policy=compact,delete combines both: compact by key, then drop anything older than retention.ms.
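
The delete policy's segment-granular behavior can be modeled in a few lines. This is a simplified sketch under the rules described above (only closed segments are candidates, the last segment in the list is the active one); the Segment class and segments_to_delete function are hypothetical names, not Kafka APIs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    largest_timestamp_ms: int


def segments_to_delete(segments, now_ms, retention_ms, retention_bytes=-1):
    """Pick whole closed segments to drop; segments[-1] is the active one."""
    closed = segments[:-1]
    # Time-based: a segment expires once its newest record is too old.
    doomed = [s for s in closed if now_ms - s.largest_timestamp_ms > retention_ms]
    # Size-based (-1 means no cap): drop oldest segments only while the log
    # would still be at or above the cap without them.
    if retention_bytes >= 0:
        survivors = [s for s in segments if s not in doomed]
        excess = sum(s.size_bytes for s in survivors) - retention_bytes
        for seg in survivors[:-1]:
            if excess - seg.size_bytes < 0:
                break
            doomed.append(seg)
            excess -= seg.size_bytes
    return doomed
```

With three 100-byte segments and a 250-byte cap, nothing is deleted: removing the oldest segment would take the log below the cap, so it stays at 300 bytes until more data arrives. That is the overshoot described in the first bullet.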

How the Commit Log Delivers Durability

A record is considered "committed" only after enough in-sync replicas have appended it to their local log. This is controlled by two settings working together:

  • Producer-side: acks=all (the producer waits for the leader to confirm all in-sync replicas have appended).
  • Topic-side: min.insync.replicas=2 (or more) defines what "enough" means.

If only acks=all is set without min.insync.replicas >= 2, a partition can degrade to a single in-sync replica and Kafka will still ack writes - which means a single broker loss can lose data. The commit log itself is fsync-deferred by default: Kafka trusts the OS page cache and depends on replication, not local fsync, for durability. Tuning log.flush.interval.messages to force fsync per record is almost always a mistake on modern hardware.
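
The interaction between the two settings can be captured in a small decision function. This is a model of the leader's behavior as described above, not Kafka client or broker code; write_outcome is a hypothetical name.

```python
def write_outcome(acks, isr_size, min_insync_replicas=1):
    """Model the leader's response to a produce request.

    min.insync.replicas is only enforced for acks=all. When the ISR has
    shrunk below it, the leader rejects the write instead of acking it.
    """
    if acks == "all" and isr_size < min_insync_replicas:
        return "rejected"
    return "acked"


# ISR has degraded to the leader alone.
print(write_outcome("all", isr_size=1, min_insync_replicas=1))  # acked
print(write_outcome("all", isr_size=1, min_insync_replicas=2))  # rejected
```

The first call is the risky case: the write is acknowledged while only one broker holds it. With min.insync.replicas=2 the same situation surfaces as a producer error rather than as silent exposure to data loss.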

Common Mistakes with the Commit Log

  1. Treating the log as a database. A commit log is sequential storage, not a B-tree. It is indexed by offset and timestamp, not by key, so finding a record by key means scanning from the start of the log: fast as a streaming read, terrible as a point lookup. Use compacted topics if you need latest-by-key, and a real KV store if you need random lookups.
  2. Setting retention.ms too low for compaction. Compacted topics still respect retention.ms if cleanup.policy=compact,delete is used. Setting retention.ms=1h on a compacted state topic deletes even the latest value of a key once the segment holding it ages past the hour, which is exactly the record compaction alone would have kept.
  3. Confusing segment rolling with retention. The active segment is never deleted, even if it contains records older than retention.ms. If a partition is idle, its records can outlive the retention window until enough new writes force a roll.
  4. Sizing segment.bytes too large. Bigger segments mean retention is coarse - data only ages out when whole segments are dropped. For low-throughput topics with strict retention, smaller segments (e.g. 100 MiB) age out more responsively.
  5. Disabling replication. replication.factor=1 puts the commit log on one broker. A disk failure is permanent data loss.
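
The arithmetic behind mistake 4 is worth working through once. The sketch below assumes a steady write rate of 10 KiB/s into one partition, a figure chosen purely for illustration; hours_to_fill is a hypothetical helper.

```python
def hours_to_fill(segment_bytes, write_rate_bytes_per_sec):
    """Hours of steady writes before a segment reaches its size limit."""
    return segment_bytes / write_rate_bytes_per_sec / 3600


rate = 10 * 1024  # assumed write rate: 10 KiB/s
print(round(hours_to_fill(1024**3, rate), 1))        # 1 GiB segment
print(round(hours_to_fill(100 * 1024**2, rate), 1))  # 100 MiB segment
```

At that rate a 1 GiB segment takes roughly 29 hours to fill and close, while a 100 MiB segment closes in under 3 hours. Since retention only acts on closed segments, the smaller size lets data age out at a much finer granularity.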

Monitoring the Commit Log

Operationally watch:

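  • LogEndOffset and LogStartOffset per partition - the gap approximates the partition's record count on disk (it overcounts on compacted or transactional topics, where some offsets no longer hold a record).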
  • Size per log - validates retention is actually freeing space; runaway growth means the log cleaner is failing or retention is misconfigured.
  • LogCleanerManager errors - a failed log cleaner stalls compaction silently and bloats compacted topics indefinitely.
  • Under-replicated partitions - a partition with fewer in-sync replicas than replication.factor is one broker failure away from durability loss.
  • Disk I/O wait - the commit log's throughput collapses if the underlying disk is saturated.
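
Two of these checks reduce to simple comparisons once the values are scraped. The sketch below assumes the four inputs are already available from whatever metrics pipeline is in use; check_partition and its return shape are hypothetical.

```python
def check_partition(log_start_offset, log_end_offset, isr_count, replica_count):
    """Summarise one partition from values a metrics scrape might provide."""
    return {
        # Offsets retained on disk; an upper bound on the record count.
        "offset_span": log_end_offset - log_start_offset,
        # Fewer in-sync replicas than assigned replicas.
        "under_replicated": isr_count < replica_count,
    }


print(check_partition(100, 1100, isr_count=2, replica_count=3))
```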

Pulse monitors Kafka commit-log health across brokers: it surfaces stalled log cleaners, partitions drifting out of ISR, segment rolling anomalies, and disk pressure with root-cause context instead of raw metric dumps. Pulse covers Kafka, Elasticsearch, OpenSearch, and ClickHouse with agentic SRE diagnostics.

Frequently Asked Questions

Q: How is a Kafka commit log different from a database write-ahead log?
A: A database WAL is internal: it exists to recover the table on crash and is usually pruned aggressively. A Kafka commit log is the primary storage - consumers read it directly and may replay from days ago. It's also distributed and replicated across brokers, where a WAL is per-database-instance.

Q: Can I update or delete a record in a Kafka commit log?
A: No, individual records are immutable. To "update" a value, write a new record with the same key to a compacted topic and the log cleaner will eventually drop the older record. To "delete", write a tombstone (null value) for that key. Whole segments age out via retention.ms or retention.bytes.

Q: How does Kafka manage the size of the commit log on disk?
A: Each topic has retention.ms (default 7 days) and retention.bytes (default unlimited). Closed segments older than retention.ms or pushing the partition over retention.bytes are deleted on a periodic check (log.retention.check.interval.ms, default 5 minutes). The active segment is never deleted, regardless of age.

Q: What is the default Kafka log segment size?
A: segment.bytes defaults to 1 GiB (1073741824 bytes). A new segment is rolled when the active segment hits that size or when segment.ms (default 7 days) elapses, whichever comes first.

Q: How does log compaction work?
A: A background log cleaner thread per broker scans the "dirty" portion of compacted topics, builds an in-memory map of key -> latest offset, and rewrites segments keeping only the latest record for each key. Tombstone records (key with null value) mark a key for deletion; after delete.retention.ms (default 24 hours) the tombstone itself is removed.
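
A minimal in-memory sketch of that two-pass idea, assuming records are (offset, key, value) tuples with None as the tombstone value. The function compact and its drop_tombstones flag are hypothetical; the flag stands in for delete.retention.ms having elapsed.

```python
def compact(records, drop_tombstones=False):
    """Keep only the latest record per key, preserving offsets and order."""
    latest = {}
    for offset, key, _ in records:       # pass 1: key -> latest offset
        latest[key] = offset
    kept = []
    for offset, key, value in records:   # pass 2: rewrite the log
        if latest[key] != offset:
            continue                     # superseded by a newer record
        if value is None and drop_tombstones:
            continue                     # tombstone past its retention
        kept.append((offset, key, value))
    return kept


log = [(0, "a", "1"), (1, "b", "1"), (2, "a", "2"), (3, "b", None), (4, "c", "1")]
print(compact(log))                        # [(2, 'a', '2'), (3, 'b', None), (4, 'c', '1')]
print(compact(log, drop_tombstones=True))  # [(2, 'a', '2'), (4, 'c', '1')]
```

Surviving records keep their original offsets, so a compacted log has gaps in its offset sequence rather than renumbered records.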

Q: Does Kafka fsync every write to the commit log?
A: No, by default Kafka leaves fsync to the OS. Durability is provided by replication, not by forcing each write to physical disk. Forcing fsync per record (log.flush.interval.messages=1) destroys throughput and is almost never the right answer - rely on replication with acks=all and min.insync.replicas of at least 2 instead.

Q: Where is the Kafka commit log stored on disk?
A: Under each broker's log.dirs (configurable, often /var/lib/kafka/data or /var/kafka-logs). Each partition gets its own subdirectory named <topic>-<partition> containing the segment files. Multiple log.dirs entries spread partitions across disks for parallel I/O.
