A Kafka offset is a unique, monotonically increasing integer assigned to each message within a partition. The broker assigns it at write time. The consumer uses it to track progress. Together, offsets are how Kafka turns a distributed log into a system you can replay, resume, and reason about.
There are actually several distinct things called "offset" in Kafka, and conflating them is the source of most confusion. This guide separates them.
The Four Offsets You Should Know
| Offset | Where it lives | Who sets it | What it means |
|---|---|---|---|
| Log offset | In the partition log | Broker, at write time | Position of a message in its partition |
| Log start offset | Broker metadata | Broker | Earliest still-retained message |
| Log end offset (LEO) | Broker metadata | Broker | Next offset to be written |
| Consumer offset | __consumer_offsets topic | Consumer (via commit) | Position just past the last successfully processed message for a group |
When someone says "the offset," they usually mean the consumer offset. The other three matter when you're debugging.
Log Offsets: The Identity of a Message
Every message written to a partition gets a 64-bit offset, assigned strictly in append order. Offset 0 is the first message, offset 1 is the second, and so on - within that partition only. Offsets are not unique across partitions; partition 0 and partition 1 both have an offset 0, which are different messages.
A (topic, partition, offset) triple uniquely identifies any message in a Kafka cluster.
Topic: orders
Partition 0:
offset 0: {order: 100, status: created}
offset 1: {order: 101, status: created}
offset 2: {order: 100, status: paid}
offset 3: {order: 102, status: created}
Offsets don't reset when messages are deleted by retention. If retention drops offsets 0-1, the partition now has a log start offset of 2: the remaining messages keep offsets 2 and 3, and the next write gets offset 4. Offsets are forever, even when the messages are gone.
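To make the retention behavior concrete, here is a minimal in-memory sketch of a single partition. It is a toy model, not how the broker stores data; the class name PartitionLogSketch and its methods are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of one partition log (hypothetical; not Kafka's implementation). */
final class PartitionLogSketch {
    private final Deque<String> retained = new ArrayDeque<>();
    private long logStartOffset = 0; // earliest still-retained offset
    private long logEndOffset = 0;   // next offset to be written

    /** Appends a message and returns the offset assigned to it. */
    long append(String message) {
        retained.addLast(message);
        return logEndOffset++;
    }

    /** Simulates retention deleting the oldest `count` messages. */
    void dropOldest(int count) {
        for (int i = 0; i < count && !retained.isEmpty(); i++) {
            retained.removeFirst();
            logStartOffset++; // the start moves forward; the end is untouched
        }
    }

    long logStartOffset() { return logStartOffset; }

    long logEndOffset() { return logEndOffset; }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        for (int i = 0; i < 4; i++) {
            log.append("message-" + i); // assigned offsets 0..3
        }
        log.dropOldest(2); // retention removes offsets 0 and 1
        long next = log.append("message-4");
        System.out.println("log start offset = " + log.logStartOffset()); // 2
        System.out.println("offset of new message = " + next);            // 4
    }
}
```

Deleting old messages only moves the log start offset; the counter that hands out new offsets never goes backwards.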
Consumer Offsets: Tracking Progress
A consumer reads from a partition starting at some offset and increments as it processes messages. When it commits, it records "I've processed up to offset X for this partition." If the consumer restarts or another consumer takes over, it resumes from the committed offset.
Where commits go: a special internal topic called __consumer_offsets. It's a compacted topic keyed by (group, topic, partition), with the committed offset as the value. The consumer client never touches this topic directly; it sends commit and fetch requests to the group coordinator broker, which reads and writes the topic on its behalf.
Two ways to commit:
- Auto-commit (enable.auto.commit=true, default): The consumer commits offsets periodically (default 5 seconds) based on records that poll() has returned, not on records your code has finished processing. Easy but dangerous - if records are still being processed (for example on a worker thread) when the commit happens, a crash means you've committed records you didn't actually process, and they're lost from the consumer's perspective.
- Manual commit (commitSync() or commitAsync()): You decide when to commit, after processing is durable. This is what production applications should do.
The pattern:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
    for (ConsumerRecord<String, String> record : records) {
        process(record); // your business logic
    }
    consumer.commitSync(); // commit after processing
}
What "Committed Offset" Actually Means
Some subtleties that bite people:
- The committed offset is one past the last processed offset. If you processed offsets 0-99, you commit offset 100 (the next one you'd read).
- Committing an offset doesn't delete the message from the topic. Other consumer groups still see it. Retention is independent of consumption.
- Different consumer groups have independent offsets for the same partitions. Group A can be at offset 1000 while group B is at offset 500.
- When a consumer joins a group, the group coordinator looks up the committed offset for each assigned partition. The consumer starts reading from that offset.
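The "one past the last processed offset" rule and the per-group independence can be sketched in plain Java. This is an illustration with hypothetical names (CommitPositionSketch, offsetToCommit); with the real client the same arithmetic shows up as committing record.offset() + 1.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of committed positions (hypothetical names, no Kafka dependency). */
final class CommitPositionSketch {

    /** The offset to commit is the next offset to read, not the last one processed. */
    static long offsetToCommit(long lastProcessedOffset) {
        return lastProcessedOffset + 1;
    }

    public static void main(String[] args) {
        // Committed offsets are tracked per (group, topic, partition),
        // so two groups reading the same partition never overwrite each other.
        Map<String, Long> committed = new HashMap<>();
        committed.put("group-a/orders/0", offsetToCommit(999)); // processed 0..999
        committed.put("group-b/orders/0", offsetToCommit(499)); // processed 0..499

        System.out.println("group-a resumes at " + committed.get("group-a/orders/0")); // 1000
        System.out.println("group-b resumes at " + committed.get("group-b/orders/0")); // 500
    }
}
```

Committing the last processed offset itself, rather than that offset plus one, would cause the final record to be read again after every restart.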
Offset Reset: What Happens When There's No Committed Offset
A new consumer group, or one whose offsets have aged out of __consumer_offsets (which is also a topic with retention), has no committed offset. The auto.offset.reset setting decides what to do:
| Setting | Behavior |
|---|---|
| earliest | Start from the log start offset (the oldest still-retained message) |
| latest | Start from the log end offset (only new messages); this is the default |
| none | Throw an exception |
The choice is workload-specific:
- Stream processing that needs to reprocess history → earliest.
- Real-time consumers that don't care about past events → latest.
- Strict pipelines where missing data is a bug → none, and surface the error.
Choose deliberately. "Just use earliest" can mean replaying weeks of data whenever a group starts without committed offsets (a new group ID, or offsets that have expired); "just use latest" can mean silently skipping everything produced before the consumer's first fetch.
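The decision described above can be written as a small pure function. This is a sketch of the logic only, assuming the three inputs are already known; the names OffsetResetSketch, ResetPolicy and startingOffset are hypothetical, and the IllegalStateException stands in for whatever the real client throws.

```java
import java.util.OptionalLong;

/** Illustrative model of how a starting position is chosen (not the client's code). */
final class OffsetResetSketch {

    enum ResetPolicy { EARLIEST, LATEST, NONE }

    static long startingOffset(OptionalLong committed,
                               long logStartOffset,
                               long logEndOffset,
                               ResetPolicy policy) {
        if (committed.isPresent()) {
            return committed.getAsLong(); // a committed offset wins; the policy is not consulted
        }
        switch (policy) {
            case EARLIEST:
                return logStartOffset; // oldest still-retained message
            case LATEST:
                return logEndOffset;   // only messages written from now on
            default:
                // Stand-in exception for this sketch.
                throw new IllegalStateException("no committed offset and auto.offset.reset=none");
        }
    }

    public static void main(String[] args) {
        // A brand-new group on a partition that retains offsets 200..4999.
        OptionalLong none = OptionalLong.empty();
        System.out.println(startingOffset(none, 200, 5000, ResetPolicy.EARLIEST)); // 200
        System.out.println(startingOffset(none, 200, 5000, ResetPolicy.LATEST));   // 5000
    }
}
```

The setting matters only on the branch where no committed offset exists, which is why its effect tends to surface on first deployment or after offsets have expired.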
Seeking and Replay
You can move a consumer's offset arbitrarily:
# Reset a group to the earliest offset on all partitions
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--group my-group \
--topic orders \
--reset-offsets --to-earliest --execute
# Reset to a specific offset
... --reset-offsets --to-offset 1000 --execute
# Reset to a specific timestamp
... --reset-offsets --to-datetime 2024-01-01T00:00:00.000 --execute
# Shift forward/back by N
... --reset-offsets --shift-by -100 --execute
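The tool requires the group to be inactive (no live consumers). Programmatically, consumer.seek(partition, offset) moves the position of a running consumer for a partition it is assigned; the new position is stored for the group only once the consumer commits.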
This is also how you replay - reset to an earlier offset and start processing again. As long as the messages haven't aged out, they're available.
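The arithmetic behind a relative shift is simple enough to model. The sketch below assumes that a target falling outside the retained range is clamped to the nearest valid offset; treat that clamping as an assumption of this example and verify it against your Kafka version. ShiftBySketch and shifted are hypothetical names.

```java
/** Illustrative model of a relative offset shift (assumes out-of-range targets are clamped). */
final class ShiftBySketch {

    static long shifted(long committedOffset, long shiftBy,
                        long logStartOffset, long logEndOffset) {
        long target = committedOffset + shiftBy;
        // Assumption: keep the result within [logStartOffset, logEndOffset].
        return Math.max(logStartOffset, Math.min(target, logEndOffset));
    }

    public static void main(String[] args) {
        // Replay the last 100 messages of a group currently at offset 1000.
        System.out.println(shifted(1000, -100, 0, 5000)); // 900
        // Shifting back past the retained range lands on the log start offset.
        System.out.println(shifted(50, -100, 20, 5000));  // 20
    }
}
```

A replay can reach back only as far as the log start offset, which is the practical meaning of "as long as the messages haven't aged out".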
How Long Do Consumer Offsets Live?
__consumer_offsets is a compacted topic, but committed offsets still expire: the broker removes the offsets of inactive groups after a retention period. By default that period is 7 days (offsets.retention.minutes=10080).
If a consumer group has no live members and doesn't commit for 7 days, its offsets disappear. When it comes back, it falls back to auto.offset.reset. This trips up batch jobs that run weekly and CI consumers that idle between runs.
To extend, raise offsets.retention.minutes on the broker. Tradeoff: __consumer_offsets grows with the number of groups times the retention.
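A quick way to reason about whether a periodic job is at risk is to compare its idle gap with the retention window. The sketch below is a simplified model: inactiveSince is a hypothetical input meaning the moment the group last had a member or committed, and the broker's actual bookkeeping may differ in detail.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative check of offset expiry for an idle group (simplified model). */
final class OffsetExpirySketch {

    /** Mirrors the default offsets.retention.minutes=10080 (7 days). */
    static final Duration DEFAULT_RETENTION = Duration.ofMinutes(10080);

    static boolean offsetsExpired(Instant inactiveSince, Instant now, Duration retention) {
        return Duration.between(inactiveSince, now).compareTo(retention) >= 0;
    }

    public static void main(String[] args) {
        Instant lastRunFinished = Instant.parse("2024-01-01T01:00:00Z");

        // Weekly job starting on schedule: idle for 6 days 23 hours.
        Instant onTime = Instant.parse("2024-01-08T00:00:00Z");
        System.out.println(offsetsExpired(lastRunFinished, onTime, DEFAULT_RETENTION)); // false

        // Same job delayed by two hours: idle for 7 days 1 hour.
        Instant delayed = Instant.parse("2024-01-08T02:00:00Z");
        System.out.println(offsetsExpired(lastRunFinished, delayed, DEFAULT_RETENTION)); // true
    }
}
```

A weekly job sits right at the edge of the default window, so a small scheduling delay is enough to push it onto the auto.offset.reset path.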
Common Mistakes
- Relying on auto-commit for at-least-once processing. Auto-commit advances offsets from inside poll(), on a timer, not when your code finishes processing. If records are still in flight when that happens, a crash means lost work. Disable it and commit manually.
- Confusing offset with timestamp. Offsets are not times. The 1000th message could be from yesterday or 5 minutes ago, depending on traffic.
- Assuming offsets are contiguous. They aren't, after compaction or when transactional control records are interleaved.
- Committing after a long-running processing step. If the gap between poll() calls exceeds max.poll.interval.ms, the consumer can be kicked from the group before it commits, causing a rebalance and re-processing.
- Resetting offsets without warning downstream systems. A --reset-offsets --to-earliest on a production group can flood downstream services with hours or days of replayed traffic.
Monitoring Offsets
Operational metrics:
- Consumer lag: the difference between the partition's log end offset and the group's committed offset. The single most important consumer-side health metric. See Kafka consumer lag.
- Commit rate: how often the group is committing. Sudden drops can indicate consumers are stuck mid-processing.
- Time since last commit: a group that hasn't committed in minutes when traffic is flowing is almost certainly stuck.
- Log start offset jumps: indicate retention is deleting data. If a consumer's committed offset is below the log start offset (rare but possible), the consumer's next fetch will fail and reset.
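The lag and log-start checks in this list reduce to a few comparisons. The following is a minimal sketch using plain numbers rather than a live cluster; ConsumerLagSketch and its method names are hypothetical, and in practice the inputs would come from kafka-consumer-groups.sh or a metrics system.

```java
/** Illustrative offset health checks on plain numbers (no Kafka dependency). */
final class ConsumerLagSketch {

    /** Lag for one partition: messages written but not yet committed as processed. */
    static long lag(long logEndOffset, long committedOffset) {
        // The two values are sampled at different moments, so clamp at zero.
        return Math.max(0, logEndOffset - committedOffset);
    }

    /** Sum of per-partition lag; both arrays are indexed by partition number. */
    static long totalLag(long[] logEndOffsets, long[] committedOffsets) {
        long total = 0;
        for (int p = 0; p < logEndOffsets.length; p++) {
            total += lag(logEndOffsets[p], committedOffsets[p]);
        }
        return total;
    }

    /** True when retention has already deleted the message the group would read next. */
    static boolean committedBelowLogStart(long committedOffset, long logStartOffset) {
        return committedOffset < logStartOffset;
    }

    public static void main(String[] args) {
        long[] logEnd = {1500, 800};
        long[] committed = {1000, 800};
        System.out.println("total lag = " + totalLag(logEnd, committed));         // 500
        System.out.println("data lost = " + committedBelowLogStart(1000, 1200)); // true
    }
}
```

Alerting on the second check is useful because it signals data the group can no longer read, which lag alone does not show.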
Frequently Asked Questions
Q: Where are Kafka consumer offsets stored?
A: In an internal compacted topic named __consumer_offsets. It's managed automatically by the Kafka brokers; you don't read or write it directly in normal operation, though tools like kafka-consumer-groups.sh query it.
Q: How do I see the current offset of a consumer group?
A: kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group <group>. Output shows the current offset, log end offset, and lag for each partition.
Q: How do I reset a Kafka consumer offset to the beginning?
A: kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group <group> --topic <topic> --reset-offsets --to-earliest --execute. The group must have no active consumers when you do this.
Q: What's the difference between auto.offset.reset=earliest and latest?
A: earliest makes a consumer with no committed offset start from the oldest message in the partition. latest starts from the newest, meaning only messages produced after the consumer subscribes. latest is the Kafka default.
Q: Do Kafka offsets reset when messages are deleted by retention?
A: No. Offsets are never reset or reused. If retention deletes offsets 0-1000, the partition's log start offset becomes 1001, and new writes continue from the log end offset as before. A consumer that tries to fetch offsets 0-1000 gets an out-of-range error, and either fails with OffsetOutOfRangeException or resets according to auto.offset.reset.
Q: Can two consumer groups have different offsets for the same topic?
A: Yes. Each consumer group tracks its own offsets independently. That's how multiple applications can consume the same topic at different speeds without interfering with each other.
Q: What happens if a consumer commits an offset that doesn't exist?
A: The broker accepts the commit (it's just an integer write), but on the next fetch, the consumer gets OffsetOutOfRangeException and either fails or resets according to auto.offset.reset.