What is a Kafka Topic? Apache Kafka Topics Explained with Examples

A Kafka topic is a named, ordered, append-only log of messages. Producers write to it; consumers read from it. Topics are the unit of organization in Apache Kafka: every message lives in some topic, and every consumer subscribes to one or more topics. The closest analogy is a database table or a folder on disk; topics are how you separate one stream of data from another.

Each topic is split into one or more partitions, which are the actual log files Kafka stores on disk. Partitions are where the parallelism comes from. Producers write to partitions in parallel, and a Kafka consumer group divides partitions across its members so messages can be processed concurrently across many machines.

How a Kafka Topic Works

A topic is a logical name. The data lives in partitions, and partitions are replicated across Kafka brokers for fault tolerance. When a Kafka producer sends a message:

  1. It picks a partition for the message, by key hash, round-robin, or a custom partitioner.
  2. It sends the message to the broker that holds the leader replica of that partition.
  3. The leader appends the message to its log and replicates it to follower brokers.
  4. Once enough replicas have acknowledged (controlled by acks and min.insync.replicas), the producer's send is considered successful.
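Step 1 above can be sketched in a few lines of Python. This is an illustration only: make_partitioner and choose_partition are hypothetical names, and zlib.crc32 stands in for whatever hash a real client uses, so the partition numbers will not match an actual Kafka producer.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that maps a message key to a partition index."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose_partition(key):
        if key is None:
            # No key: spread messages evenly across partitions.
            return next(round_robin)
        # Keyed: the same key always lands on the same partition.
        # crc32 is a stand-in here; it is NOT the hash Kafka clients use.
        return zlib.crc32(key) % num_partitions

    return choose_partition


choose = make_partitioner(num_partitions=3)
keyed = [choose(b"customer-42") for _ in range(3)]
unkeyed = [choose(None) for _ in range(4)]

print(keyed)    # three identical partition numbers
print(unkeyed)  # [0, 1, 2, 0]
```

The point of the keyed branch is stability: as long as the partition count stays the same, every message for customer-42 goes to one partition and is therefore read back in order.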

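Each message gets a monotonically increasing offset within its partition. Consumers track their progress through a topic by committing an offset per partition. By convention the committed value is the offset of the next message to read, that is, the offset of the last processed message plus one.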

Topic: orders
├── Partition 0: [msg-0][msg-3][msg-7]...
├── Partition 1: [msg-1][msg-4][msg-8]...
└── Partition 2: [msg-2][msg-5][msg-6]...

Messages within a single partition are strictly ordered. Across partitions, Kafka makes no ordering guarantees. That's the trade-off you accept for horizontal scale.
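To make offsets concrete, here is a toy in-memory model of a single partition. PartitionLog is a hypothetical class written for this article, with no replication, persistence, or networking; it only shows how an append yields an offset and how a reader resumes from a stored position.

```python
class PartitionLog:
    """Toy append-only log for one partition (in memory, no replication)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        # The offset is simply the record's position in the log.
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset, max_records=10):
        # Return (offset, value) pairs starting at the requested offset.
        return list(enumerate(self._records))[offset:offset + max_records]


log = PartitionLog()
offsets = [log.append(v) for v in ("created", "paid", "shipped")]

# A consumer remembers where to resume and picks up from there.
position = 0
batch = log.read_from(position, max_records=2)
position = batch[-1][0] + 1  # advance past the last processed record

print(offsets)                  # [0, 1, 2]
print(batch)                    # [(0, 'created'), (1, 'paid')]
print(log.read_from(position))  # [(2, 'shipped')]
```

A topic with three partitions would be three independent PartitionLog instances, each with its own offset sequence, which is why ordering holds within a partition but not across them.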

Creating a Kafka Topic

Topics are usually created with the kafka-topics.sh CLI shipped with Kafka:

kafka-topics.sh --bootstrap-server localhost:9092 \
  --create \
  --topic orders \
  --partitions 12 \
  --replication-factor 3 \
  --config retention.ms=604800000 \
  --config cleanup.policy=delete

This creates a topic named orders with 12 partitions, each with three replicas (one leader and two followers), retaining messages for 7 days (604800000 ms). For programmatic creation, use the Admin API (AdminClient in the Java client; most other client libraries offer an equivalent).
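If you script topic creation, it helps to derive retention.ms from a readable duration rather than hard-coding the millisecond value. The sketch below only builds the argument list for the command shown above; retention_ms and create_topic_command are hypothetical helpers, and nothing here talks to a broker.

```python
from datetime import timedelta


def retention_ms(days):
    """Convert a retention period in days to milliseconds."""
    return int(timedelta(days=days).total_seconds() * 1000)


def create_topic_command(name, partitions, replication_factor,
                         retention_days, bootstrap="localhost:9092"):
    """Build the kafka-topics.sh argument list for a delete-policy topic."""
    return [
        "kafka-topics.sh", "--bootstrap-server", bootstrap,
        "--create", "--topic", name,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
        "--config", f"retention.ms={retention_ms(retention_days)}",
        "--config", "cleanup.policy=delete",
    ]


cmd = create_topic_command("orders", 12, 3, retention_days=7)
print(retention_ms(7))  # 604800000
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run on a machine that has the Kafka CLI on its PATH.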

Most production clusters disable auto.create.topics.enable on brokers so topics must be created explicitly. Auto-creation is convenient in development but a footgun in production: typos and stale code create unintended topics with default settings that may not match your retention or replication requirements.

Topic Configuration: The Settings That Actually Matter

There are dozens of settings, but a handful drive most operational outcomes:

Setting | Purpose | Common values
partitions (set at creation) | Parallelism ceiling for producers and consumers | 6-60 for most workloads, depending on throughput
replication.factor (set at creation) | Number of broker replicas per partition | 3 in production, 1 in dev
min.insync.replicas | Minimum in-sync replicas that must ack a write | 2 with acks=all for durability
retention.ms / retention.bytes | How long / how much to keep | retention.ms defaults to 7 days; retention.bytes is unlimited (-1) by default
cleanup.policy | delete (time/size based) or compact (key-latest) | delete for events, compact for state
segment.bytes | Size of each log segment file | 1 GB default; smaller for faster compaction
compression.type | Per-topic compression | producer (let producer decide), lz4, zstd

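min.insync.replicas combined with acks=all on the producer is what gives you durable writes. Setting acks=all while leaving min.insync.replicas at its default of 1 is a common mistake. The promise sounds strong, but acks=all waits only for the replicas that are currently in sync. If the in-sync set has shrunk to the leader alone, the write is acknowledged with a single copy. With min.insync.replicas=2, the broker rejects that write instead.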
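The interaction is easier to see as a decision rule. The function below is a simplified model of the leader's behavior for one acks=all write, not broker code; produce_with_acks_all is a hypothetical name.

```python
def produce_with_acks_all(in_sync_replicas, min_insync_replicas):
    """Model the outcome of one acks=all write.

    Returns (accepted, copies): whether the write is acknowledged and how
    many replicas hold the record at the moment of acknowledgement.
    """
    if in_sync_replicas < min_insync_replicas:
        # The leader refuses the write rather than under-replicate it.
        return (False, 0)
    # acks=all waits for every replica currently in sync, however few.
    return (True, in_sync_replicas)


# Replication factor 3, but two followers have fallen out of sync.
print(produce_with_acks_all(in_sync_replicas=1, min_insync_replicas=1))  # (True, 1)
print(produce_with_acks_all(in_sync_replicas=1, min_insync_replicas=2))  # (False, 0)

# Healthy cluster: all three replicas in sync.
print(produce_with_acks_all(in_sync_replicas=3, min_insync_replicas=2))  # (True, 3)
```

The first call is the dangerous case: the producer sees success while only one copy of the message exists.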

Regular Topics vs. Compacted Topics

Kafka supports two cleanup policies:

  • delete: messages are dropped when they hit a retention threshold (time or size). This is what you want for event streams: orders, clicks, sensor readings.
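  • compact: Kafka keeps at least the latest message per key, indefinitely. Older values for the same key are removed in the background, and a message with a null value (a tombstone) deletes the key. The topic becomes a replayable record of current state. It's what powers change data capture (CDC) patterns and Kafka Streams' KTables.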

You can combine them with cleanup.policy=compact,delete for both compaction and a maximum age. That's rare but useful for slowly-changing state where you don't want unbounded growth.
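Here is a minimal sketch of what compaction leaves behind, assuming records arrive as (key, value) pairs in offset order and that a None value marks a deletion. The compact function is hypothetical and simplified: real compaction runs in the background on closed segments and keeps deletion markers around for a while before dropping them.

```python
def compact(records):
    """Keep the latest value per key; a None value removes the key.

    `records` is a list of (key, value) pairs in offset order.
    Returns surviving (offset, key, value) tuples, still in offset order.
    """
    latest = {}
    for offset, (key, value) in enumerate(records):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = [
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    ]
    return sorted(survivors)


records = [
    ("u1", "a@example.com"),
    ("u2", "b@example.com"),
    ("u1", "c@example.com"),  # supersedes offset 0
    ("u2", None),             # deletes u2
    ("u3", "d@example.com"),
]
print(compact(records))
# [(2, 'u1', 'c@example.com'), (4, 'u3', 'd@example.com')]
```

Surviving records keep their original offsets, so the compacted log has gaps in its offset sequence but stays in order.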

How Many Partitions Should a Topic Have?

This is the question Kafka practitioners argue about most. The short answer: it depends on your max consumer parallelism and target throughput.

A few rules of thumb:

  • A consumer group's parallelism is capped at the number of partitions. If your topic has 6 partitions, only 6 consumers in a group can process simultaneously; extras sit idle.
  • Each partition has overhead on the broker: file handles, replication threads, memory. In ZooKeeper-era clusters, the commonly cited guidance was to stay under roughly 4,000 partitions per broker and 200,000 per cluster. KRaft removed most of those scaling cliffs and modern clusters comfortably run far higher partition counts, but partitions still aren't free.
  • More partitions improve write throughput up to a point, then start hurting because of replication overhead.
  • You can increase partitions later, but you cannot decrease them. A key's partition is its hash modulo the partition count, so after an increase the same key can map to a different partition and per-key ordering is lost across the boundary. Plan for that.

Start with enough partitions for 1-2x your projected peak consumer count. Twelve is a reasonable default for most use cases.
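The sizing rule and the idle-consumer effect can both be sketched quickly. suggested_partitions applies the 1-2x heuristic with a hypothetical headroom factor of 1.5, and assign is a simplified round-robin stand-in for a real group assignor.

```python
import math


def suggested_partitions(peak_consumers, headroom=1.5):
    """Apply the 1-2x rule of thumb; headroom=1.5 is an arbitrary midpoint."""
    return math.ceil(peak_consumers * headroom)


def assign(partitions, consumers):
    """Spread partitions over consumers round-robin.

    Returns {consumer: [partition, ...]}. A simplification of what real
    consumer group assignors do.
    """
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


print(suggested_partitions(peak_consumers=8))  # 12

# Eight consumers on a six-partition topic: two have nothing to do.
group = assign(partitions=6, consumers=[f"c{i}" for i in range(8)])
idle = [c for c, parts in group.items() if not parts]
print(idle)  # ['c6', 'c7']
```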

Common Mistakes with Kafka Topics

  1. Mixing schemas in one topic. Producers writing different event types to the same topic forces consumers to handle every variation. Use separate topics or enforce a schema with Schema Registry and Serdes.
  2. Using compaction as a database. Compacted topics are great for state, but they're not transactional and they're slow to read from offset 0. Don't use them for high-volume lookups.
  3. Setting replication.factor=1. With a single replica, a broker outage makes its partitions unavailable, and a lost disk loses the data permanently. Always run with at least 3 in production.
  4. Forgetting min.insync.replicas. Without it, you're not actually durable, even with acks=all.
  5. Over-partitioning. Each partition adds memory, file handles, and replication overhead. 1,000-partition topics are usually a smell.
  6. Naming topics inconsistently. orders-prod-v2, Orders.v3, order_events_v1: pick a convention (lowercase, hyphenated, with environment or version suffix) and enforce it.
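A naming convention is only useful if something enforces it. The check below is a sketch of one possible convention (lowercase words joined by hyphens, ending in an environment and a version); the environment names and the is_valid_topic_name helper are assumptions made for illustration, so adapt the pattern to your own rules. The 249-character cap is Kafka's own limit on topic names.

```python
import re

# Hypothetical house convention: lowercase words joined by hyphens,
# ending in an environment and a version, e.g. "orders-prod-v2".
CONVENTION = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*-(dev|staging|prod)-v[0-9]+")

# Kafka rejects topic names longer than 249 characters.
MAX_LENGTH = 249


def is_valid_topic_name(name):
    """True if the name fits the house convention and Kafka's length limit."""
    return len(name) <= MAX_LENGTH and CONVENTION.fullmatch(name) is not None


for name in ("orders-prod-v2", "Orders.v3", "order_events_v1"):
    print(name, is_valid_topic_name(name))
# orders-prod-v2 True
# Orders.v3 False
# order_events_v1 False
```

A check like this fits naturally in CI or in whatever tool provisions topics, so a nonconforming name never reaches the cluster.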

Monitoring Kafka Topics

The metrics that matter most for topic health:

  • Under-replicated partitions. Any non-zero value means a broker is behind or down. It should be 0 in steady state.
  • Consumer lag. How far behind the latest offset each consumer group is. Sustained growth means consumers can't keep up.
  • Bytes in / bytes out per topic. Capacity planning and anomaly detection.
  • Log size on disk. Validates retention policy is working.
  • Leader election rate. Frequent leader changes signal instability.
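Consumer lag is simple arithmetic once you have the two offsets per partition. The sketch below assumes you have already fetched the log end offsets and the group's committed offsets as plain dictionaries; consumer_lag and lag_is_growing are hypothetical helpers, and treating a partition with no committed offset as 0 is a simplification.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus the group's committed offset.

    Partitions with no committed offset are treated as starting from 0,
    which is a simplification made for this sketch.
    """
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


def lag_is_growing(samples):
    """True if total lag rose at every step across successive samples."""
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


lag = consumer_lag(
    log_end_offsets={0: 1500, 1: 1480, 2: 1510},
    committed_offsets={0: 1500, 1: 1200, 2: 1505},
)
print(lag)                # {0: 0, 1: 280, 2: 5}
print(sum(lag.values()))  # 285

print(lag_is_growing([100, 180, 285]))  # True
print(lag_is_growing([100, 180, 90]))   # False
```

A single high reading is rarely worth an alert. The trend across samples is what tells you consumers are falling behind.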

Pulse provides AI-powered monitoring for Kafka, Elasticsearch, OpenSearch, and ClickHouse. It surfaces under-replicated partitions, consumer lag spikes, leader skew, and other operational issues across your topics, with root-cause analysis and actionable alerts. Start a free trial to see what it looks like on your own clusters.

Frequently Asked Questions

Q: What's the difference between a Kafka topic and a partition?
A: A topic is the logical channel that producers write to and consumers subscribe to. A partition is a physical, ordered log file that stores a subset of the topic's messages. One topic always has one or more partitions. Partitions are how Kafka scales horizontally and replicates for durability.

Q: How many topics or partitions can a Kafka cluster handle?
A: With KRaft mode (the modern controller, replacing ZooKeeper), partition limits are dramatically higher than in the ZooKeeper era. Single brokers have demonstrated hundreds of thousands of partitions in benchmarks, and clusters in the millions. The practical limit on your cluster comes down to hardware, replication factor, and how aggressively partitions are reassigned, not a fixed number.

Q: Can I delete a Kafka topic?
A: Yes, with kafka-topics.sh --bootstrap-server <broker> --delete --topic <name>, provided delete.topic.enable=true (the default since Kafka 1.0). Deletion is asynchronous: the topic is marked for deletion, partitions are removed from brokers, and metadata is cleaned up. The operation is irreversible, so always double-check you're targeting the right cluster.

Q: Can I rename a Kafka topic?
A: No. The workaround is to create a new topic with the desired name, mirror the data with MirrorMaker 2 or Kafka Connect, switch producers and consumers over, then delete the old topic.

Q: What's the maximum size of a Kafka topic?
A: No built-in limit. Topics can grow to many terabytes per partition. Practical limits are disk capacity per broker, replication bandwidth, and the time it takes to recover from a broker loss. Use retention.bytes and retention.ms to keep size bounded.
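For a rough sense of scale, steady-state disk use under time-based retention is ingest rate times retention period times replication factor. The numbers below are made up for illustration, and the estimate ignores index files, the active segment, and the fact that retention is applied per segment; steady_state_bytes is a hypothetical helper.

```python
def steady_state_bytes(ingest_bytes_per_sec, retention_ms, replication_factor):
    """Approximate cluster-wide disk use for one topic at steady state."""
    retained_seconds = retention_ms / 1000
    return int(ingest_bytes_per_sec * retained_seconds * replication_factor)


# Illustrative inputs: 5 MB/s written, 7-day retention, replication factor 3.
total = steady_state_bytes(
    ingest_bytes_per_sec=5_000_000,
    retention_ms=604_800_000,
    replication_factor=3,
)
print(total)  # 9072000000000 (about 9 TB across the cluster)

# With 12 evenly loaded partitions, each replica of a partition holds:
per_partition_replica = total // (3 * 12)
print(per_partition_replica)  # 252000000000 (about 252 GB)
```

retention.bytes applies per partition, so the second figure is the one to compare against it.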

Q: Can I increase the number of partitions for an existing topic?
A: Yes, with kafka-topics.sh --bootstrap-server <broker> --alter --topic <name> --partitions <new-count>. The catch: a key's partition is its hash modulo the partition count, so the mapping changes when you add partitions. Messages with the same key written before and after the change may end up on different partitions, which breaks per-key ordering for downstream consumers. If ordering matters, plan partition counts up front, or use a custom partitioner that's stable across partition counts.
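A quick experiment shows how many keys are affected. The sketch assumes a hash-modulo partitioner and uses zlib.crc32 as a stand-in hash, so the exact count will differ from a real client; the key names and the 12-to-18 increase are arbitrary.

```python
import zlib


def partition_for(key, num_partitions):
    # crc32 stands in for the client's real hash function (an assumption).
    return zlib.crc32(key) % num_partitions


keys = [f"customer-{i}".encode() for i in range(1000)]

# Compare each key's partition before and after growing from 12 to 18.
moved = [k for k in keys if partition_for(k, 12) != partition_for(k, 18)]

print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

With a well-distributed hash you would expect roughly two thirds of keys to move in this case, because only hash values whose remainder modulo 36 is below 12 land on the same partition under both counts.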

Q: What's the difference between a regular topic and a compacted topic?
A: A regular topic (cleanup.policy=delete) drops old messages by time or size. A compacted topic (cleanup.policy=compact) keeps only the most recent message per key forever, which is what you want for materializing state from a stream. You can combine both policies if you need bounded state.

Q: How do I list all topics in a Kafka cluster?
A: kafka-topics.sh --bootstrap-server <broker> --list. For details on a single topic including partition layout, replicas, and ISR: kafka-topics.sh --bootstrap-server <broker> --describe --topic <name>.
