Kafka Partitions and Consumer Groups

#kafka #apachekafka #kafkapartitions #kafkaconsumergroups

In my previous article, we discussed how Kafka works and went through some basic Kafka terminology. In this article, we will go over how Partitions and Consumer Groups work in Kafka.

If you haven’t gone through my previous article or if you’re new to Kafka, I recommend reading it first, as it will give you a basic understanding of how Kafka works.

You can find the complete article with some common Q&As here

So, what is a Partition?

Before talking about partitions, we need to understand what a topic is. In Kafka, a topic is a named storage unit where the messages sent by producers are stored. Generally, related data is kept in its own topic. For example, you can have a topic named “user” where you only store the details of your users, or a topic named “payments” where you only store payment-related details. A topic can be further subdivided into multiple storage units, and these subdivisions of a topic are known as partitions.

Unless you specify otherwise, a topic is created with the broker's default number of partitions (the num.partitions setting, which defaults to 1), and every message published to the topic is stored in that single partition. If you configure a topic to have multiple partitions, each message sent by a producer is stored in exactly one of them, so no two partitions hold the same message/event.

Each partition in a topic also has its own sequence of offsets (if you don’t know what an offset is, I recommend checking out this article, where I discuss it).

As an example, a producer producing messages to a Kafka topic with 3 partitions would look like this:
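To make the "each message lands in exactly one partition" idea concrete, here is a minimal Python sketch that simulates the choice without a Kafka broker. The names `choose_partition` and `NUM_PARTITIONS` are made up for illustration, and `zlib.crc32` is only a stand-in hash: the real producer uses its own hashing and its own strategy for messages without a key.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 3  # hypothetical topic with 3 partitions

_round_robin = count()  # counter used only for messages without a key


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Pick exactly one partition for a message.

    Keyed messages: hash of the key modulo the partition count, so the
    same key always maps to the same partition.
    Keyless messages: spread across the partitions in turn.
    """
    if key is None:
        return next(_round_robin) % num_partitions
    # crc32 is a stand-in hash for this sketch, not the hash Kafka uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Simulated partitions: partition number -> list of stored messages.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [("user-1", "signup"), ("user-2", "signup"),
                   ("user-1", "login"), (None, "heartbeat")]:
    partitions[choose_partition(key)].append(value)

print(partitions)
```

Both "user-1" messages end up in the same partition, and every message is stored once, in one partition only.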
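The point about every partition having its own offsets can be sketched in a few lines of Python. This is a toy in-memory model, not Kafka's storage format; the `Partition` class and the `topic` list are hypothetical names used only for this illustration.

```python
class Partition:
    """An append-only log; a message's offset is its position in this log."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset assigned to the new message


topic = [Partition() for _ in range(3)]  # hypothetical topic, 3 partitions

first = topic[0].append("a")   # first message in partition 0
second = topic[0].append("b")  # second message in partition 0
other = topic[1].append("c")   # first message in partition 1

print(first, second, other)  # prints: 0 1 0
```

Offsets start from 0 in each partition independently, so an offset only identifies a message together with its partition.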

Now, what is a Consumer Group?

A set of consumers can form a group in order to cooperate and consume messages from a set of topics. This grouping of consumers is called a Consumer Group. If two consumers in the same consumer group have subscribed to the same topic, they are assigned different sets of partitions, so neither consumer receives the messages assigned to the other.

Note: Consumer Groups can help attain a higher consumption rate, because the consumers in a group read from different partitions of the same topic in parallel.

Now, let’s go through a few scenarios to better understand the above concepts:

Scenario 1: Let’s say we have a topic, TopicT1, with 4 partitions and 1 consumer group consisting of only 1 consumer. The consumer has subscribed to TopicT1 and is assigned to consume from all of its partitions. This scenario can be depicted by the picture below:

Scenario 2: Now let’s consider that we have 2 consumers in our consumer group. These 2 consumers would be assigned to read from different partitions, for example Consumer1 from partitions 0 and 2, and Consumer2 from partitions 1 and 3. (The exact split depends on the partition assignment strategy the consumers are configured with.)

Note: Kafka assigns the partitions of a topic to the consumers in a consumer group so that, at any given time, each partition is consumed by exactly one consumer in the group. As a result, a message is normally read by a single consumer in the group, although it can be delivered again, for example after a rebalance or when a consumer fails before committing its offset.

Since the messages stored in individual partitions of the same topic are different, the two consumers read disjoint sets of messages, which avoids the same message being processed by both consumers. This scenario can be depicted by the picture below:

But what if the number of consumers in a consumer group is more than the number of partitions? Check out Scenario 3.
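Scenarios 1 and 2 can be reproduced with a small round-robin assignment sketch in Python. This only simulates the idea; `assign_round_robin` is a hypothetical helper, and in a real cluster the assignment is computed by whichever assignor the consumers are configured with, so the actual split may differ.

```python
def assign_round_robin(partitions, consumers):
    """Deal partitions out to consumers one at a time, like cards."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]  # the 4 partitions of TopicT1

one = assign_round_robin(partitions, ["Consumer1"])
two = assign_round_robin(partitions, ["Consumer1", "Consumer2"])

print(one)  # {'Consumer1': [0, 1, 2, 3]}
print(two)  # {'Consumer1': [0, 2], 'Consumer2': [1, 3]}
```

In both cases every partition has exactly one owner within the group, which is the property the note above describes.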

Scenario 3: Let’s say we have 5 consumers in the consumer group, which is more than the number of partitions of TopicT1. In that case, four of the consumers would each be assigned a single partition, and the remaining consumer (Consumer5) would be left idle. This scenario can be depicted by the picture below:

Okay, and what if you want multiple consumers to read from the same partition? Check out Scenario 4.
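Here is a rough Python sketch of Scenario 3 using a range-style split, where each consumer gets a contiguous block of partitions. The helper `assign_ranges` is an assumption made for this example rather than Kafka's actual code, and which consumer ends up idle in a real group depends on how the members are ordered.

```python
def assign_ranges(num_partitions, consumers):
    """Give each consumer a contiguous block of partitions.

    The first (num_partitions % len(consumers)) consumers get one extra
    partition; with more consumers than partitions, the rest get nothing.
    """
    base, extra = divmod(num_partitions, len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        size = base + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + size))
        start += size
    return assignment


consumers = [f"Consumer{n}" for n in range(1, 6)]  # 5 consumers
assignment = assign_ranges(4, consumers)           # TopicT1 has 4 partitions
idle = [c for c, owned in assignment.items() if not owned]

print(assignment)
print(idle)  # ['Consumer5']
```

Adding consumers beyond the partition count does not increase parallelism; the extra consumers only act as standbys.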

Scenario 4: If you want multiple consumers to read from the same partition, you can add these consumers to different consumer groups and have both consumer groups subscribe to TopicT1. Here, the messages from Partition0 of TopicT1 are read by both Consumer1 of ConsumerGroup1 and Consumer1 of ConsumerGroup2. This scenario can be depicted by the picture below:
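Scenario 4 works because each consumer group keeps its own position in a partition. The Python sketch below simulates that with a plain list and a dictionary of per-group offsets; `partition0`, `committed`, and `poll` are illustrative names for this toy model, not the Kafka client API.

```python
partition0 = ["m0", "m1", "m2"]  # messages in Partition0 of TopicT1

# Each consumer group tracks its own position in the partition.
committed = {"ConsumerGroup1": 0, "ConsumerGroup2": 0}


def poll(group, max_messages=2):
    """Return the next messages for this group and advance only its offset."""
    start = committed[group]
    batch = partition0[start:start + max_messages]
    committed[group] = start + len(batch)
    return batch


g1_first = poll("ConsumerGroup1")   # ['m0', 'm1']
g1_second = poll("ConsumerGroup1")  # ['m2']
g2_first = poll("ConsumerGroup2")   # ['m0', 'm1'], unaffected by group 1

print(g1_first, g1_second, g2_first)
```

ConsumerGroup2 starts from the beginning even though ConsumerGroup1 has already read everything, so both groups see every message in the partition.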

You can check out my previous article, Apache Kafka: Basic Terminology.