What Are Apache KafkaⓇ Consumer Group IDs?

#apachekafka #programming #todayilearned #database

This post was originally published on the Confluent blog.

Consumer Group IDs are a vital part of consumer configuration in Apache Kafka®. Setting the consumer Group ID determines what group a consumer belongs to, which has some major consequences. There are three areas in which Group IDs are particularly pertinent:

Detecting new data
Work sharing
Fault tolerance

Let’s dive in.

What is a Kafka consumer?

Kafka consumers read/consume data from Kafka producers, do the work of reading event streams. They read events, or messages, from logs called topics. Topics are further split into partitions, which are append-only logs that store the messages. This enables each topic to be hosted and replicated across a number of brokers.

As you can see in the diagram, a given consumer in a consumer group can read from multiple partitions, including multiple partitions housed in the same topic.

Using consumer Group IDs to detect new data

Group IDs are associated through the broker with bits of information called offsets, which specify the location of a given event within a partition, and as such, represent progress through the topic. Offsets in consumer groups serve the same purpose as how bookmarks or sticky tabs function in books. You can learn more about offsets in our FAQ.

Checking for new data

You can use a particular Group ID’s offset to check whether there’s been new data written to the partition. If there’s an event with a larger offset, that means there’s new data to read. If you want to know how to read the offset, here’s a command using the kafka-consumer-groups utility that will read your offsets:

kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group group1 --offsets

Note that you need to provide a valid Group ID to --group if you’re trying out this command. The output will resemble the following:

`GROUP   TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG     OWNER
Groupname topicname     0        2               2         1       ownername

Or, if you want to learn more about how to do this with the Confluent CLI for a topic hosted in Confluent Cloud, you can check out this tutorial on reading from a specific offset and partition.

There’s more on the kafka-consumer-groups utility in our documentation, and you can always run kafka-consumer-groups—help for a full list of the options.

Consumer Group IDs in work sharing

The Group ID determines which consumers belong to which group. You can assign Group IDs via configuration when you create the consumer client. If there are four consumers with the same Group ID assigned to the same topic, they will all share the work of reading from the same topic.

If there are eight partitions, each of those four consumers will be assigned two partitions. What if there are nine partitions? That means the leftover partition will be assigned to the first consumer in the group so that one consumer reads from three partitions and the rest of the consumers read from two partitions. It’s the broker’s job to continually ensure that partitions are evenly distributed among the connected consumers.

Note: At the top, you'll see that although there are four consumers, three are idle. That's because only one consumer in the same group can read from a single partition.

This whole process is predicated on the presence of a Group ID to unify the consumers. It’s important to remember this while you’re setting up your consumers.

If you’re connecting microservices, you want to make sure that each service has its own consumer group (and hence its own Group ID). Why is that? Let’s walk through an example.

Let’s say there’s a topic “payments,” and both the “orders” microservice and the “refunds” microservice will need to read from that topic. You wouldn’t want them to share the same offsets, because if they did, the progress through the “payments” topic would be shared by “orders” and “refunds,” which would mean potential missed orders or refunds.

However, if you had a group of consumers handling “orders” by reading from partitions in the “payments” topic, then the current offset for each consumer in the group, stored in the broker, is vital to ensure continuous progress in case a consumer in the group crashes. At the same time, if consumers from another, separate group, like “refunds” are reading from the “payments” topic, they can continue their progress unaffected even if the consumers in the “orders” group are rebalancing.

The role of consumer Group IDs in fault tolerance

As the last example revealed, Group IDs also play a vital role in fault tolerance.

What happens when a consumer crashes?

Each consumer group’s broker sends “heartbeat requests” to the consumers at a set interval. If a consumer does not respond in time, a rebalance is triggered.

How does a Group ID play into rebalancing?

Well, in either case, the broker’s record of the associated offset determines where the consumer will begin reading after a rejoin. As long as the Group ID remains the same, it can pick up exactly where it left off, without any risk of data loss.

If you’re interested in learning more about rebalancing, we recommend the blog post Incremental Cooperative Rebalancing in Apache Kafka: Why Stop the World When You Can Change It?. You can also consult our FAQ.

Where to go from here

In summary, when you set a consumer Group ID in the process of creating a consumer client, that Group ID assigns the consumer to its group, which has ramifications for work sharing, detecting new data, and data recovery.

To learn more about this and other topics, check out these recommended resources:

Confluent Developer: Learn Apache Kafka through Confluent Developer tutorials, documentation, courses, blog posts, and examples.
Confluent Community: If you have a question about Apache Kafka or you’d like to meet other Kafka developers, head over to Confluent Community and introduce yourself on our Community Slack or Forum.
Streaming Audio Podcast: Listen to the Streaming Audio Podcast to hear lively conversations with Confluent users about the ins and outs of Apache Kafka. The episode Optimizing Kafka’s Internals covers consumer group internals.