Just like in any distributed system, the nodes must keep communicating with one another so that they agree on the state of the cluster, and this is referred to as the Gossip protocol. Besides the actual message or data payloads, the data being exchanged also includes other network communications:
- Nodes becoming available and requesting cluster membership
- Configuration settings and management
- Controller election events
- Updates on the health status of workers
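To make the list above a bit more concrete, here is a minimal sketch of how such events could update a node's picture of the cluster. Everything in it (`EventKind`, `ClusterView`, the node names) is hypothetical and only meant for illustration; it is not how Kafka implements this internally.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, Optional


class EventKind(Enum):
    """Hypothetical labels for the coordination traffic listed above."""
    MEMBERSHIP_REQUEST = auto()   # a node comes online and asks to join
    CONFIG_UPDATE = auto()        # configuration settings and management
    CONTROLLER_ELECTED = auto()   # a controller election has concluded
    HEALTH_UPDATE = auto()        # a worker reports its health status


@dataclass
class ClusterView:
    """Toy, in-memory picture of the cluster that each event updates."""
    members: Dict[str, str] = field(default_factory=dict)  # node id -> health
    config: Dict[str, str] = field(default_factory=dict)
    controller: Optional[str] = None

    def apply(self, kind: EventKind, node: str, payload: Optional[dict] = None) -> None:
        payload = payload or {}
        if kind is EventKind.MEMBERSHIP_REQUEST:
            # Health is unknown until the node reports it.
            self.members[node] = "unknown"
        elif kind is EventKind.CONFIG_UPDATE:
            self.config.update(payload)
        elif kind is EventKind.CONTROLLER_ELECTED:
            self.controller = node
        elif kind is EventKind.HEALTH_UPDATE:
            self.members[node] = payload.get("status", "unknown")


view = ClusterView()
view.apply(EventKind.MEMBERSHIP_REQUEST, "broker-1")
view.apply(EventKind.MEMBERSHIP_REQUEST, "broker-2")
view.apply(EventKind.CONTROLLER_ELECTED, "broker-1")
view.apply(EventKind.HEALTH_UPDATE, "broker-2", {"status": "healthy"})
print(view)
```

The point is only that payload traffic and coordination traffic are different things: the events above never carry user messages, they just keep every node's view of the cluster up to date.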
ZooKeeper maintains the cluster's metadata. Here is what it does, along with a few things to keep in mind:
- manages the brokers and keeps a list of them
- stores configuration information
- tracks health and sync status
- tracks cluster membership
- helps in leader election
- sends notifications to Kafka in case of changes
- should run with an odd number of servers (3, 5, 7)
- no longer stores consumer offsets as of Kafka v0.10; they are kept in Kafka itself
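The odd-number recommendation comes from majority voting: ZooKeeper stays available only while more than half of its servers can reach each other. A small sketch of that arithmetic (the function names are my own, not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5, 6, 7):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Running this shows that 4 servers tolerate just one failure, the same as 3, and 6 tolerate two, the same as 5. The extra even-numbered server adds cost without adding fault tolerance.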
ZooKeeper is itself a distributed system made up of multiple nodes, which together are called an ensemble. The ensemble has:
- a leader, which handles the writes
- followers, which are the rest of the servers and handle the reads
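The leader and follower split can be pictured with a toy model like the one below. This is a heavily simplified sketch under my own assumptions: the `Server` and `Ensemble` classes are hypothetical, the leader is simply the first server in the list, and replication is done in one synchronous loop, whereas the real thing involves leader election and quorum acknowledgements.

```python
from typing import Dict, List, Optional


class Server:
    """One hypothetical ensemble member holding its own copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}


class Ensemble:
    """Toy ensemble: writes go through the leader, reads go to followers."""

    def __init__(self, names: List[str]) -> None:
        if len(names) < 2:
            raise ValueError("this sketch needs a leader and at least one follower")
        servers = [Server(name) for name in names]
        self.leader = servers[0]        # assumption: no real election here
        self.followers = servers[1:]
        self._next_reader = 0           # round-robin pointer for reads

    def write(self, key: str, value: str) -> None:
        # The leader accepts the write, then copies it to every follower.
        self.leader.data[key] = value
        for follower in self.followers:
            follower.data[key] = value

    def read(self, key: str) -> Optional[str]:
        # Reads are spread across the followers in turn.
        follower = self.followers[self._next_reader % len(self.followers)]
        self._next_reader += 1
        return follower.data.get(key)


ensemble = Ensemble(["zk-1", "zk-2", "zk-3"])
ensemble.write("/brokers/ids/1", "broker-1:9092")
print(ensemble.read("/brokers/ids/1"))
```

The key and value used here are illustrative only. What matters is the shape: one server orders the writes, and the rest of the servers share the read load.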
Clients (producers and consumers) read from and write to the Kafka brokers.
Kafka brokers read from and write to the ZooKeeper nodes.
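This split shows up in configuration as well. The sketch below builds the relevant settings as plain Python dictionaries: `broker.id`, `zookeeper.connect`, and `bootstrap.servers` are standard Kafka property names, while the host names and helper functions are made up for this example.

```python
# Hypothetical host names; the property names are standard Kafka settings.
ZOOKEEPER_NODES = ["zk-1:2181", "zk-2:2181", "zk-3:2181"]
BROKERS = ["broker-1:9092", "broker-2:9092", "broker-3:9092"]


def broker_properties(broker_id: int) -> dict:
    """Settings a ZooKeeper-based broker needs to find the ensemble."""
    return {
        "broker.id": str(broker_id),
        "zookeeper.connect": ",".join(ZOOKEEPER_NODES),
    }


def client_properties() -> dict:
    """Settings a producer or consumer needs: brokers only, no ZooKeeper."""
    return {"bootstrap.servers": ",".join(BROKERS)}


def render(props: dict) -> str:
    """Format a dict as the lines of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


print(render(broker_properties(1)))
print(render(client_properties()))
```

Notice that the client settings never mention ZooKeeper. With recent client versions, producers and consumers only need to know about the brokers.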
Originally, Kafka could not work without ZooKeeper. However, Apache Kafka 2.8.0 was released in April 2021 with a lot of features and improvements, chief of which is early access to running Kafka without Apache ZooKeeper.
In version 2.8.0, the Kafka brokers can instead lean on an internal implementation of the Raft consensus algorithm. This is still in the works, so it is not yet recommended for production use. For more information on this, you can check out Kafka needs No Keeper.
The subject of Zookeeper certainly deserves its own series if we are to dig deeper into it. I might create a separate post or series that's entirely dedicated to it but for now, this explanation is sufficient. We will see Zookeeper again in the succeeding topics.
Similarly, you can check out these awesome links about ZooKeeper:
- What is ZooKeeper?
- Github repo: apache/zookeeper
- Apache Kafka 2.7 is One Step Closer to Killing ZooKeeper
- Zookeeper in action by Alexandre Berthaud
In the complete Apache Kafka Distributed Architecture, we have a Kafka cluster which is comprised of multiple independent brokers.
Associated with the cluster is the ZooKeeper environment, which provides the metadata that the cluster needs to operate reliably. The metadata is constantly changing, so the cluster members and ZooKeeper communicate continuously.
Now, these components will scale out to process the demands that are coming from both the producers and consumers to ensure reliability and availability.
The succeeding notes will discuss the messaging internals of Apache Kafka, specifically topics and messages. If you'd like to know more, please proceed to the next note in the series.
Similarly, you can check out the following resources:
- Getting Started with Apache Kafka by Ryan Plant
- Apache Kafka Series - Learn Apache Kafka for Beginners v2 by Stephane Maarek
- Apache Kafka A-Z with Hands on Learning by Learnkart Technology Private Limited
- The Complete Apache Kafka Practical Guide by Bogdan Stashchuk
If you've enjoyed this short and concise article, I'll be glad to connect with you on Twitter! You can also hit the Follow button below to stay updated when there's new awesome content! 😃