Want to understand why Kafka is gaining so much traction and why hiring managers are looking for people with this skill?
I wrote this article to shed some light on Kafka, but first I explain message queues in general to build up the foundation needed to explain Kafka.
The purpose of a message queue is to help reliably deliver communications (or messages). If you google "what is a message queue" you will get an answer like:
Message queues provide an asynchronous communications protocol, meaning that the sender and receiver of the message do not need to interact with the message queue at the same time.
But I think you can explain this even more simply. You can think of a message queue like a PO box. The mailman might deliver messages there at any time, and can keep adding more until you go collect them. They can build up or be retrieved immediately.
In this example the mailman is the "Publisher" or "Producer" and when you go get mail you are the "Subscriber" or "Consumer".
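If it helps to see the PO box idea in code, here is a minimal sketch using Python's standard `queue` module. The names `po_box`, `deliver`, and `collect` are made up for this illustration; this is a toy, not how any particular message queue product is implemented.

```python
import queue

# The "PO box": a first-in, first-out queue from the standard library.
po_box = queue.Queue()

def deliver(box, letters):
    """Producer (the mailman): drops messages off without waiting for a reader."""
    for letter in letters:
        box.put(letter)

def collect(box):
    """Consumer (you): picks up whatever has accumulated, oldest first."""
    collected = []
    while not box.empty():
        collected.append(box.get())
    return collected

deliver(po_box, ["bill", "postcard"])
deliver(po_box, ["magazine"])  # mail keeps building up until someone collects it
mail = collect(po_box)         # the consumer shows up later, on its own schedule
print(mail)
```

The point is that `deliver` and `collect` never have to run at the same moment, which is all "asynchronous" means here.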
So why would you need a message queue? Well, there are a lot of reasons, but basically the answer is that real-world systems are a lot larger and more complicated than a simple single-consumer, single-producer postal system.
You might have many consumers of the same message, you might have to deal with fault tolerance (what if something breaks while processing a message?) or you might want to be able to track when messages are being delivered.
Here is what things might actually look like inside a social media application that lets you upload photos. A user uploads a photo, which is saved to a database, and a message about the upload is also placed on a message queue. Multiple consumers read this message so that they know a user has uploaded a new photo, and ultimately it shows up in the "notifications" of your friends.
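To make the "many consumers of the same message" idea concrete, here is a rough sketch of a fan-out queue where every subscriber gets its own copy of each message. The class `FanOutQueue`, the consumer names, and the message fields are all hypothetical and chosen only for this example.

```python
class FanOutQueue:
    """Toy queue that copies each published message to every subscriber's inbox."""

    def __init__(self):
        self.inboxes = {}  # consumer name -> list of messages waiting for it

    def subscribe(self, consumer_name):
        self.inboxes[consumer_name] = []

    def publish(self, message):
        # Every subscriber sees the same message, independently of the others.
        for inbox in self.inboxes.values():
            inbox.append(message)

events = FanOutQueue()
events.subscribe("friend_notifications")  # shows the upload to your friends
events.subscribe("thumbnail_generator")   # assumed second consumer, for illustration

events.publish({"event": "photo_uploaded", "user": "alice", "photo_id": 42})

for name, inbox in events.inboxes.items():
    print(name, inbox)
```

The uploader only publishes once and does not need to know how many consumers exist, which is what makes it easy to add new ones later.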
So with the basics of the "what" and "why" of message queues out of the way, now what is Kafka?
The chief difference with Kafka is storage: it saves data using a commit log. Kafka stores the messages that you send to it in "Topics". Consumers can "replay" these messages if they wish. Normally in message queues, the messages are removed after subscribers have confirmed their receipt; Kafka instead keeps them for a configurable retention period.
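A commit log is easier to picture with a small sketch. The `CommitLog` class below is a simplified in-memory model I made up for illustration, not Kafka's actual storage code: messages are only ever appended, reading does not delete anything, and a reader can start from any position.

```python
class CommitLog:
    """Toy append-only log: reading never removes messages."""

    def __init__(self):
        self._records = []

    def append(self, message):
        """Add a message to the end and return its position (offset) in the log."""
        self._records.append(message)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return every message from the given offset onward."""
        return self._records[offset:]

log = CommitLog()
for message in ["signup", "login", "upload"]:
    log.append(message)

first_read = log.read_from(0)   # a consumer reads everything
replay = log.read_from(0)       # reading again "replays" the same messages
latest_only = log.read_from(2)  # or a consumer can start from a later position
print(first_read, replay, latest_only)
```

Compare this with the PO box style of queue, where collecting the mail empties the box.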
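Another thing that is different about Kafka is that messages are ordered by when they were added. Strictly speaking, this ordering is guaranteed within each partition of a topic (more on partitions in a moment) rather than across the whole topic. Not all message queues guarantee this.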
Individual Kafka servers that store messages are called "Brokers". Brokers are typically run in a cluster, which means many servers are linked together to handle lots of data and traffic. Topics may be further broken down into "Partitions", which are divided across brokers.
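Here is a rough sketch of how messages might be spread across partitions, and partitions across brokers. Everything in it is an assumption for illustration: I use `zlib.crc32` as a stand-in hash (Kafka's own clients use a different hash function), the broker names are invented, and the simple round-robin placement ignores replication entirely.

```python
import zlib

NUM_PARTITIONS = 3
BROKERS = ["broker-0", "broker-1"]  # hypothetical broker names

def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition from the message key, so the same key always lands together."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Simplified placement: spread partitions across brokers round-robin.
partition_home = {p: BROKERS[p % len(BROKERS)] for p in range(NUM_PARTITIONS)}

partitions = {p: [] for p in range(NUM_PARTITIONS)}
uploads = [("alice", "upload-1"), ("bob", "upload-1"),
           ("alice", "upload-2"), ("alice", "upload-3")]
for key, value in uploads:
    partitions[partition_for(key)].append((key, value))

# All of alice's messages sit in one partition, in the order they were sent.
alice_events = [v for k, v in partitions[partition_for("alice")] if k == "alice"]
print(partition_home)
print(alice_events)
```

Because each key maps to one partition, the messages for a given user stay in order even though the topic as a whole is split up.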
Kafka lets you easily divide up the work of publishing and consuming messages across a cluster of brokers.
Kafka was originally developed at LinkedIn to handle large quantities of traffic and provide a platform for handling real-time data feeds.
Kafka is designed to store data as an ordered log of records, which could be thought of as something like a database's transaction log. Groups of consumers keep track of where they are (their "offset") while reading a topic, so multiple consumers can read lots of data from the same topic while breaking up the work between them. If you wish, you can read any existing topic starting from the beginning to get all of the messages that were sent and are still retained.
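The consumer group idea can be sketched as follows. This is a toy model under my own assumptions, not the real Kafka client API: `Topic`, `ConsumerGroup`, `poll`, and `seek_to_beginning` are names I picked for the example, and partitions are handed out to group members with a plain round-robin.

```python
class Topic:
    """Toy topic: a fixed number of partitions, each an append-only list."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

class ConsumerGroup:
    """Toy consumer group: splits partitions among members and remembers offsets."""

    def __init__(self, topic, members):
        self.topic = topic
        # Next position to read in each partition, shared by the whole group.
        self.offsets = [0] * len(topic.partitions)
        # Round-robin assignment: each partition is read by exactly one member.
        self.assignment = {member: [] for member in members}
        for p in range(len(topic.partitions)):
            self.assignment[members[p % len(members)]].append(p)

    def poll(self, member):
        """Return unread messages from this member's partitions and advance offsets."""
        unread = []
        for p in self.assignment[member]:
            records = self.topic.partitions[p][self.offsets[p]:]
            unread.extend(records)
            self.offsets[p] += len(records)
        return unread

    def seek_to_beginning(self):
        """Rewind the group so the next poll re-reads the topic from the start."""
        self.offsets = [0] * len(self.offsets)

topic = Topic(num_partitions=2)
topic.partitions[0].extend(["a1", "a2"])
topic.partitions[1].extend(["b1"])

group = ConsumerGroup(topic, ["worker-1", "worker-2"])
first = group.poll("worker-1")     # worker-1 handles partition 0
second = group.poll("worker-2")    # worker-2 handles partition 1
again = group.poll("worker-1")     # nothing new, the offset already moved forward
group.seek_to_beginning()
replayed = group.poll("worker-1")  # same messages again after rewinding
print(first, second, again, replayed)
```

The work is split because each partition belongs to one worker at a time, and the stored offsets are what let a group pick up where it left off or start over.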
In terms of evaluating somebody's experience with Kafka, there are a few things you could ask about.
- How many transactions per second did your system handle?
- How did you decide how to size your cluster?
- Why did you decide to go with Kafka over something simpler?
- How many topics, partitions, consumer groups, etc. did your system use?
- What challenges did you encounter implementing Kafka in your system?
Generally, asking "why" questions is a great way to really understand whether a person had decision-making power, or whether they really understood why they were doing something. But asking more specific questions about scale and challenges faced in implementation can be useful as well.
Message queues are a common architecture that might be encountered on the backend of many types of applications. Almost every significant application or company building software will have a message queue somewhere in their infrastructure once they get to a certain size.
Kafka is being adopted in many large organizations because of its ability to store messages for as long as you need (even indefinitely) and to deal with large amounts of traffic. There are a lot of other features of Kafka that I didn't touch on (such as the stream processing system), but these are the basics and should help you understand a little bit about why people are using it and what it is for.
Note: I originally published this on the ITeach Recruiters blog, where I am building courses for recruiters, but I thought this article was general enough that new developers just getting into these concepts would be interested.