Hello there, and welcome to this series on Apache Kafka, called Learning Kafka (I know, not inventive). In this series, we (that is, me and you) will embark on an adventure to a faraway kingdom, where we'll meet the protagonist of this series, called Kafka. Kafka was born an orphan on the street of ……… okay, that's enough.
Now, learning Kafka, though not hard, can be quite complicated, especially for those of us who are new to data engineering or systems design, or who've never worked with a distributed system before. It seems like there is an endless stream (read that again) of terminology to learn: streams, producers, consumers, brokers, topics, partitions, offsets, replication, connect, clusters, serialization, deserialization, distributed, throughput, latency, and on and on and on. Don't run away yet; if these words sound like something someone would include in a master's thesis, then hopefully, by the end of this series, you'll leave with your own master's certificate.
For those who are familiar with other big data frameworks like Hadoop, Spark, Storm, or any other distributed framework, some of these concepts will be familiar or easy to pick up. But for those of us who are fortunate (or unfortunate) enough to learn Kafka as our first distributed and/or big data framework, it seems like we are not just learning Kafka but a whole new ecosystem. Which, to a point, is true, because you can't truly understand Kafka without knowing how distributed systems work, or what Pub/Sub is.
By the end of this series, hopefully you'll be familiar not only with these terms but also with how they relate to Kafka.
One of the reasons learning Kafka is rather daunting is that we lack a somewhat detailed view of it. On the surface, Kafka is a system for building real-time data pipelines, which is all well and good. But when we try to build the promised data pipeline, things get complicated really, really quickly.
Also, the loosely coupled architecture of Kafka, one of the reasons it is so successful, is also why it can be hard to grasp: to understand Kafka, we must first understand each of its components independently, then figure out how they relate to one another.
And that is what we will be doing in this series. We'll take a step back and study, in depth, Kafka's design, architecture, and components, and how they all fit and work together.
This series will be divided into six parts:

- Part one: what Kafka is, its origin, use cases, and features.
- Part two: an introduction to the core components of Kafka, like brokers, topics, and partitions.
- Part three: a look at the design of Kafka.
- Part four: further divided into three segments, each focusing on a single component of Kafka's ecosystem: Producers and Consumers, Kafka Streams, and Kafka Connect, in that order.
- Part five: how these components interact with Kafka through its APIs and client libraries.
- Part six: a look at other third-party and community applications that can be integrated with Kafka.
Also, in this series, there will be no hands-on or coding examples, no how-tos; the objective of this series is to get to know Apache Kafka proper.
This series, despite best attempts, can in no way do justice to Kafka, because Kafka is deep. For further reading and more detailed, even more in-depth explanations, you can't go wrong with either of these books:
- *Kafka: The Definitive Guide* by Gwen Shapira, Todd Palino, Rajini Sivaram, and Krit Petty
- *Effective Kafka* by Emil Koutanov
I hope you’ll enjoy consuming this series. I certainly enjoy producing it.
Coming up: an introduction to our protagonist, Apache Kafka.