As someone who has been crafting software for the better part of half my life, it's been a long time since I've been as excited as I have been over the last few months. Since making the jump late last year into the world of stream processing, I just can't get enough. I've read every book on it that I can get my hands on, continuously scoured the web for engineering blogs with novel approaches to compelling use cases, and at this point, I'm even dreaming about it.
"When Gregor Samsa woke up one morning from unsettling dreams, he found himself changed in his bed into a monstrous vermin."
Franz Kafka, The Metamorphosis
Alright, so I'm not turning into some monstrosity akin to our friend Gregor in Kafka's seminal work, or even close to that. But even with just a few months of exposure to Kafka (the technology), there's been a transformation on numerous fronts. It's affected the way that I want to work, the approach and solutions to the problems I encounter, and it's opened a wide range of doors that may have previously been closed without a paradigm like streaming.
The first part of this transformation was just how different working in this new streaming world was from what I had been accustomed to. As I wrote in an earlier post describing the change:
In a nutshell, I was taking everything I had done previously in my career and voluntarily throwing it out the window to shift to an entirely different stack, programming paradigm, language, etc. I suppose it could have been terrifying, but of all things it was... exciting.
It was an entirely different paradigm for solving problems, with its own unique challenges and nuances that were interesting puzzles to solve. Architectural changes, dealing with replication, partitioning, figuring out which freaking JDK to use, streams, tables, and countless other already forgotten issues were weekly, if not daily, discussions.
At first, the transition was fun, most of which I attributed to the sheer novelty of it. Since everything was new, it felt more like a vacation from what I had been accustomed to working with. Sure, there were challenges, misunderstandings, and quite a few things that I just totally got wrong, but that's expected. My colleagues and I found ourselves vacillating between optimism and nearly wanting to scrap the experiment entirely until...it worked.
It was the most basic of scenarios, just simple enrichment, but it was mind-blowing. Messages were flowing into the stream, being enriched from known sources, and sent off to their final resting places. It really was just like magic except that I'd reckon even the best of magicians would struggle with the type of throughput we were handling.
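Conceptually, that enrichment step is tiny. Here's a minimal, Kafka-free sketch in Python of the idea: the stream-table join reduces to a dictionary lookup as each message flows through. (The field names and the `reference` table below are hypothetical stand-ins, not our actual data or real Kafka code.)

```python
# A toy sketch of stream enrichment: each incoming message is joined
# against a known reference source, then forwarded downstream.
# In Kafka Streams this would be a stream-table join backed by a
# state store; here the "table" is a dict and the "stream" an iterable.

reference = {  # hypothetical lookup data, e.g. user_id -> region
    "u1": "us-east",
    "u2": "eu-west",
}

def enrich(stream):
    for message in stream:
        region = reference.get(message["user_id"], "unknown")
        yield {**message, "region": region}  # enriched copy, original untouched

events = [
    {"user_id": "u1", "amount": 42},
    {"user_id": "u3", "amount": 7},   # no match -> enriched as "unknown"
]
enriched = list(enrich(events))
```

The real thing adds serialization, partitioning, and fault tolerance on top, but the shape of the work is exactly this.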
I was hooked.
In just a matter of months, I've become professionally obsessed with it. I've read a handful of fantastic books dedicated to the topic, scoured the web for interesting uses of it (shout-out to LinkedIn, Uber, Netflix, and all the other transparent tech giants leveraging it), and since they weren't selling t-shirts anywhere, I even went out and got a piece of paper:
(If anyone at Confluent or anyone else with some Kafka swag is reading this, hit me up, I'm still looking for a t-shirt or two.)
I'd say the first glaring thing about Kafka is that it provides another approach for solving problems, specifically, the ability to do so in real time. This alone is an incredibly compelling story if you've spent your development life in a batch-processing world. You now have the ability to act on items or handle events as they occur instead of waiting until some given interval before even being aware they existed at all.
Streams don’t solve every problem, however, and I’ll let this quote from Bill Bejeck help you decide when they might be appropriate:
Here are the key points to remember: If you need to report on or take action immediately as data arrives, stream processing is a good approach. If you need to perform in-depth analysis or are compiling a large repository of data for later analysis, a stream-processing approach may not be a good fit.
Bill Bejeck, Kafka Streams in Action
I’ve always considered myself a pragmatist, and I strongly believe that you should use the best tool for the job, but always be cognizant of your biases. Your most comfortable hammer is likely not going to help you tighten a loose bolt, and if it does, well, some irreparable damage might be done.
Kafka is easy enough to integrate with using tools like Connect to sink the data in Kafka down to various data stores (e.g., nearly every flavor of relational and non-relational database, Elasticsearch, BigQuery, etc.) in real time. Likewise, data can be sourced into Kafka the same way.
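As a concrete illustration, sinking a topic into Postgres with Confluent's JDBC sink connector is mostly configuration; the connector name, topic, and connection URL below are made up for the example:

```json
{
  "name": "orders-postgres-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "orders",
    "connection.url": "jdbc:postgresql://localhost:5432/warehouse",
    "auto.create": "true"
  }
}
```

POST a config like this to Connect's REST API and the topic streams into a table continuously, with no bespoke consumer code to write or operate.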
You don’t have to go all in on one approach or the other; Kafka can supplement your existing ecosystem. If you have long-running data analysis to do, then use a tool that’s best for that. If you need that data in Kafka to do enrichment or make decisions with it, then sync it to Kafka when it makes sense and use it there as soon as it’s available.
You have options, use them.
While the learning curve was steep, it paled in comparison to the analysis paralysis that followed. Despite the earlier quote from Bill on when streams were appropriate, it’s very easy for people to start feeling the instant gratification of what real-time processing looks like and get carried away.
Watching a process that, prior to Kafka, may have only run once in the middle of the night now complete within seconds is awesome, especially to those outside of engineering. This is where you must be careful and resist the urge to stream everything (and push back when it doesn’t make sense to do so).
Today we are bursting at the seams with data. Companies are gathering more of it than ever and are seeking to leverage all this information to make decisions on just about everything. And it goes without saying: all that data isn't arriving in one tidy batch; it's flowing in constantly.
Data is constantly flowing into systems and time is money. Kafka enables these systems to make decisions, change courses, take action, or at least be notified immediately when something of value occurs instead of waiting until the report trickles into their inbox.
Kafka is very fast, resilient, and battle-tested. It can easily be tailored to fit your specific scenarios, from debugging to anomaly/fraud detection to training machine-learning models and much more. It's become a centerpiece among data frameworks, architectures, and processing systems.
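To make one of those scenarios concrete, a toy version of streaming anomaly detection amounts to keeping per-key running state and flagging outliers the moment an event arrives. In a real Kafka Streams application this state would live in a state store; the names, threshold factor, and data below are purely illustrative.

```python
# Toy streaming anomaly detection: flag any event whose amount far
# exceeds the running average previously observed for that key.
from collections import defaultdict

class AnomalyDetector:
    def __init__(self, factor=3.0):
        self.factor = factor                   # "far exceeds" = factor x average
        self.totals = defaultdict(float)       # per-key running sum
        self.counts = defaultdict(int)         # per-key event count

    def observe(self, key, amount):
        """Return True if amount is anomalous for this key, then update state."""
        anomalous = (
            self.counts[key] > 0
            and amount > self.factor * (self.totals[key] / self.counts[key])
        )
        self.totals[key] += amount
        self.counts[key] += 1
        return anomalous

detector = AnomalyDetector()
flags = [detector.observe("card-1", a) for a in (10, 12, 11, 200)]
# flags == [False, False, False, True]: only the 200 stands out
```

The point isn't the statistics (a real system would use something sturdier than a running mean); it's that the decision happens per event, as it arrives, rather than in a nightly scan.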
It's a technology that can absolutely transform the way you build your applications, and even the capabilities of your business. Just make sure that you transform responsibly.