Streaming Audio: A Confluent podcast about Apache Kafka®
Minimizing Software Speciation with ksqlDB and Kafka Streams ft. Mitch Seymour
Building a large, stateful Kafka Streams application that tracks the state of each outgoing email is crucial to marketing automation tools like Mailchimp. In this episode, Mitch Seymour, staff engineer at Mailchimp, shares how ksqlDB and Kafka Streams handle the company’s largest source of streaming data.
Mailchimp is almost like a post office, except that instead of sending physical parcels, it sends billions of emails per day. Monitoring the state of each email provides visibility into the core business function and also returns information about the health of both internal and remote message transfer agents (MTAs). Finding a way to track those MTA systems in real time is pivotal to the success of the business.
Mailchimp is an early Apache Kafka® adopter that started using the technology in 2014, before ksqlDB, Kafka Connect, and Kafka Streams came into the picture. The stream processing applications that they were building faced many complexities and rough edges. As their use case evolved and scaled over time, a large number of applications deviated from the initial implementation and design, leaving many different applications to maintain. To reduce cost and complexity and to standardize stream processing applications, Mailchimp adopted ksqlDB and Kafka Streams. This is what Mitch calls “minimizing software speciation in our software.”
Software speciation is the idea that applications evolve into multiple distinct systems in response to differing failure-handling strategies, increased load, and the like. The resulting mix of scaling strategies and communication protocols creates system silos that can be challenging to maintain.
Replacing the existing architecture that supported point-to-point communication, the new Mailchimp architecture uses Kafka as its foundation with scalable custom functions, such as a reusable and highly functional user-defined function (UDF). The reporting capabilities have also evolved from Kafka Streams’ interactive queries into enhanced queries with Elasticsearch.
Turning experiences into books, Mitch is also the author of O’Reilly’s Mastering Kafka Streams and ksqlDB and the author and illustrator of Gently Down the Stream: A Gentle Introduction to Apache Kafka.
EPISODE LINKS
- The Exciting Frontier of Custom ksql Functions
- Kafka Streams 101 Course
- Mastering Kafka Streams and ksqlDB Ebook
- ksqlDB UDFs and UDAFs Made Easy
- Using Apache Kafka as a Scalable, Event-Driven Backbone for Service Architectures
- The Haiku Approach to Writing Software
- Watch the video version of this podcast
- Join the Confluent Community
- Learn Kafka on Confluent Developer
- Live demo: Kafka streaming on Confluent Cloud
- Use PODCAST100 to get $100 of free Confluent Cloud usage (details)