If you’re an architect or developer looking at event-driven architectures, stream processing might be just what you need to make your app faster, more scalable, and more decoupled.
In this article—the third in a series about event-driven architectures—we will review a little of the first article in the series, which outlined the benefits of event-driven architectures, some of the options, and a few patterns and anti-patterns. We will also review the second article, which provided more detail on message queues and deployed a quick-start message queue using Redis and RSMQ.
This article will also dive deeper into stream processing. We will discuss why you might pick stream processing as your architecture, some of the pros and cons, and a quick-to-deploy reference architecture using Apache Kafka.
What is an Event-Driven Architecture?
Stream processing is a type of event-driven architecture. In event-driven architectures, when a component performs some piece of work that other components might be interested in, that component (called a producer) produces an event—a record of the performed action. Other components (called consumers) consume those events so that they can perform their own tasks as a result of the event.
This decoupling of consumers and producers gives event-driven architectures several benefits:
- Asynchronous—Communications between components are asynchronous, avoiding any bottlenecks caused by synchronous, monolithic architectures.
- Decoupled—Components don’t need to know about one another, and can be developed, tested, deployed, and scaled independently.
- Easy Scaling—Since components are decoupled, bottleneck issues can be more easily tracked to a single component, and quickly scaled.
There are two main kinds of event-driven architectures: message queues and stream processing. Let's dive into the differences.
Intro to Message Queues
With message queues, the original event-driven architecture, the producer places a message into a queue targeted to a specific consumer. That message is held in the queue (often in first-in, first-out order) until the consumer retrieves it, at which time the message is deleted.
Message queues are useful for systems where you know exactly what needs to happen as a result of an event. When an issue occurs, your producer sends a message to the queue, targeted to some consumer(s). Those consumers obtain the message from the queue and then execute the next operation. Once that next step is taken, the event is removed from the queue forever. In the case of message queues, the flow is generally known by the queue, giving rise to the term “smart broker/dumb consumer”, which means the broker (queue) knows where to send a message, and the consumer is just reacting.
Intro to Stream Processing
With stream processing, messages are not targeted to a certain recipient, but rather are published at-large to a specific topic and available to all interested consumers. Any and all interested recipients can subscribe to that topic and read the message. Since the message must be available to all consumers, the message is not deleted when it is read from the stream.
Producers and brokers don’t need or want to know what will happen as a result of a message, or where that message will go. The producer just sends the message to broker, the broker publishes it, and the producer and broker move on. Interested consumers receive the message and complete their processing. Because of this further decoupling, systems with event streaming can evolve easily as the project evolves.
Consumers can be added and deleted and can change how and what they process, regardless of the overall system. The producer and the broker don’t need to know about these changes because the services are decoupled. This is often referred to as “dumb broker/smart consumer”—the broker (stream) is just a broker, and has no knowledge of routing. The consumers in message processing are the smart components; they are aware of what messages to listen for.
Also, consumers can retrieve multiple messages at the same time and since messages are not deleted, consumers can replay a series of messages going back in time. For example, a new consumer can go back and read older messages from before that consumer was deployed.
Stream processing has become the go-to choice for many event-driven systems. It offers several advantages over message queues including multiple consumers, replay of events, and sliding window statistics. Overall, you gain a major increase in flexibility.
Should You Use Stream Processing or Message Queues?
Here are a several use cases for each:
Message Queues
Message queues, such as RabbitMQ and ActiveMQ are popular. Message queues are particularly helpful in systems where you have known or complex routing logic, or when you need to guarantee a single delivery of each message.
A typical use case for message queues is a busy ecommerce website where your services must be highly available, your requests must be delivered, and your routing logic is known and unlikely to change. With these constraints, message queues give you the powerful advantages of asynchronous communication and decoupled services, while keeping your architecture simple.
Additional use cases often involve system dependencies or constraints, such as a system having a frontend and backend written in different languages or a need to integrate into legacy infrastructure.
Stream Processing
Stream processing is useful for systems with more complex consumers of messages such as:
- Website Activity Tracking. Activity on a busy website creates a lot of messages. Using streams, you can create a series of real-time feeds, which include page views, clicks, searches, and so on, and allow a wide range of consumers to monitor, report on, and process this data.
- Log Aggregation. Using streams, log files can be turned into a centralized stream of logging messages that are easy for consumers to consume. You can also calculate sliding window statistics for metrics, such as an average every second or minute. This can greatly reduce the output data volumes, making your infrastructure more efficient.
- IOT. IOT also produces a lot of messages. Streams can handle a large volume of messages, and publish them to a large number of consumers in a highly scalable and performant manner.
- Event Sourcing. As described in a previous article, streams can be used to implement event sourcing, where updates and deletes are never performed directly on the data; rather, state changes of an entity are saved as a series of events.
- Messaging. Complex and highly-available messaging platforms such as Twitter and LinkedIn use streams (Kafka) to drive metrics, deliver messages to news feeds, and so on.
A Reference Architecture Using Kafka
In our previous article, we deployed a quick-to-stand-up message queue to learn about queues. Let’s do a similar example stream processing.
There are many options for stream processing architectures, including the following:
- Apache Kafka
- Apache Spark
- Apache Beam/Google Cloud Data Flow
- Spring Cloud Data Flow
We'll use the Apache Kafka reference architecture on Heroku. Heroku is a cloud platform-as a service (PaaS) that offers Kafka as an add-on. Their cloud platform makes it easy to deploy a streaming system rather than hosting or running your own. Since Heroku provides a Terraform script that deploys all the needed code and configuration for you in one step, it's a quick and easy way to learn about stream processing.
We won’t walk through the deployment steps here, as they are outlined in detail on the reference architecture page. However, it deploys an example eCommerce system that showcases the major components and advantages of stream processing. Clicks to browse or purchase products are recorded as events to Kafka.
Here is a key snippet of code from edm-relay, which sends messages to the Kafka stream. It's quite simple to publish events to Kafka since it's only a matter of calling the producer API to insert a JSON object.
app.post('/produceClickMessage', function (req, res) {
try {
const topic = `${process.env.KAFKA_PREFIX}${req.body.topic}`;
console.log(`topic: ${topic}`);
producer.produce(
topic,
null,
// Message to send. Must be a buffer
Buffer.from(JSON.stringify(req.body)),
// for keyed messages, we also specify the key - note that this field is optional
null,
// you can send a timestamp here. If your broker version supports it,
// it will get added. Otherwise, we default to 0
Date.now(),
);
} catch (err) {
console.error('A problem occurred when sending our message');
throw err;
}
res.status(200).send("{\"message\":\"Success!\"}")
});
A real-time dashboard then consumes the stream of click events and displays analytics. This could be useful for business analytics to explore the most popular products, changing trends, and so on.
Here is the code from edm-stream that subscribes to the topic:
.on('ready', (id, metadata) => {
consumer.subscribe(kafkaTopics);
consumer.consume();
consumer.on('error', err => {
console.log(`Error in Kafka consumer: ${err.stack}`);
});
console.log('Kafka consumer ready.' + JSON.stringify(metadata));
clearTimeout(connectTimoutId);
})
and then consumes the message from the stream by calling an event handler for each message:
.on('data', function(data) {
const message = data.value.toString()
console.log(message, `Offset: ${data.offset}`, `partition: ${data.partition}`, `consumerId: edm/${process.env.DYNO || 'localhost'}`);
socket.sockets.emit('event', message);
consumer.commitMessage(data);
})
The reference architecture is not just about buying coffee; it's a starting point for any web app where you want to track clicks and report in a real-time dashboard. It's open source, so feel free to experiment and modify it according to your own needs.
Stream processing not only decouples your components so that they are easy to build, test, deploy, and scale independently, but also adds yet another layer of decoupling by creating a “dumb” broker between your components.
Next Steps
If you haven’t already, read our other articles in this series on the advantages of event-driven architecture and deploying a sample message queue using Redis and RSMQ.
Top comments (2)
Nice article Jason! For beginners I am highly suggest to read Vaidehi's article at: dev.to/vaidehijoshi/ordering-distr... to understand ordering of distributed events and more. Also, it's worth mentioning here the Kafka production-ready Lenses platform: lenses.io for newcomers and experienced to data ops
It's worth checking out Kafka Streams (part of Apache Kafka) and ksqlDB too for stream processing.