Oleg Potapov
Message Brokers: Queue-based vs Log-based

Message brokers are one of the essential components of modern distributed application architecture. They allow different parts of a system to exchange messages asynchronously. Asynchronous messaging is becoming more and more important, and so are message brokers. Today there are a lot of options to choose from: not only RabbitMQ and Kafka, but many more, including cloud-native solutions provided by the biggest cloud providers such as AWS and Azure. However, most of them can be divided into two groups: queue-based and log-based. Let’s see how they work.

Queue-based

Queue-based systems are built around the Queue data structure, familiar to everyone from their first steps in Computer Science. A queue is a simple data structure working by the FIFO (First In, First Out) principle. In its simplest form, the system consists of three components: the producer, sending messages to the queue; the consumer, receiving them from the queue; and the queue itself.
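This three-component model can be sketched in a few lines of Python. The names here are illustrative, not any real broker's API; a `deque` stands in for the broker's queue:

```python
from collections import deque

# A minimal sketch of the queue-based model: one producer, one queue,
# one consumer. A deque stands in for the broker's internal queue.
queue = deque()

def produce(message):
    queue.append(message)     # producer pushes to the tail

def consume():
    return queue.popleft()    # consumer takes from the head (FIFO)

produce("event-1")
produce("event-2")
print(consume())  # event-1 -- the first message in is the first one out
print(consume())  # event-2
```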

Simplest queue-based broker

Even such a primitive system may be useful, but real applications, especially those based on Event-Driven Architecture, require more complex topologies. Usually they need the same event to be published to several subsystems that handle it separately. In other words, an event, once produced, should be available to several consumers. That’s why most queue-based brokers use the concept of topics (in RabbitMQ they are called exchanges[1]).

Queues and topics

The producer sends a message to the topic, and only from there is the message distributed to the queues. The implementation of this process may vary; for example, in RabbitMQ the topic (exchange) is just a routing rule defining which queues the message should be put into.

Why can’t several consumers receive the same message from one queue? That’s one of the key points and the biggest difference between queues and logs. Once a message is sent to a consumer and an acknowledgement is received, it is removed from the queue and is no longer available to other consumers. That’s how reliability is achieved: the consumer holds no state and always receives the first message from the queue. When the consumer restarts, it receives the first message it didn’t acknowledge before failing.
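The acknowledge-then-remove cycle can be sketched as follows. This is a toy model of at-least-once delivery, not a real broker's API; `AckQueue` and its method names are made up for illustration:

```python
from collections import deque

# Illustrative sketch of at-least-once delivery: a message leaves the
# queue only after the consumer acknowledges it.
class AckQueue:
    def __init__(self):
        self._messages = deque()
        self._in_flight = {}      # delivery tag -> unacknowledged message
        self._next_tag = 0

    def publish(self, message):
        self._messages.append(message)

    def deliver(self):
        """Hand the head message to a consumer without removing it yet."""
        message = self._messages.popleft()
        self._next_tag += 1
        self._in_flight[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        """Consumer confirms processing; now the message is really gone."""
        del self._in_flight[tag]

    def requeue_unacked(self):
        """Simulate a consumer crash: unacked messages return to the head."""
        for tag in sorted(self._in_flight, reverse=True):
            self._messages.appendleft(self._in_flight.pop(tag))

q = AckQueue()
q.publish("m1")
q.publish("m2")
tag, msg = q.deliver()   # consumer gets "m1"...
q.requeue_unacked()      # ...but crashes before acking
tag, msg = q.deliver()
print(msg)               # m1 again: the broker redelivers the unacked message
q.ack(tag)               # acknowledged, "m1" is removed for good
```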

The most popular queue-based brokers are RabbitMQ, ZeroMQ, ActiveMQ and Amazon SQS; Redis Pub/Sub is often mentioned here too, even though strictly speaking it’s not a message broker.

Log-based

As the name suggests, the main difference of log-based message brokers is the use of a log as the message store. The log is persistent, append-only storage, and therefore several consumers can read from it in parallel.

Log-based broker

Each consumer works at its own speed and reads messages from its own position in the log. On one hand, this makes consumers independent from each other and more decoupled from the broker. Another advantage is that multiple consumers can work with a single log, so there is no need to create additional entities for that purpose as in the case of queue-based brokers.
But this approach introduces another complexity: consumers' offsets (or cursors) must be stored somewhere. Keeping them inside the consumers is not a good idea. If a consumer fails or becomes stale and is replaced by another one, the new instance needs access to the previous instance's cursor; otherwise it starts reading messages from the beginning, which is usually not what we want.
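The log-plus-offsets model can be sketched like this. A plain list stands in for the log, and a dictionary stands in for the cursor storage (which, as discussed below, a real broker would keep on its own side, not inside the consumer):

```python
# A sketch of the log-based model: the log is an append-only list, and
# each consumer tracks its own read position (offset).
log = []
offsets = {}   # consumer name -> next position to read (the cursor)

def append(message):
    log.append(message)

def poll(consumer):
    """Return the next unread message for this consumer, or None."""
    position = offsets.get(consumer, 0)
    if position >= len(log):
        return None
    offsets[consumer] = position + 1   # commit the advanced offset
    return log[position]

append("e1")
append("e2")
print(poll("billing"))    # e1
print(poll("billing"))    # e2
print(poll("analytics"))  # e1 -- a second consumer reads independently;
                          # the message was not removed by the first one
```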

Cursors storage is required for log-based brokers

Thus additional cursor storage is needed. A separate service might be used for this purpose, but it would require separate maintenance and monitoring. Probably the better option is to store the cursors inside the broker itself. One example of this approach is Apache Kafka, which stores consumers' offsets in an internal topic called __consumer_offsets. Apache Pulsar does something similar: it saves cursors in Apache BookKeeper, alongside the message data.

The durability of the log brings another advantage: any consumer can read messages from any position and even replay the whole log of events from the start. It’s not a common case, but having the full log may be useful for Event-Driven systems in the context of eventual consistency.

The main representatives of the log-based group are Apache Kafka, Apache Pulsar and Amazon Kinesis.

Messages Order

One of the biggest differences between the two broker types comes to light when some kind of message ordering is required. There are different kinds of ordering. The first one is related to the producer: when the producer publishes Event1 and then Event2, they must be stored in the broker and then received by the consumer in the same order. For example, when the Orders service publishes an OrderPlaced event and then an OrderCancelled event, the latter shouldn’t be processed before the former, because the consumer would not be able to handle it properly. This kind of ordering is provided by most brokers from both groups as built-in functionality.

But there is another kind of ordering: among related messages produced by separate services. Let’s look at an example. We have three services: Orders, Payments and Fulfillments. The Orders service publishes OrderPlaced and OrderCancelled events, the Payments service publishes the OrderPaid event, and the Fulfillments service consumes these events, starting to fulfill the order on OrderPaid and stopping on OrderCancelled. But how do we make sure that, for the same order, it consumes the OrderPaid event before OrderCancelled? Otherwise it may lead to unpredictable behavior.

For queues the solution is quite simple. Since they are flexible in building message topologies, it is possible to achieve almost any routing logic. In our case, we can have two separate exchanges, one for each event type (or for each service; in this case it doesn’t matter), and a single queue connected to both of them.
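This topology can be sketched as a binding table: two exchanges, both bound to the same queue, so related events from different producers merge into one FIFO stream. The names are illustrative, not RabbitMQ's actual API:

```python
from collections import deque

# Sketch of the topology above: two exchanges (one per producing service),
# both bound to a single queue that one consumer reads in order.
order_queue = deque()

bindings = {
    "orders_exchange":   [order_queue],   # OrderPlaced / OrderCancelled
    "payments_exchange": [order_queue],   # OrderPaid
}

def publish(exchange, message):
    # fan the message out to every queue bound to this exchange
    for queue in bindings[exchange]:
        queue.append(message)

publish("payments_exchange", "OrderPaid(order=42)")
publish("orders_exchange",   "OrderCancelled(order=42)")

# The single consumer sees the events in the order they were published,
# regardless of which service produced them:
print(order_queue.popleft())  # OrderPaid(order=42)
print(order_queue.popleft())  # OrderCancelled(order=42)
```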

Several exchanges connected to the same queue

Such topologies are definitely one of the biggest strengths of queues. And some brokers offer even more powerful functionality, e.g. exchange-to-exchange bindings and different exchange types (like Topic, Fanout and Direct in RabbitMQ).

However, it’s not that easy to build such a topology with log-based brokers. The same strategy won’t work: if every event type is published to a separate topic, the events will be handled separately and there is no way to preserve their order. And even though some brokers allow a consumer to consume messages from multiple topics, it doesn’t help in this case. Another strategy is to have a single topic for the whole system and send all events there.

Several producers write to the same log

It solves the ordering problem, but it is not always the best solution. Now consumers read all the events produced in the system, even though they are usually interested in just a couple of them. And having one big topic for the entire system may become one big problem.
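The drawback is easy to see in a sketch: with one shared topic, every consumer scans every event and merely skips the irrelevant ones. The event names here are illustrative:

```python
# One big topic: the Fulfillments consumer still reads every event in the
# system and has to filter out the ones it doesn't care about.
shared_log = [
    ("OrderPlaced",  {"order": 1}),
    ("UserSignedUp", {"user": 7}),    # irrelevant to Fulfillments
    ("OrderPaid",    {"order": 1}),
]

def consume(log, interesting_types):
    handled = []
    for event_type, payload in log:   # the consumer scans everything...
        if event_type in interesting_types:
            handled.append((event_type, payload))  # ...but handles a subset
    return handled

print(consume(shared_log, {"OrderPaid", "OrderCancelled"}))
# [('OrderPaid', {'order': 1})] -- UserSignedUp was read but skipped
```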

So we need to somehow group events by type and create a separate topic for each group.

Several producers write to several topics

There is no single right way to do it, but there are a few helpful recommendations:

  • if events are related to the same aggregate, put them in one topic
  • if events are related to entities that depend on each other, it is worth keeping them in the same topic as well
  • if one event is related to several entities, don’t split it into multiple messages; that can be done at later stages of event handling

There are some more tips and recommendations in the Confluent blog[2].

To all the problems with log-based topologies, I would add that they may be hard to change if you realize you made a mistake or the system structure changes. Queue topologies are usually friendlier in this respect.

Scaling

It seems like a clear win for queues after the previous round, right? Don’t draw conclusions too quickly. Everything changes when it comes to scaling. Scaling is always tricky; it may just be a bit easier or harder depending on the broker. For log-based brokers, it’s much easier. Usually, to scale Kafka you increase the number of partitions in a topic.

Several partitions in the topic for scale

To keep the ordering of related events, only one thing needs to be added: each event should contain a partition key. Events with the same partition key go to the same partition and are therefore handled in the correct order.
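Partition-key routing boils down to hashing the key and taking the result modulo the partition count. The sketch below uses CRC32 for determinism; the actual hash function is broker-specific (Kafka's default producer uses murmur2, for example):

```python
import zlib

# Sketch of partition-key routing: events with the same key always land
# in the same partition, so their relative order is preserved there.
NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # crc32 rather than Python's hash(), which is randomized between runs
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

partitions = [[] for _ in range(NUM_PARTITIONS)]

for event in ["OrderPlaced", "OrderPaid", "OrderCancelled"]:
    partitions[partition_for("order-42")].append(event)

# All three events about order-42 share a partition and keep their order:
p = partition_for("order-42")
print(partitions[p])  # ['OrderPlaced', 'OrderPaid', 'OrderCancelled']
```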

Doing the same for queues is much more complex. Queue-based brokers allow multiple consumers on the same queue, but this may cause ordering errors. When multiple consumers listen to the same queue, the Competing Consumers pattern[3] is in effect: the first consumer gets one event, the second consumer gets the next one without waiting for the first to finish, and so on. It’s a great pattern for independent task queues, but not that great for related events. Consumers work in parallel, which means it’s impossible to guarantee that the first event will be handled before the second.

Clearly, it’s not possible to solve this with one queue. So there should be separate queues, and events should be distributed among them at a higher level. One implementation of this idea is RabbitMQ’s Consistent Hashing Exchange type[4]. Similar to the partition-key mechanism for logs, it distributes messages to queues based on their routing key, using the Consistent Hashing algorithm[5] to pick the queue for a given key. But even with this or similar tools, the solution is not trivial.
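The core of consistent hashing can be sketched in a few lines: queues are mapped to many points on a ring, and each routing key goes to the first queue point clockwise from the key's hash. This is a toy illustration of the algorithm, not the plugin's implementation:

```python
import bisect
import zlib

# A minimal consistent-hashing sketch: routing keys map onto a ring of
# queue points, so adding or removing a queue only remaps a fraction of keys.
class ConsistentHashRing:
    def __init__(self, queues, points_per_queue=50):
        self._ring = []                       # sorted (hash, queue) pairs
        for queue in queues:
            for i in range(points_per_queue):
                h = zlib.crc32(f"{queue}:{i}".encode())
                self._ring.append((h, queue))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    def queue_for(self, routing_key: str) -> str:
        h = zlib.crc32(routing_key.encode())
        # first ring point clockwise from the key's hash (with wrap-around)
        index = bisect.bisect(self._hashes, h) % len(self._ring)
        return self._ring[index][1]

ring = ConsistentHashRing(["queue-1", "queue-2", "queue-3"])
# the same routing key always maps to the same queue, preserving its order:
print(ring.queue_for("order-42") == ring.queue_for("order-42"))  # True
```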

Conclusion

So, let’s sum up the advantages of both types.

Queue-based brokers:

  • in theory, they should have lower latency thanks to fast queuing protocols like AMQP[6]; in practice the difference is not that big, as modern log-based brokers use powerful caching and don’t fall behind significantly
  • they allow building more flexible messaging topologies, which can be used for complex message routing or prioritization

Log-based brokers:

  • messages are persisted on disk, which makes it possible to keep the history of events or replay the full sequence
  • they are usually easier to scale

Brokers of both types can cover most of your needs, but they will require different amounts of work to make the system consistent and reliable.

Links

  1. https://hevodata.com/learn/rabbitmq-exchange-type/
  2. https://www.confluent.io/blog/put-several-event-types-kafka-topic/
  3. https://www.enterpriseintegrationpatterns.com/patterns/messaging/CompetingConsumers.html
  4. https://github.com/rabbitmq/rabbitmq-server/tree/main/deps/rabbitmq_consistent_hash_exchange
  5. https://en.wikipedia.org/wiki/Consistent_hashing
  6. https://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol
