Many companies use the latest and greatest technologies to keep their applications efficient and fast and to make themselves productive. Today I am going to examine the major features that developers look for in their messaging technologies. Two of the most popular messaging technologies, Kafka and RabbitMQ, will be compared for each feature.
The features I am going to examine are:
- Message Replay
- Routing Options
- Consumer Scaling
- Prioritized Message Delivery
These features were selected because they became important while I was creating a small data pipeline.
Basic knowledge of RabbitMQ and Kafka is required to properly understand this blog.
Kafka has great documentation that will teach you the basics.
Replay is a very cool feature since it allows consumers to “go back in time”. Replay simply means that consumers of the messaging technology can re-read messages they have already consumed. This is helpful because messages that were not processed properly the first time can be re-read and processed successfully. It can also enable “late join” functionality, where a consumer receives messages published in the past, for example to load an in-memory database or to run a new type of analytics on historical data. Replay is also what makes Event Sourcing architectures possible.
Kafka stores all of its messages in a distributed log, and messages are not removed after they have been consumed. Every message published to Kafka remains in the log until a configured amount of time, called the retention period, has passed. This allows consumers to re-read the source stream from a given Kafka topic. A Kafka consumer tracks its own offset for each topic partition, so reading messages from the past is easy: set the offset to the desired value and start receiving messages.
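To make the offset idea concrete, here is a minimal Python sketch of a single partition and a consumer that tracks its own position. It is a toy in-memory model, not the Kafka client API: the names `PartitionLog`, `LogConsumer`, `poll` and `seek` are my own and only mimic the behaviour described above.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy stand-in for one topic partition: an append-only list of messages."""
    messages: list = field(default_factory=list)

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the message just written


@dataclass
class LogConsumer:
    """Consumer that keeps its own offset into a PartitionLog."""
    log: PartitionLog
    offset: int = 0

    def poll(self):
        """Return every message from the current offset onward, then advance."""
        batch = self.log.messages[self.offset:]
        self.offset = len(self.log.messages)
        return batch

    def seek(self, offset):
        """Move the read position. Nothing is deleted from the log."""
        self.offset = offset


log = PartitionLog()
for event in ["created", "updated", "deleted"]:
    log.append(event)

consumer = LogConsumer(log)
first_pass = consumer.poll()  # reads all three events
consumer.seek(1)              # "go back in time" to offset 1
replayed = consumer.poll()    # re-reads everything from offset 1 onward
```

Because reading never removes anything, a second consumer created later would start at offset 0 and see the full history, which is the “late join” case.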
RabbitMQ removes the message from the queue once it has been consumed. This means that a series of messages cannot be replayed. In order to replay messages when using RabbitMQ a replay infrastructure would need to be created. However, RabbitMQ applications do not need to track their position in the log like Kafka applications need to do since the RabbitMQ broker does that for them.
Routing is important in messaging as it can simplify the development process when the messaging broker can handle complex routing options. Routing options can give developers more freedom and flexibility when developing services.
RabbitMQ uses exchanges to handle the different routing options. An exchange takes a message and routes it to zero or more queues, depending on how the queues are bound to the exchange.
The possible exchange routing options are:
The fanout exchange broadcasts the message to every queue that is bound to the exchange.
Direct exchanges use a routing key, which is a short string. This exchange routes messages to all queues with an exact match for the routing key.
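The difference between fanout and direct routing can be sketched in a few lines of Python. This is an in-memory model under my own assumptions: queues are plain lists, and the class names `FanoutExchange` and `DirectExchange` are illustrative, not part of any RabbitMQ client library.

```python
from collections import defaultdict


class FanoutExchange:
    """Copies every message to every bound queue and ignores the routing key."""

    def __init__(self):
        self.queues = []

    def bind(self, queue):
        self.queues.append(queue)

    def publish(self, message, routing_key=""):
        for queue in self.queues:
            queue.append(message)


class DirectExchange:
    """Delivers a message only to queues bound with exactly the same routing key."""

    def __init__(self):
        self.bindings = defaultdict(list)

    def bind(self, queue, routing_key):
        self.bindings[routing_key].append(queue)

    def publish(self, message, routing_key):
        for queue in self.bindings.get(routing_key, []):
            queue.append(message)


audit, billing = [], []
fanout = FanoutExchange()
fanout.bind(audit)
fanout.bind(billing)
fanout.publish("user signed up")  # lands in both queues

errors, infos = [], []
direct = DirectExchange()
direct.bind(errors, "error")
direct.bind(infos, "info")
direct.publish("disk full", routing_key="error")       # only the "error" queue
direct.publish("cache warmed", routing_key="debug")    # no binding, routed nowhere
```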
Topic exchanges are similar to direct exchanges in that they use a routing key, but they allow wildcard matching as well as exact matching. This means that queues can subscribe to a topic directly, like “alarm”, or use the wildcards “#” and “*”.
As an example, if a message is published to the topic “alarm.power.off” then in order to receive this message through the topic exchange the queue would need to subscribe to the topic “alarm.power.off”. If the queue wants to subscribe to all alarms that have to do with power then it would subscribe to “alarm.power.*”. The “*” character is a wildcard for any single level so it would receive messages for the topics “alarm.power.on” and “alarm.power.off”. If the queue wanted to subscribe to all alarms then it would subscribe to the topic “alarm.#”. The “#” character is a wildcard for many levels, so the queue would receive messages sent to the topics “alarm.power.on”, “alarm.error” and “alarm.connection.state.changed”.
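The wildcard rules are easy to get wrong, so here is a small Python sketch of the matching logic, with “*” standing for exactly one dot-separated level and “#” for zero or more levels. The function name `topic_matches` is my own; this is an illustration of the rules, not RabbitMQ's actual implementation.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a dot-separated routing key satisfies a binding pattern."""

    def match(pattern_words, key_words):
        if not pattern_words:
            return not key_words  # both exhausted means a full match
        head, rest = pattern_words[0], pattern_words[1:]
        if head == "#":
            # "#" may absorb zero or more levels, so try every split point.
            return any(match(rest, key_words[i:]) for i in range(len(key_words) + 1))
        if not key_words:
            return False
        if head == "*" or head == key_words[0]:
            return match(rest, key_words[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


power_only = [key for key in ("alarm.power.on", "alarm.power.off", "alarm.error")
              if topic_matches("alarm.power.*", key)]
all_alarms = [key for key in ("alarm.power.on", "alarm.error",
                              "alarm.connection.state.changed", "status.power.on")
              if topic_matches("alarm.#", key)]
```

Here `power_only` keeps the two power alarms, and `all_alarms` keeps every key that starts with “alarm” regardless of depth.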
Header exchanges are the most powerful but are also the slowest, so this type of exchange could cause scaling issues. These exchanges do not use the routing key and instead inspect the headers of the message. Each binding between a queue and the exchange lists a set of header values and states whether a message must match any or all of them. For example, each message could have the following values in its header:
From these values there could be the following possible bindings:
- id=7, name=foo, status=0, x-match=any
- id=12, name=bar, status=1, x-match=all
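The x-match value is either any or all. With any, a message is routed to the queue if at least one of the listed header values matches; with all, every listed value must match.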
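The two bindings above can be checked against a message with a short Python sketch. The function `headers_match` and the sample message headers are hypothetical, chosen by me for illustration. The only rules I am relying on are the ones described above: x-match selects any or all, and the x-match entry itself is not compared against the message.

```python
def headers_match(binding: dict, message_headers: dict) -> bool:
    """Decide whether a message's headers satisfy one header-exchange binding."""
    wanted = dict(binding)
    mode = wanted.pop("x-match", "all")  # x-match is a control value, not a header to compare
    hits = [key in message_headers and message_headers[key] == value
            for key, value in wanted.items()]
    return any(hits) if mode == "any" else all(hits)


binding_any = {"id": 7, "name": "foo", "status": 0, "x-match": "any"}
binding_all = {"id": 12, "name": "bar", "status": 1, "x-match": "all"}

# Hypothetical message headers, not taken from the text above.
message = {"id": 12, "name": "foo", "status": 1}

matches_any = headers_match(binding_any, message)  # name matches, which is enough for "any"
matches_all = headers_match(binding_all, message)  # name differs, so "all" fails
```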
Kafka has much more limited options for routing: a producer publishes to a named topic and consumers subscribe to topics by name. The broker offers no equivalent of RabbitMQ's exchanges or routing-key wildcards.
With conventional queues, scaling out consumers leads to competing consumers. Competing consumers are multiple consumers that all receive messages from a single point-to-point channel. When a message is added to the channel, any of the consumers could potentially receive it, but only one of them will. As a result, related messages can end up being processed out of order.
Kafka has a feature called consumer groups. A consumer group is a grouping of consumer services which will handle messages published to a topic. The way that Kafka allows for consumer groups is by giving each service in a consumer group a whole number of partitions of the topic. For example, if there is a topic with 3 partitions and there are 2 services then one service will get 1 partition and the other service will get 2 partitions. Since all related messages are written to the same partition, consumer groups allow for all related messages to be processed in order. Related messages could be different updates for a given customer record, for example.
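The 3 partitions and 2 services example can be sketched in Python as follows. Both functions are simplified stand-ins of my own: `partition_for` uses a CRC32 hash rather than Kafka's actual partitioner, and `assign_partitions` hands partitions out round-robin, which is only one of several possible assignment strategies.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition, so equal keys share a partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, consumers: list) -> dict:
    """Give each consumer in the group a whole number of partitions."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


assignment = assign_partitions(3, ["service-a", "service-b"])
# service-a owns partitions 0 and 2, service-b owns partition 1

# Every update for the same customer record goes to the same partition,
# and therefore to the same service, in the order it was written.
customer_partition = partition_for("customer-42", 3)
```

Since each partition has exactly one owner within the group, no two services ever compete for the same customer's updates.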
RabbitMQ’s consumers could be scaled by having multiple consumers read from the same queue, but this causes competing consumers. Competing consumers can be undesirable because it potentially allows for related messages to be processed out of order, but they are not a problem when order does not matter.
RabbitMQ also offers a consistent hashing exchange, which effectively splits one logical queue into multiple queues. Messages are distributed between the queues by hashing the routing key, a message header or a message property, so messages with the same value always land in the same queue.
The drawback of consistent hashing is that, unlike Kafka, RabbitMQ does not help coordinate the consumers. It is up to the application to make sure each of the queues has exactly one consumer.
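To show how hashing the routing key spreads messages over several queues, here is a minimal hash ring in Python. It is a generic consistent hashing sketch under my own assumptions (MD5 as the hash, 100 points per queue, the class name `HashRing`) and does not reproduce the internals of the RabbitMQ exchange.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash of a string (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Places several points per queue on a ring and routes keys clockwise."""

    def __init__(self, queue_names, points_per_queue=100):
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in queue_names
            for i in range(points_per_queue)
        )
        self._points = [point for point, _ in self._ring]

    def queue_for(self, routing_key: str) -> str:
        # First ring point after the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._points, _hash(routing_key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["q1", "q2"])
owner = ring.queue_for("customer-42")  # always the same queue for this key

# Adding a queue only moves keys onto the new queue; the rest stay where they were.
bigger = HashRing(["q1", "q2", "q3"])
keys = [f"customer-{n}" for n in range(200)]
moved = [key for key in keys if ring.queue_for(key) != bigger.queue_for(key)]
```

Note that the ring only decides which queue a message goes to. Attaching exactly one consumer to each queue is still left to the application, which is the coordination gap mentioned above.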
Sending priority messages could be important depending on the application. Having different priority levels allows processing of higher priority messages before lower priority messages during periods of congestion.
Kafka currently does not have a way to send messages with a priority level or to deliver them in priority order. This is because Kafka implements an ordered log, so all messages are stored and delivered in the order in which they are received, regardless of congestion on the consumer side.
RabbitMQ supports priority queues. This means that individual queues can be set to have their own range of priorities. The priority of each message can be set when it is published. If the priority is not provided for a message a default value will be automatically set. Within the queue, messages with a higher priority are then delivered before messages with a lower priority.
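The delivery order can be modelled in Python with a heap. This is an in-memory sketch, not RabbitMQ client code: `PriorityQueueModel` is a hypothetical name, and I am assuming that a larger number means a higher priority, that a missing priority defaults to 0, and that priorities above the queue's maximum are capped at that maximum.

```python
import heapq
import itertools


class PriorityQueueModel:
    """Toy queue that hands out higher-priority messages first."""

    def __init__(self, max_priority=10):
        self.max_priority = max_priority
        self._heap = []
        self._sequence = itertools.count()  # keeps FIFO order within one priority

    def publish(self, body, priority=None):
        level = 0 if priority is None else min(priority, self.max_priority)
        # heapq is a min-heap, so negate the level to pop the highest first.
        heapq.heappush(self._heap, (-level, next(self._sequence), body))

    def consume(self):
        _, _, body = heapq.heappop(self._heap)
        return body


queue = PriorityQueueModel(max_priority=10)
queue.publish("nightly report")              # no priority given, defaults to 0
queue.publish("password reset", priority=5)
queue.publish("fraud alert", priority=9)
queue.publish("second reset", priority=5)

delivery_order = [queue.consume() for _ in range(4)]
```

With a congested consumer, the fraud alert jumps ahead of everything else, the two resets keep their publish order, and the unprioritized report comes last.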
All of the features mentioned are important features that developers, like myself, would like their messaging technology to have. However, not all of these features are available in any one product. Kafka and RabbitMQ seem complementary, as each provides features that the other does not. This could be why many developers end up using both technologies together.