DEV Community

mohamed ahmed

RabbitMQ vs Kafka: key differences

Two of the most common ways to implement communication between services are RabbitMQ and Kafka.
What are they, and how do they differ from each other?

What is RabbitMQ?
RabbitMQ is a message broker that implements the Advanced Message Queuing Protocol (AMQP), a standard for messaging interoperability. It is based on a broker-queue model, where producers send messages to exchanges and consumers receive messages from queues. RabbitMQ can route messages based on various criteria, such as topic, headers, fanout or direct exchanges. RabbitMQ is designed for flexibility and reliability.
AMQP is an open standard for passing business messages between applications or organizations, which allows you to stay platform agnostic. AMQP passes messages over TCP/IP connections as binary data. Some of the features AMQP offers are message queuing, reliability and routing.

What is Kafka?
Kafka is an event streaming platform. The Kafka docs define event streaming as "the practice of capturing data in real time from event sources like databases, mobile devices, cloud services, and software applications in the form of streams of events; storing these event streams durably for later retrieval; manipulating, processing, and reacting to the event streams in real-time as well as retrospectively; and routing the event streams to different destination technologies as needed."
In other words, event streaming is the process of gathering data from many points that produce events, then storing and processing that data based on our needs. Kafka is event based and uses streams, which can be thought of as a huge pipeline of infinite data.
Kafka lets you publish events to a data stream, store these streams of events durably, process them, and subscribe to them. Kafka is a distributed system consisting of servers and clients communicating over the TCP protocol.

Push vs Pull Based Messaging
RabbitMQ uses what is called a smart producer: the producer of the data decides when to send it. A prefetch limit is set on the consumer's end to keep the consumer from being overwhelmed. Such a push-based system means the queue has an almost-FIFO structure; almost, because some messages can be processed faster than others, leaving the queue only roughly in order.
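The push model with a prefetch limit can be sketched in plain Python. This is a toy in-memory simulation with illustrative names, not the RabbitMQ/pika API: the broker keeps pushing messages until the count of unacknowledged deliveries reaches the limit, and resumes when the consumer acknowledges.

```python
from collections import deque

class PushBroker:
    """Toy push-based broker with a consumer prefetch limit (illustrative only)."""

    def __init__(self, prefetch_limit):
        self.queue = deque()
        self.prefetch_limit = prefetch_limit
        self.unacked = 0  # deliveries not yet acknowledged by the consumer

    def publish(self, message):
        self.queue.append(message)

    def deliver(self, consumer):
        """Push messages until the queue drains or the prefetch limit is hit."""
        while self.queue and self.unacked < self.prefetch_limit:
            self.unacked += 1
            consumer(self.queue.popleft())

    def ack(self):
        """Consumer acknowledges one message, freeing a delivery slot."""
        self.unacked -= 1

broker = PushBroker(prefetch_limit=2)
for i in range(5):
    broker.publish(f"task-{i}")

received = []
broker.deliver(received.append)
print(received)            # only 2 messages pushed: prefetch limit reached

broker.ack()               # consumer acknowledges one message
broker.deliver(received.append)
print(received)            # one more message flows through
```

The broker decides when to push; the consumer only throttles it indirectly, through acknowledgements against the prefetch limit.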

Kafka, on the other hand, uses a smart consumer: the consumer has to request the messages it wants from the stream. Each consumer tracks its own offset into the stream, which means all consumers can consume and process events at their own pace. One benefit of this pull system is that consumers can be added at any time, so the application can be scaled with new services without any changes to Kafka.
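The pull model can be sketched the same way. Again a toy in-memory log with illustrative names, not the Kafka client API: every consumer keeps its own offset and polls at its own pace, and a new consumer can join at any time and read from the start.

```python
class EventLog:
    """Toy append-only event log (illustrative stand-in for a Kafka topic)."""

    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

class Consumer:
    """Pull-based consumer that tracks its own offset into the log."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_events=10):
        batch = self.log.events[self.offset:self.offset + max_events]
        self.offset += len(batch)
        return batch

log = EventLog()
for i in range(4):
    log.append(f"event-{i}")

fast = Consumer(log)
slow = Consumer(log)
print(fast.poll())               # reads all four events at once
print(slow.poll(max_events=1))   # same log, consumed at its own pace

# A consumer added later can still replay everything from offset 0,
# because consuming never deletes events from the log.
late = Consumer(log)
print(late.poll())
```

Note that the log itself never changes as consumers read; only each consumer's offset moves.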

Queues vs Topics
At its core, RabbitMQ uses a basic queue data structure: producers publish messages to an exchange, the exchange routes them into queues, and consumers take messages from the head of a queue. Exchanges support different messaging patterns for different use cases:
- direct: messages are routed based on an exact match of the message's routing key.
- headers: the routing key is ignored and the message headers decide where to send the message.
- topic: messages are routed using the routing key like the direct pattern, but with two wildcards: * (matches exactly one word) and # (matches any number of words).
- fanout: messages sent to a fanout exchange are broadcast to every queue or exchange bound to it.
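The topic wildcards can be illustrated with a small matcher. This is my own sketch of the matching rules, not RabbitMQ's implementation: routing keys are dot-separated words, * consumes exactly one word, and # consumes zero or more.

```python
def topic_match(pattern, routing_key):
    """Return True if a topic-exchange binding pattern matches a routing key."""
    p, k = pattern.split("."), routing_key.split(".")

    def match(i, j):
        if i == len(p):                 # pattern exhausted: key must be too
            return j == len(k)
        if p[i] == "#":                 # '#' can absorb zero or more words
            return any(match(i + 1, j2) for j2 in range(j, len(k) + 1))
        if j == len(k):                 # key exhausted but pattern is not
            return False
        return (p[i] == "*" or p[i] == k[j]) and match(i + 1, j + 1)

    return match(0, 0)

print(topic_match("logs.*.error", "logs.api.error"))   # True
print(topic_match("logs.#", "logs.api.db.error"))      # True
print(topic_match("logs.*", "logs.api.error"))         # False: '*' is one word
```

A binding of `logs.#` therefore behaves like a fanout for everything under `logs`, while `logs.*.error` pins down a specific shape of key.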

Kafka uses topics. A topic can be described as a folder in a file system, where each event is a file. There can be zero, one or multiple producers of events, and zero, one or multiple consumers. Events in Kafka aren't deleted when consumed, and you can configure how long Kafka should retain a topic.
Each topic in Kafka can have multiple partitions, and each partition can be thought of as a bucket. When an event is produced, its key determines which partition it is written to. Events with the same event key are always written to the same partition, and Kafka guarantees that any consumer of a given partition will consume the events from that partition in order.
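Key-to-partition assignment can be sketched as a stable hash modulo the partition count. This is an illustrative stand-in for Kafka's actual partitioner, but it shows the property that matters: equal keys always land in the same partition, so each key's events stay in production order within that partition.

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key):
    # A stable hash (unlike Python's per-process randomized hash())
    # modulo the partition count; illustrative, not Kafka's partitioner.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, event in [("user-1", "login"), ("user-2", "login"),
                   ("user-1", "click"), ("user-1", "logout")]:
    partitions[partition_for(key)].append((key, event))

# All of user-1's events land in one partition, in production order.
p = partition_for("user-1")
print(partitions[p])
```

Ordering is only guaranteed per partition, which is exactly why producers that care about per-entity ordering use that entity's ID as the event key.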

Some other quick differences
Events in Kafka can be replayed, since they are not deleted as soon as they are consumed, whereas events in RabbitMQ cannot be replayed because they are deleted once consumed. Kafka can also process the data in its streams before consumers consume it, whereas RabbitMQ does not provide functionality for processing data in its queues.
In RabbitMQ it is possible to specify message priorities and to consume messages based on the priority assigned to each message: RabbitMQ supports creating a priority queue, whereas Kafka has no such functionality. Kafka, on the other hand, can achieve high throughput with limited resources, a necessity for big data use cases.
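Priority consumption can be sketched with Python's heapq (RabbitMQ provides this natively when a queue is declared with the x-max-priority argument; the sketch below is just to show the behavior, not RabbitMQ's internals).

```python
import heapq
import itertools

counter = itertools.count()  # tie-breaker keeps FIFO order within a priority
queue = []

def publish(message, priority=0):
    # heapq is a min-heap, so negate: higher priority pops first.
    heapq.heappush(queue, (-priority, next(counter), message))

def consume():
    _, _, message = heapq.heappop(queue)
    return message

publish("routine report", priority=1)
publish("password reset", priority=5)
publish("daily digest", priority=1)

print(consume())  # -> password reset
print(consume())  # -> routine report (FIFO among equal priorities)
```

The tie-breaking counter matters: without it, messages of equal priority would be ordered arbitrarily instead of first-in, first-out.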

So what are their use cases?

RabbitMQ:
Complex routing: it is very easy to route RabbitMQ messages based on routing keys. If there is a requirement to route messages on a few criteria, RabbitMQ's exchange patterns can achieve that routing.
Long-running processes: RabbitMQ is a good fit where there might be long-running tasks, because there usually isn't a need for Kafka's strengths of storing, processing and replaying event data. A simple queue of jobs that need to get done satisfies this use case.

Kafka:
Log aggregation: Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption.
Stream processing: many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing.
High activity: Kafka is a good choice for high-volume data ingestion from IoT devices and other sources that consistently produce a lot of events.
