DEV Community

Microservices communications. Why you should switch to message queues.

Matteo Joliveau on February 23, 2018

This is actually a more elaborate version of an old comment I wrote here. When doing microservices, a fairly crucial design point is: how should my...
 
Scott Bellware

What countermeasures do you use when an ack message (whether RabbitMQ, SQS, or other) is lost due to a network fault (or any other reason why a broker is unreachable) and a message is resent? How do you avoid processing the message a second (or more) time?

Matteo Joliveau • Edited

It depends on the way the message is processed.
Is it some kind of background job (e.g. updating a DB table or uploading a file to object storage)? Then do your best to encapsulate the computation in a single, atomic transaction. It either succeeds entirely or fails entirely. Acks will be sent back once the transaction is successfully completed. The chances of an acknowledgement being lost at this point are very low, but a best practice is to have some kind of "commit" functionality, so you can process your job, send the ack and then commit when the broker confirms. This way, if something happens and the message cannot be acknowledged, you simply roll back the transaction and re-enqueue the job. This is basically what job queues like Sidekiq do.

Is it some kind of RPC? Then you don't protect against duplicate messages. If an acknowledgement is lost, it means that the call response will probably never be delivered, so it is better to re-enqueue the message and potentially duplicate it than risk losing it.
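
As a rough illustration of the background-job case, here is a minimal sketch of "commit first, then ack". It assumes a SQLite database and a hypothetical `FakeChannel` class standing in for a real broker channel; the `uploads` table, `handle` function and payload fields are made up for the example and are not any library's API.

```python
import sqlite3


class FakeChannel:
    """Hypothetical stand-in for a broker channel; it only records calls."""

    def __init__(self):
        self.acked = []
        self.requeued = []

    def ack(self, delivery_tag):
        self.acked.append(delivery_tag)

    def nack(self, delivery_tag):
        self.requeued.append(delivery_tag)


def handle(db, channel, delivery_tag, payload):
    """Run the job in one transaction and ack only after it committed."""
    try:
        with db:  # commits on success, rolls back if the body raises
            db.execute("INSERT INTO uploads (name) VALUES (?)", (payload["name"],))
            if payload.get("fail"):
                raise RuntimeError("simulated failure mid-job")
    except Exception:
        channel.nack(delivery_tag)  # nothing was written; ask for a re-enqueue
        return False
    channel.ack(delivery_tag)
    return True


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE uploads (name TEXT)")
channel = FakeChannel()
handle(db, channel, 1, {"name": "a.png"})
handle(db, channel, 2, {"name": "b.png", "fail": True})
rows = [row[0] for row in db.execute("SELECT name FROM uploads")]
```

Only the first job leaves a row behind, and only its delivery tag is acked. Note that this ordering still leaves a window in which the commit succeeds but the ack never reaches the broker, so the same message can be delivered again.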

Scott Bellware • Edited

I'm specifically talking about service architecture, and not background jobs. I recognize that the two are only tangentially related. Definitely not talking about RPC.

So, distributed transactions that span a message acknowledgement and whatever actions a service takes are anathema to service architecture. They'd also increase the likelihood of deadlocks, which would contradict the non-locking benefits of the architecture.

The three commonly recognized guarantees of distributed, message-based systems are that messages will arrive out of order, that messages will not arrive at all, and that messages will arrive more than once. This includes ACK signals - especially with regard to messages not arriving at all.

Irrespective of whether it happens infrequently, it will happen. Whether it happens one in a million times or a million in a million times, the work of implementing the countermeasures is the same. The presence of networks and computers and electricity guarantees that ACK messages will be lost and that messages will have to be reprocessed (messages will arrive more than once).

So, what I'm interested in is how you specifically account for the occurrence of message reprocessing that messaging systems guarantee.

Galdin Raphael

Here's my understanding: microservices have smart endpoints and dumb pipes. So it's the service's responsibility to account for message redeliveries, as you already pointed out.

One way out would be to make the message processing operation idempotent. Two cases here:

  1. The operation is idempotent in and of itself. Nothing needs to be done here: reprocessing causes no additional side effects.
  2. The operation is not idempotent in nature. Use an identifier (e.g. a random unique GUID, a hash of the message, etc.) to ensure a message is processed only once per request.
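
A minimal sketch of the second case, assuming SQLite as the store: the message identifier is recorded in the same transaction as the side effect, so a redelivered message is rejected by the primary key. The `processed` and `account` tables and the `process_once` function are hypothetical names chosen for the example.

```python
import sqlite3


def process_once(db, message_id, amount):
    """Credit the account at most once per message_id."""
    try:
        with db:  # one transaction: both statements commit or neither does
            # The PRIMARY KEY rejects an identifier that was already recorded.
            db.execute("INSERT INTO processed (message_id) VALUES (?)", (message_id,))
            db.execute("UPDATE account SET balance = balance + ?", (amount,))
    except sqlite3.IntegrityError:
        return False  # duplicate delivery, nothing was changed
    return True


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE account (balance INTEGER)")
db.execute("INSERT INTO account VALUES (0)")
db.commit()

first = process_once(db, "msg-1", 10)
second = process_once(db, "msg-1", 10)  # same message delivered again
balance = db.execute("SELECT balance FROM account").fetchone()[0]
```

The second delivery is detected and skipped, so the balance is credited once.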

Your thoughts?

Scott Bellware

Indeed. However, a unique message identifier would have to be persisted in order to know that it has been previously processed. Where possible, and where the messaging tool is homogeneous (within a logical service, rather than integrating disparate logical services), a message sequence number can be a better option (especially in cases where event sourcing is in use).
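
To make the sequence-number idea concrete, here is a small sketch. It assumes each message carries a monotonically increasing `sequence` field and that the consumer stores the last applied sequence together with its state; the field names and the `apply_event` function are illustrative assumptions, not part of any particular messaging tool.

```python
def apply_event(state, event):
    """Apply the event only if it is newer than the last one applied."""
    if event["sequence"] <= state["last_sequence"]:
        return False  # already seen: a redelivery or a stale message
    state["total"] += event["amount"]
    state["last_sequence"] = event["sequence"]
    return True


# In a real service, total and last_sequence would be persisted together.
state = {"total": 0, "last_sequence": 0}
events = [
    {"sequence": 1, "amount": 5},
    {"sequence": 2, "amount": 7},
    {"sequence": 2, "amount": 7},  # redelivered after a lost ack
    {"sequence": 3, "amount": 1},
]
results = [apply_event(state, event) for event in events]
```

Compared with storing every message identifier, the consumer only has to keep a single number per stream.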

Galdin Raphael

The message sequence number is interesting, thanks!

Gergely Mészáros

Maybe I misunderstood what you want to say, but I can't see the inherent superiority. Here you seem to be talking about different architectural designs, and all the shortcomings and benefits are a direct result of the architecture, not of the protocol used. A message queue broker is not really async (at least at the protocol level), since the messages must be acked. The message broker will become your single point of failure. So what did we achieve? We transferred the whole problem to some (arguably more regulated) dedicated service.

I think the real difference between the two is that HTTP (originally) was not designed to handle "push" notifications, therefore only one side could initiate the information exchange. Using HTTP/2 push or websockets we could easily implement full-blown messaging over HTTP (and still use REST).

It seems you implicitly suppose the service must immediately execute the whole business logic in response to a REST request. That's not true. REST is about state transfer, not about business logic. If it's REST over HTTP (the most common case), we even have a standard response code for delayed execution: HTTP 202.
It's completely valid to send POST: BeginMyTask: uuid, get back HTTP 202 "Task started." and later get some notification of the finish. Just like you would do in the case of a message broker.
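
The flow can be sketched without any web framework. This is a simplified model, assuming an in-memory `tasks` registry and a caller-supplied `notify` callback; the function names are hypothetical and only the 202 status code and the "Task started." message come from the description above.

```python
tasks = {}  # hypothetical in-memory registry: task id -> status


def begin_task(task_id):
    """Handle `POST BeginMyTask: uuid`: record the task and answer at once."""
    tasks.setdefault(task_id, "pending")  # a repeated POST does not reset it
    return 202, "Task started."


def finish_task(task_id, notify):
    """Called by the worker when the job is done; notify is any callback."""
    tasks[task_id] = "done"
    notify(task_id)


notifications = []
status, body = begin_task("6f1c")  # the caller gets 202 before any work runs
finish_task("6f1c", notifications.append)  # later: completion is pushed out
```

The request returns immediately with 202, and the completion notice travels separately, whether over a webhook, a websocket or a broker.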

We may have bad architecture and good architecture. It might be harder to create a bad architecture with a message broker, but it's still possible. However, if I try to solve everything with message brokers, the threat of over-engineering is high and I may end up with an inferior design.

For example, if I have 5 services communicating independently over REST calls, I have a very resilient network without a single point of failure. If one service goes down, some features will be inaccessible but the system will run happily. If I have a message broker and the message broker goes down... end of story.

Neither method is superior; everything depends on what you would like to achieve. What do you think?

Ferdinand Mütsch

Great article with good arguments! I really like the asynchronous messaging concept for inter-service communication.
Recently, I went to a talk by Spring Data project lead Oliver Gierke. He presented a kind of hybrid approach between microservice and monolith architectures, as well as a pretty interesting communication concept, which is a mixture of direct API invocation and messaging. If you're interested, here's a recording.

Matteo Joliveau

That's a very interesting talk! Thanks for sharing it :D

Aldo Sinanaj

Hi Matteo, nice article! I have a question about the correlation ID. Where do you store it so you can keep track of it once you get a response on the response topic? I was wondering: if you store it in an in-memory cache, it will not be reliable if the producer crashes, and if you have more running instances of this component, do you then need a shared cache? Can you share your experience of how you have dealt with this scenario? Thanks, Aldo

Matteo Joliveau

Hi Aldo, thanks for the question, it's actually a very good point I missed in the article!
Correlation IDs are only useful if someone is actually waiting for a response in particular. Since most microservices are stateless and routing is taken care of by the broker itself, the Correlation ID can be kept in the message headers and passed around (and logged, always log CorIDs!) between services without them caring much about it. The only one that needs to keep track of it is the request initiator (e.g. the HTTP API Gateway). It will create an ID and assign it to the request message, then wait on its response topic for a message with the same ID.
The other services will not need to care about it, since the topic to send responses to is provided by the request message itself (in the reply_to RabbitMQ property, for example), so every req/res is self-contained.
You receive a request, you process it and you send the response to the provided topic.

On the other hand, if you need to contact another service in order to answer a request you just received, you become an initiator yourself, so when you send your secondary request you can use the same Correlation ID when waiting for the response.

Now to actually answer your question: the IDs you are tracking can be kept in memory (normally you use a thread pool or an event loop to deal with requests/responses, so the ID can be maintained there) or you can use a cache like Redis, but by no means does it have to be shared. Remember that microservices are independent and must only care about themselves.
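
Here is a minimal sketch of an initiator that keeps its pending correlation IDs in memory. The `Initiator` class, the `publish` callable and the `responses.gateway` topic name are assumptions made for the example; a real service would wire `on_response` to its broker client's consumer callback.

```python
import uuid
from concurrent.futures import Future


class Initiator:
    """Tracks in memory the requests this process is waiting on."""

    def __init__(self, publish):
        self._publish = publish  # any callable that sends a message
        self._pending = {}       # correlation_id -> Future

    def request(self, body):
        correlation_id = str(uuid.uuid4())
        future = Future()
        self._pending[correlation_id] = future
        self._publish({
            "correlation_id": correlation_id,
            "reply_to": "responses.gateway",  # hypothetical response topic
            "body": body,
        })
        return correlation_id, future

    def on_response(self, message):
        """Resolve the matching request; ignore unknown or repeated IDs."""
        future = self._pending.pop(message["correlation_id"], None)
        if future is None:
            return False
        future.set_result(message["body"])
        return True


sent = []
initiator = Initiator(sent.append)
correlation_id, future = initiator.request("ping")
handled = initiator.on_response({"correlation_id": correlation_id, "body": "pong"})
```

Because each instance only tracks the requests it sent itself, the map does not need to be shared. If the process crashes, its pending entries are simply lost, and a late reply for an unknown ID is ignored.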

I will update the article to add this and other enhancements :)

Aldo Sinanaj

Thanks for the reply! Waiting for the update :-)

Bruno Oliveira

Hi Matteo! Very nice article we have here, thank you for it.

I'm starting to study strategies for microservices and async-based communication, and I have some questions that maybe you can help me with.

  • Should the client connect directly to the broker and send/receive messages? Is this a good or bad practice? What are your thoughts on security, given that the connection would be "available" to the client (in the browser)?

  • If I decide to make an API that I call and the API sends the message to the broker, how should the client receive the message? Does the API have to hold the request open while consuming from the broker until it receives the message that IS the awaited response?

I'm kinda stuck on how this flow should work. If you have any resources that I could read/watch on this subject (some GitHub repos, maybe), I believe they would be very helpful too.

Matteo Joliveau • Edited

Hi Bruno, thank you for reading it!

  • The client COULD connect directly to the broker and use an async API to consume data from the backend, for example by using a protocol such as MQTT (which RabbitMQ supports) that is designed for transmission over unstable networks. But I generally do not proceed this way, and I instead implement a "gateway" service (for example exposing a GraphQL or REST API) that will translate the requests to async messages.

  • Yes, the API has to wait for a response to come back on the reply queue. While this sounds bad and potentially slow, I assure you it is not (normally). Actually, with the correct setup I often see this approach being faster than regular HTTP calls between microservices, since AMQP can transmit binary payloads (compared to text for HTTP/1.1) and generally introduces lower overhead than HTTP.
    From the client's point of view, it's just an API call and it will return a response in an acceptable time. Nonetheless, whenever you have to wait for a reply you should implement timeouts so that you don't hang indefinitely. Maybe the service you called has gone away or has crashed, and the response may never arrive. So, as with HTTP, request timeouts are very important.
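
A small sketch of waiting on a reply queue with a deadline, using a standard-library `queue.Queue` as a stand-in for the broker's reply queue. The `wait_for_reply` function and the message shape are assumptions made for illustration.

```python
import queue
import time


def wait_for_reply(reply_queue, correlation_id, timeout_seconds):
    """Block until the matching reply arrives or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no reply for {correlation_id}")
        try:
            message = reply_queue.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError(f"no reply for {correlation_id}") from None
        if message["correlation_id"] == correlation_id:
            return message["body"]
        # Replies meant for other requests are skipped in this sketch.


replies = queue.Queue()
replies.put({"correlation_id": "other", "body": "ignored"})
replies.put({"correlation_id": "abc", "body": "ok"})
answer = wait_for_reply(replies, "abc", timeout_seconds=0.5)
```

The gateway can turn the `TimeoutError` into an error response for the client (for example a 504 Gateway Timeout) instead of holding the connection open forever.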

TARUN GOEL

Hi Matteo, great article! I have a question about the format in which messages should be exchanged among microservices. I am currently using hashmaps or JSON strings and don't think they are the best industry practice. Is there anything you could advise?

Matteo Joliveau

If you're using brokers that support binary payloads, I recommend you look at MessagePack if you want a schema-less format (very akin to JSON). If on the other hand, you want to have a strongly typed message schema, with known data structures very similar to programming language classes, then Protocol Buffers or Apache Thrift are much more robust solutions.

I recommend the latter if you have many different microservices. MessagePack is easier to adopt but can become less ideal as the system scales.
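
To show what a typed message schema buys you, here is a sketch that uses only the Python standard library. It is not Protocol Buffers or Thrift (those generate classes from a schema file); it substitutes a dataclass plus manual validation, with JSON as the wire format, and the `OrderCreated` message is a made-up example.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OrderCreated:
    """Hypothetical message type with a fixed, known structure."""
    order_id: str
    quantity: int


def decode_order(raw):
    """Decode a payload and reject anything that does not match the schema."""
    data = json.loads(raw)
    expected = {field.name: field.type for field in fields(OrderCreated)}
    if set(data) != set(expected):
        raise ValueError(f"unexpected fields: {sorted(data)}")
    for name, expected_type in expected.items():
        if not isinstance(data[name], expected_type):
            raise ValueError(f"{name} must be {expected_type.__name__}")
    return OrderCreated(**data)


order = decode_order('{"order_id": "A1", "quantity": 2}')
```

With a schema-less payload, a misspelled or mistyped field only surfaces deep inside the consumer. With a schema, the message is rejected at the boundary, which matters more as the number of services grows.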

Dorian Neto

How does the producer know that a specific message, or all messages, were processed by the consumer? I mean, how does the consumer's response work?

Rafael Jesus

Thanks for the article, quick question, how do you produce messages when the Broker is unavailable? Yes I know there are several ways to manage this, I am just curious how you're doing _^

Matteo Joliveau • Edited

Ideally, your broker will never be unavailable.
I know, this is silly because we don't live in an ideal world, but you should consider your broker a supporting service similar to a database or email sender.
In a cloud environment it is best to have the broker set up as a managed service (e.g. Amazon SQS) or in a high-availability cluster separated from your core application (Rabbit supports clustering as a primary feature).

If by any chance your broker actually goes down, it is an unexpected crisis that must be dealt with depending on your project and infrastructure. Your services are allowed to die ungracefully in this case, because it's a disaster situation and not an operational incident (but it is always best to implement some failover, by logging the issue and halting operations while the service attempts to reconnect, maybe with some exponential backoff).
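
The reconnect loop could look roughly like this. It is a sketch under assumptions: `connect` is any callable that raises `ConnectionError` while the broker is down, and the attempt count and delay values are arbitrary defaults rather than recommendations.

```python
import logging
import time

logger = logging.getLogger("broker")


def connect_with_backoff(connect, max_attempts=5, base_delay=0.5,
                         max_delay=30.0, sleep=time.sleep):
    """Retry connect(), doubling the wait after every failed attempt."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError as error:
            if attempt == max_attempts - 1:
                raise  # give up and let the service die
            delay = min(max_delay, base_delay * 2 ** attempt)
            logger.warning("broker unreachable (%s); retrying in %.1fs", error, delay)
            sleep(delay)


attempts = []
delays = []


def flaky_connect():
    """Simulated broker that refuses the first two connections."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("connection refused")
    return "connection"


connection = connect_with_backoff(flaky_connect, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the function testable. In production the default `time.sleep` applies, and adding random jitter to the delay is a common refinement when many services reconnect at once.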

James

RabbitMQ does not adhere to the sequence of the queue.

Vijay Mareddy

Hi Matteo, nice article! How about a new one titled: "Microservices communications. Why you should switch to reactive RESTful web services"?

Matteo Joliveau

Thanks, I'm curious to read it!