Sriram R

Posted on Feb 19, 2023

Exactly Once Processing

#distributedsystems #computerscience #beginners

Multiple Deliveries of a Message

Nodes in a distributed system talk to each other over a network.
As we saw in System Models, most networks are Fair Loss Links, where data can be lost while in transit.

We use retries to deal with this. When a node finds out that the message it tried to send might not have reached its destination, it tries again in the hopes that the message will eventually get through.

This seems like a good way to solve the problem, but there's a catch.
When a node sends a message, it waits for the receiving node to confirm that it got the message. There is a chance, though, that the message was received at its destination but that the acknowledgment message was lost on the way.

Imagine this happening!

In this case, the message was received by Node B, but the acknowledgement got lost along the way. Not knowing this, Node A did a retry, which means Node B got the same message twice and processed it twice.
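The scenario above can be reproduced with a small simulation. This is only a sketch: `NodeB`, `LossyLink`, and `send_with_retries` are hypothetical names invented for illustration, and the "network" is an in-memory object that drops a fixed number of acknowledgements.

```python
class NodeB:
    """Receiver that processes every message it is delivered."""

    def __init__(self):
        self.processed = []

    def receive(self, message):
        self.processed.append(message)  # processing happens on every delivery
        return "ack"


class LossyLink:
    """Hypothetical link that loses the first `drop_acks` acknowledgements."""

    def __init__(self, receiver, drop_acks):
        self.receiver = receiver
        self.drop_acks = drop_acks

    def send(self, message):
        ack = self.receiver.receive(message)  # the message itself arrives
        if self.drop_acks > 0:
            self.drop_acks -= 1
            return None  # the acknowledgement is lost on the way back
        return ack


def send_with_retries(link, message, max_attempts=3):
    """Node A: resend until an acknowledgement arrives; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        if link.send(message) == "ack":
            return attempt
    raise TimeoutError("no acknowledgement received")


node_b = NodeB()
link = LossyLink(node_b, drop_acks=1)
attempts = send_with_retries(link, "hello")
print(attempts, node_b.processed)
```

Node A needs two attempts to see an acknowledgement, and by then Node B has already processed the same message twice.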

Real World Example

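Imagine that you're sending money to someone. The transfer goes through, but the confirmation is lost on the way back, so your app retries and the transfer is processed a second time.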

This is a bad situation because you're transferring money twice when you only meant to do it once.

Difference between Delivery and Processing

Before we can figure out how to avoid these kinds of situations, we need to know how Delivery and Processing are different.

Delivery is the event of a message arriving at a specific node, at the network and hardware level, while processing is the software layer taking action on the received message.

For example, if you send a request to a server, it is called "Delivery" when the server gets the request, and "Processing" when the server does something with the request.

This is important to know because we can't stop duplicate messages from being delivered. Networks aren't reliable, so a sender that retries can never rule out that its earlier attempt already arrived.

But we can design the receiver so that a message that is delivered more than once still takes effect only once.

So it's important to know that "exactly-once delivery" is not possible, but "exactly-once processing" is.

How do we avoid this?

To deal with these kinds of situations, we make sure that a node processes a message only once, even if it is delivered more than once.

Idempotency

An operation is idempotent if performing it multiple times has the same effect as performing it once.

Example

Let's say you work for a social media site and your job is to build the "likes" feature.

Because of the retries we talked about above, the same "like" request from a user could reach your service more than once.

Let's look at two ways to do this and see which one is better.

Method 1

We keep track of how many people like each post, and when we get a new like, we add one.

This operation is not idempotent, so it is not safe to retry: if the increment runs more than once for the same like, you end up with duplicate likes.

newLikeCount = oldLikeCount + 1
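As a rough illustration of Method 1, the sketch below keeps the count in an in-memory dictionary. The names `like_counts` and `like_with_counter` are made up for this example, and the second call stands in for a retried request.

```python
# Hypothetical in-memory store: post ID -> number of likes.
like_counts = {"post-1": 0}


def like_with_counter(post_id):
    """Method 1: blindly add one for every like request that arrives."""
    like_counts[post_id] += 1
    return like_counts[post_id]


like_with_counter("post-1")  # original request
like_with_counter("post-1")  # retry of the same request
print(like_counts["post-1"])
```

One user liked the post once, but the counter now says two.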

Method 2

We keep a set of users who have liked a specific post, and for each new like, we add the 'userId' to that set.
By definition, a set never contains duplicates.

This operation is idempotent because even if you try it more than once, the result won't change. It will still show that the user has liked the post once, since a set doesn't have any duplicates.

usersLiked = post.getExistingLikedUsers();
usersLiked.add(userId); // no effect if userId is already in the set
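A minimal sketch of Method 2, again with an in-memory dictionary standing in for real storage. `liked_users` and `like_with_set` are hypothetical names, and the like count is derived from the size of the set.

```python
# Hypothetical in-memory store: post ID -> set of user IDs who liked it.
liked_users = {"post-1": set()}


def like_with_set(post_id, user_id):
    """Method 2: record who liked the post; adding twice changes nothing."""
    liked_users[post_id].add(user_id)
    return len(liked_users[post_id])  # like count derived from the set


first = like_with_set("post-1", "user-42")  # original request
retry = like_with_set("post-1", "user-42")  # retry of the same request
print(first, retry)
```

Both calls report one like, so the retry is harmless.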

Remove duplicates

There are times when idempotency is hard to achieve. In these cases, the caller module attaches a unique ID to each request, and the processing module keeps track of the IDs it has already processed. If it gets a duplicate request with the same ID, the processing module throws away the message.

Example

Sending an email isn't an idempotent action, and you can't make it one. Instead, the caller attaches a unique ID to each new email request. The application that sends the mail records the IDs it has already handled, so if a retry arrives with the same ID, it doesn't send the email again.
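The email case might look roughly like the sketch below. Everything here is assumed for illustration: `EmailProcessor` is a hypothetical processing module, its `outbox` list stands in for a real mail server, and the caller generates the ID with Python's `uuid.uuid4()` and reuses it on retry.

```python
import uuid


class EmailProcessor:
    """Hypothetical processing module that handles each request ID once."""

    def __init__(self):
        self.seen_ids = set()  # IDs of requests already handled
        self.outbox = []       # stands in for actually sending mail

    def handle(self, request_id, recipient, body):
        if request_id in self.seen_ids:
            return False  # duplicate request: throw it away
        self.seen_ids.add(request_id)
        self.outbox.append((recipient, body))
        return True


processor = EmailProcessor()

# The caller generates the ID once and sends the same ID again on retry.
request_id = str(uuid.uuid4())
first = processor.handle(request_id, "alice@example.com", "Welcome!")
retry = processor.handle(request_id, "alice@example.com", "Welcome!")
print(first, retry, len(processor.outbox))
```

The retry is recognised by its ID and dropped, so only one email goes out. In a real system the set of seen IDs would need to survive restarts, and recording the ID and sending the mail would have to be coordinated so that a crash between the two steps does not lose or repeat the email.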

Idempotency and Deduplication: Pros and Cons

Idempotency is easier to set up and manage because nodes don't have to work together. Deduplication, on the other hand, needs all the nodes in your system to work together. This is because the original request and the retry request could end up in different nodes.
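To see why deduplication needs coordination, consider the sketch below. It assumes a hypothetical `Node` class whose record of seen IDs is either private to the node or shared between nodes. A plain Python set stands in for the shared store, which in practice would be something like a database.

```python
class Node:
    """Hypothetical processing node that deduplicates by request ID."""

    def __init__(self, seen_ids):
        self.seen_ids = seen_ids  # private to this node, or shared
        self.processed = []

    def handle(self, request_id, payload):
        if request_id in self.seen_ids:
            return False  # duplicate: discard
        self.seen_ids.add(request_id)
        self.processed.append(payload)
        return True


# Each node keeps its own record: the retry lands on another node and slips through.
a, b = Node(set()), Node(set())
a.handle("req-1", "send email")  # original request
b.handle("req-1", "send email")  # retry routed to a different node
private_total = len(a.processed) + len(b.processed)

# Both nodes consult one shared record: the retry is detected.
shared_ids = set()
c, d = Node(shared_ids), Node(shared_ids)
c.handle("req-1", "send email")
d.handle("req-1", "send email")
shared_total = len(c.processed) + len(d.processed)

print(private_total, shared_total)
```

With private records the request is processed twice. With a shared record it is processed once, at the cost of every node depending on that shared state.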

Since idempotency is built into the processing module, it can be used even if you don't have control over the caller module. Deduplication, however, requires control over the caller module, because the caller has to generate an ID for every request and reuse it on retries.

DEV Community
