When working on enterprise systems such as booking, ordering, or payment systems under high workloads (massive numbers of concurrent users), we often run into a side effect where a single request gets processed multiple times. We don't notice this issue under normal workloads.
Let's see what happens with a request API like this:
Nothing special here, right? We write the code for booking free seats for a movie, just as we do every day.
But when we run it under a big workload with massive numbers of concurrent users, we see this:
If you look closely, you will see some duplicated requests. That's because, at some point, concurrent users tried to book the ticket at the same time. The user with id="a000711d-e6b9-4c6c-b4d6-d0b726103847" tried to book 2 tickets (seat numbers 1 and 2) three times. But according to the logic of this application, this user should be able to book each seat number only once. That makes the application behave incorrectly and unreliably. Technically, it makes the application hard to reason about and nondeterministic. 😌😌😌
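To see why the duplicates appear, here is a minimal sketch of the underlying check-then-act race, written in Python (the linked repository uses C#). All names are hypothetical, and the `threading.Barrier` is an assumption standing in for real I/O latency so the race happens on every run instead of only under heavy load.

```python
import threading
from typing import List, Tuple

# Hypothetical shared "bookings table": one (user_id, seat) row per booking.
bookings: List[Tuple[str, int]] = []


def book_seat_unsafe(user_id: str, seat: int, latency: threading.Barrier) -> None:
    """Check-then-act with no synchronization: the pattern behind the duplicates."""
    already_booked = (user_id, seat) in bookings  # 1. check
    # Assumption: the barrier stands in for I/O latency (e.g. a DB round trip) and
    # forces every request to finish its check before any request writes.
    latency.wait()
    if not already_booked:
        bookings.append((user_id, seat))  # 2. act, based on a stale check


def simulate(concurrent_requests: int = 3) -> List[Tuple[str, int]]:
    bookings.clear()
    latency = threading.Barrier(concurrent_requests)
    threads = [
        threading.Thread(target=book_seat_unsafe, args=("user-a", 1, latency))
        for _ in range(concurrent_requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(bookings)


if __name__ == "__main__":
    print(simulate())  # [('user-a', 1), ('user-a', 1), ('user-a', 1)]
```

Every request checks first, sees no booking yet, and then writes, so three identical rows end up in `bookings` for a single logical booking.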
And sometimes you might get errors like:
This kind of error can't happen in Rust, because its ownership and borrowing rules reject, at compile time, code in which one request could modify a collection while another is reading it :( Our programming model here isn't that safe, so we have to be very careful about it (manual checks help, but they can be hard to get right every time). 🥺🥺🥺
So the question is: how can we fix it? 👨‍💻
If you remember, we can lock around this request and let it run to completion using a Mutex or Semaphore, and that solves it. Let's try it!
Run it again, and the problem is gone. Cool 🥳
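As a minimal sketch of this fix (in Python rather than the C# used in the linked repository; `BookingService`, `book`, and `simulate` are hypothetical names), a lock around the check-then-act sequence looks like this. In .NET the equivalent is typically a `lock` statement or a `SemaphoreSlim(1, 1)`.

```python
import threading
from typing import List, Tuple


class BookingService:
    """In-process booking service; a lock serializes the check-then-act sequence."""

    def __init__(self) -> None:
        self._bookings: List[Tuple[str, int]] = []
        # threading.Semaphore(1) would behave the same way here.
        self._lock = threading.Lock()

    def book(self, user_id: str, seat: int) -> bool:
        with self._lock:  # only one request at a time runs the check and the write
            if (user_id, seat) in self._bookings:
                return False  # duplicate request: reject it
            self._bookings.append((user_id, seat))
            return True

    def row_count(self) -> int:
        with self._lock:
            return len(self._bookings)


def simulate(concurrent_requests: int = 50) -> Tuple[int, int]:
    """Fire many identical requests; return (accepted requests, stored rows)."""
    service = BookingService()
    accepted = 0
    counter_lock = threading.Lock()

    def request() -> None:
        nonlocal accepted
        if service.book("user-a", 1):
            with counter_lock:
                accepted += 1

    threads = [threading.Thread(target=request) for _ in range(concurrent_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return accepted, service.row_count()


if __name__ == "__main__":
    print(simulate())  # (1, 1)
```

Out of 50 identical requests exactly one is accepted and one row is stored. The lock works only because every request shares the same `BookingService` object inside one process.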
New problems: scaling out the current service
But this doesn't fully solve the problem. Suppose we want to serve as many users as possible as our business grows bigger and bigger. What can we do? We package the application into containers (e.g., Docker) and let it scale out when we deploy it to an orchestrator such as Kubernetes. I think that is a reasonable way to do it nowadays.
Now your application runs as N instances of a stateless service. Look at this:
After designing and implementing it, we quickly noticed two critical problems:
- The Semaphore lock we added earlier no longer works correctly, because there are now multiple instances of the booking service (pods, after scaling out), and each instance has its own lock.
- The database might become a bottleneck as the number of pods and users grows.
We won't cover the second problem in this post; we'll focus on the first one.
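A minimal sketch of the first problem, again in Python with hypothetical names: each `BookingPod` object stands in for one pod with its own in-memory lock, while `shared_bookings` stands in for the shared database. Threads in one process are a simplification (an assumption); real pods are separate processes, often on separate nodes, which only makes the point stronger.

```python
import threading
from typing import List, Tuple

# The database is shared by every pod, so it lives outside the pods.
shared_bookings: List[Tuple[str, int]] = []


class BookingPod:
    """One instance of the booking service; its lock lives only in its own memory."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def book(self, user_id: str, seat: int, latency: threading.Barrier) -> None:
        with self._lock:  # serializes requests inside THIS pod only
            already_booked = (user_id, seat) in shared_bookings
            # Assumption: stands in for I/O latency so every pod checks before any writes.
            latency.wait()
            if not already_booked:
                shared_bookings.append((user_id, seat))


def simulate(pod_count: int = 2) -> List[Tuple[str, int]]:
    shared_bookings.clear()
    pods = [BookingPod() for _ in range(pod_count)]
    latency = threading.Barrier(pod_count)
    # The load balancer routes each duplicate request to a different pod.
    threads = [
        threading.Thread(target=pod.book, args=("user-a", 1, latency)) for pod in pods
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(shared_bookings)


if __name__ == "__main__":
    print(simulate())  # [('user-a', 1), ('user-a', 1)]
```

Each pod's lock is uncontended, so both duplicate requests get through and two rows are stored, exactly as in the unlocked version.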
Solving the distributed lock problem
To solve the first problem, we can use one of the distributed lock mechanisms out there, such as acquiring a lock from a distributed cache, locking the resource with etcd or ZooKeeper, or acquiring the lock from the database. All of them make sure that only one request at a time is served for the locked resource.
But now let's talk about the drawbacks of these mechanisms. When I had the chance to use the Redis Redlock mechanism, I observed that its RAM consumption was quite high and stayed high (once it increased, it stayed at that level until the pod was restarted).
We might not want our application to consume and hold that much RAM; even though RAM is cheaper now, for some businesses that is still a huge problem. 😱😱😱
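The core idea shared by these mechanisms is a lease: a key in a store that every pod can see, set only if nobody else holds it, with an expiry. The sketch below is a hypothetical in-memory `LeaseLockStore` written in Python; it mirrors the semantics of Redis `SET key value NX PX ttl`, but it is not a Redis client, and in production the store must be shared by every pod rather than living inside one process. `book_with_lease` and its key naming scheme are also hypothetical.

```python
import threading
import time
import uuid
from typing import Dict, List, Optional, Tuple


class LeaseLockStore:
    """Hypothetical in-memory stand-in for a shared lock service such as Redis."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # makes each operation atomic, like one Redis command
        self._leases: Dict[str, Tuple[str, float]] = {}  # key -> (owner token, expires_at)

    def try_acquire(self, key: str, ttl_seconds: float) -> Optional[str]:
        now = time.monotonic()
        with self._guard:
            current = self._leases.get(key)
            if current is not None and current[1] > now:
                return None  # another holder has an unexpired lease
            token = uuid.uuid4().hex  # unique per holder, so only the owner can release
            self._leases[key] = (token, now + ttl_seconds)
            return token

    def release(self, key: str, token: str) -> bool:
        with self._guard:
            current = self._leases.get(key)
            if current is None or current[0] != token:
                return False  # expired or owned by someone else: leave it alone
            del self._leases[key]
            return True


def book_with_lease(
    store: LeaseLockStore, bookings: List[Tuple[str, int]], user_id: str, seat: int
) -> bool:
    key = f"booking:{user_id}:{seat}"  # hypothetical key naming scheme
    token = store.try_acquire(key, ttl_seconds=5.0)
    if token is None:
        return False  # another pod is processing the same booking right now
    try:
        if (user_id, seat) in bookings:
            return False
        bookings.append((user_id, seat))
        return True
    finally:
        store.release(key, token)
```

The random token ensures a pod can only release a lease it still owns, and the TTL keeps a crashed pod from blocking the seat forever. The flip side is that if the critical section runs longer than the TTL, another pod can acquire the same lease; this sketch does not guard against that.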
Actor model solution
But if we don't use those mechanisms, then what's next?
What if we didn't need to lock the resource and queue up requests one after another? I read the Reactive Manifesto and realized we needed to revisit lock-free mechanisms (I had read about the Actor Model and CRDTs in the past, but at that time I didn't imagine I would one day need to revisit these concepts, LoL). I kept investigating and landed on the Actor model, which solves the first problem we have talked about so far. 😇
Now let's implement the booking business logic with the actor model (Dapr actors on .NET 5):
And the details:
We don't need to acquire a lock anymore, thanks to how actors work: Dapr uses a virtual actor model and a placement mechanism to make sure only one instance of a given booking actor is active at a time, and you can create thousands of other actors if you want. 🤩🤩🤩
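To illustrate what "virtual actor" and "placement" mean, here is a simplified Python sketch with hypothetical `place_actor`, `Host`, and `Cluster` names. Every caller computes the same owner for a given actor type and id, so all calls for that actor land on one host, and the actor is activated lazily on its first call. Dapr's placement service uses consistent hashing rather than the plain modulo hashing shown here, and it also handles hosts joining and leaving, which this sketch ignores.

```python
import hashlib
from typing import Dict, List


def place_actor(actor_type: str, actor_id: str, hosts: List[str]) -> str:
    """Map an actor to exactly one host. Simplified: modulo hashing, not Dapr's algorithm."""
    digest = hashlib.sha256(f"{actor_type}||{actor_id}".encode()).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]


class Host:
    """One pod running the actor runtime; actors are activated lazily on first call."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.active: Dict[str, dict] = {}

    def get_or_activate(self, actor_key: str) -> dict:
        # "Virtual" actor: nobody creates it explicitly; the first call brings it to life.
        return self.active.setdefault(actor_key, {"bookings": []})


class Cluster:
    def __init__(self, host_names: List[str]) -> None:
        self.hosts = {name: Host(name) for name in host_names}

    def actor(self, actor_type: str, actor_id: str) -> dict:
        # Every caller computes the same placement, so all calls for one actor id
        # reach the same host and the same in-memory state.
        owner = place_actor(actor_type, actor_id, sorted(self.hosts))
        return self.hosts[owner].get_or_activate(f"{actor_type}||{actor_id}")
```

This covers placement only: it guarantees a single copy of each actor's state, not that calls to that actor are serialized. Serialization comes from the actor's mailbox.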
We run the stress test scenario again:
That's exactly what we want for our application: one user can only book one ticket at a time. 🥳🥳🥳🥳🥳
The subtle part of this code is that we don't need to care about locks anymore. An actor has a mailbox that lets only one request execute on that actor at a time (single-threaded). That means if you want to scale out to thousands of user requests and make use of multiple CPU cores, you need a strategy for designing your domain model.
For example, you might have one Booking Processor that keeps the list of booking slots per user and publishes events via pub/sub to another, stateless service; that service then creates thousands of other actors to reserve slots, send emails, etc. 👨‍💻 The second problem, database scaling, also needs more investigation.
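The mailbox mechanism can be sketched in a few lines of Python (a hypothetical `BookingActor`, not the Dapr SDK): callers on any thread only enqueue a message and wait for a reply, while a single worker thread drains the queue in order and is the only code that ever touches the actor's state.

```python
import queue
import threading
from typing import Any, List, Tuple


class BookingActor:
    """Minimal actor: a mailbox (FIFO queue) drained by exactly one worker thread.

    _bookings is private state touched only by the worker thread, so it needs no lock:
    messages are processed strictly one at a time.
    """

    def __init__(self) -> None:
        self._mailbox: queue.Queue = queue.Queue()
        self._bookings: List[Tuple[str, int]] = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            kind, payload, reply = self._mailbox.get()
            if kind == "book":
                if payload in self._bookings:
                    reply.put(False)  # duplicate: rejected without any lock
                else:
                    self._bookings.append(payload)
                    reply.put(True)
            elif kind == "count":
                reply.put(len(self._bookings))
            elif kind == "stop":
                reply.put(None)
                return

    def _ask(self, kind: str, payload: Any = None) -> Any:
        # Send a message to the mailbox and block until the worker replies.
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._mailbox.put((kind, payload, reply))
        return reply.get()

    def book(self, user_id: str, seat: int) -> bool:
        return self._ask("book", (user_id, seat))

    def count(self) -> int:
        return self._ask("count")

    def stop(self) -> None:
        self._ask("stop")
        self._worker.join()
```

Fifty concurrent `book("user-a", 1)` calls yield exactly one `True`, with no lock around `_bookings`. Dapr describes the same guarantee as turn-based access: an actor processes one method call at a time. The cost is that a single actor is a single line of execution, so throughput has to come from spreading work over many actors.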
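One possible shape for that split, sketched in Python under stated assumptions: `BookingProcessor` plays the single, serialized booking actor, `InMemoryPubSub` is a hypothetical stand-in for a real pub/sub broker, and `send_confirmation_email` is a hypothetical stateless handler. Only the reservation decision is serialized; the slow side effects run in parallel on a thread pool.

```python
import threading
from collections import defaultdict
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, DefaultDict, List, Set, Tuple

Event = dict
Handler = Callable[[Event], None]


class InMemoryPubSub:
    """Hypothetical stand-in for a pub/sub broker: handlers run on a thread pool."""

    def __init__(self, workers: int = 4) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending: List[Future] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        for handler in self._subscribers[topic]:
            self._pending.append(self._pool.submit(handler, event))

    def close(self) -> None:
        for future in self._pending:
            future.result()  # wait for delivery and surface handler exceptions
        self._pool.shutdown(wait=True)


class BookingProcessor:
    """Stateful, serialized part: in the actor design this is the single booking actor,
    so reserve() is only ever called by that actor's one worker."""

    def __init__(self, bus: InMemoryPubSub) -> None:
        self._bus = bus
        self._slots: Set[Tuple[str, int]] = set()

    def reserve(self, user_id: str, seat: int) -> bool:
        if (user_id, seat) in self._slots:
            return False
        self._slots.add((user_id, seat))
        # Slow side effects are handed off instead of running inside the serialized actor.
        self._bus.publish("booking-accepted", {"user_id": user_id, "seat": seat})
        return True


# Stateless downstream work; these handlers may run concurrently on the pool.
sent_emails: List[str] = []
_emails_lock = threading.Lock()


def send_confirmation_email(event: Event) -> None:  # hypothetical handler
    with _emails_lock:
        sent_emails.append(f"{event['user_id']}:seat-{event['seat']}")
```

In standard CPython builds, threads don't execute Python bytecode on multiple cores in parallel because of the GIL, so this shows the structure rather than the speedup; in a Dapr deployment the parallel work would be spread over many actors and pods.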
That's all for today. 😎
All the source code can be found at https://github.com/thangchung/dapr-actors-experiment 💥💥💥