
Why is a distributed (highly concurrent) system mostly asynchronous?

Azizul Haque Ananto ・1 min read

The actual problem in synchronous systems is the waiting time between the client and the server, where both have to wait for the process to complete. Systems that handle millions of requests per second usually accept the request and process it asynchronously with messaging systems like Kafka or other MQs. A gateway client may wait for the response event in order to respond to the user, but the actual processing happens asynchronously (maybe) in the internal distributed systems. So there are two questions I would like to ask -
1) Is my understanding of highly concurrent systems correct?
2) Is there any way we can make this kind of system synchronous (internally)?

Discussion

To answer the question in the title: a synchronous version of a system capable of handling big loads consumes many times more resources than an asynchronous version of the same system.

1) Asynchronous systems have nothing to do with Kafka or MQ. These components might be present when necessary, but their presence does not imply asynchronous processing, just as their absence does not imply synchronous processing. And there is no need to make the "gateway client" wait for anything. A request which waits for a response from another part of the system is just stored in memory in some way (there are different approaches for this).
2) Almost any system can be implemented as synchronous. There are just no meaningful reasons to do that. On the other hand, a whole system can be designed as a set of services with synchronous communication. Traditional REST-based microservices enforce exactly this model, despite high overhead and latency.

P.S. You can take a look at my blog; asynchronous processing is one of my areas of interest.

 

Thanks for the answers. Can you please give examples of this: "Request which waits for response from the other part of the system is just stored in memory in some way (there are different approaches for this)"?
Is it a mapping you are talking about? In what other ways can we make asynchronous processing synchronous?

If it's not about Kafka or MQ, in what other ways can async systems talk to each other? Shouldn't there be a message passing system?

 

No, a message passing system is not necessary to build an asynchronous processing system.

The whole explanation is somewhat larger than is convenient to put into a comment, but I'll try.

Behind most successful asynchronous processing libraries/frameworks (for example Node.js and Vert.x) you can find the Reactor pattern. At a high level the idea of the Reactor pattern is quite simple: there is a main thread which receives incoming events and then dispatches them for processing to other worker threads.

There are many articles dedicated to the Reactor pattern, but this is probably the best one for understanding how it works in Java at a low level. Netty is very similar but of course much more advanced and tuned to squeeze out every single bit of performance.
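To make the idea concrete, here is a minimal sketch of a reactor loop using Python's standard `selectors` module. It is a simplification under stated assumptions: the names `on_readable` and `run_once` are made up for illustration, a `socket.socketpair()` stands in for a real client connection, and handlers run on the loop thread itself rather than being handed to worker threads.

```python
import selectors
import socket

selector = selectors.DefaultSelector()
received = []  # data seen by the handler, kept so we can inspect it


def on_readable(conn):
    """Handler: called by the loop when `conn` has data ready to read."""
    data = conn.recv(1024)  # does not wait: the selector reported readiness
    received.append(data)


def run_once(timeout=1.0):
    """One turn of the reactor loop: wait for events, dispatch to handlers."""
    for key, _mask in selector.select(timeout):
        handler = key.data  # the callback stored at registration time
        handler(key.fileobj)


# A connected socket pair stands in for a real client connection (assumption).
client, server = socket.socketpair()
server.setblocking(False)
selector.register(server, selectors.EVENT_READ, data=on_readable)

client.sendall(b"ping")  # the "client" sends a packet
run_once()               # the reactor notices it and calls the handler
print(received)

selector.unregister(server)
client.close()
server.close()
```

The loop itself knows nothing about what the handlers do; it only maps "this socket is readable" to "call the callback registered for it". Real frameworks layer protocol parsing and thread pools on top of the same core.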

So, what happens next and where does the asynchrony come from?
The events coming from the OS to the Reactor are quite low level, like "packet is received by socket XXX" or "packet is sent by socket YYY". These events are passed to a chain of handlers which transform them into higher level events, like "HTTP request is received" or "Requested data delivered from DB". Events of this level are passed to user code, which is just another level of event handlers. The only requirement for a handler is that it does not block (i.e. it does not wait for a synchronous request to finish).

In most cases a user handler can't, for example, generate a response to an HTTP request without calling some other services. Let's assume that these services are also asynchronous. In this case the user code which handles the incoming HTTP request invokes the external service and immediately returns without completing the HTTP request. Instead it passes some form of callback to the invoked service. When the invoked service finally receives the necessary data, it calls the provided callback, and in the callback we complete the HTTP request and prepare the response. While the request is not completed, it is stored in memory.

The whole system has a very small number of context switches. Another advantage is that such an approach can work on a very low number of threads - usually the number of threads is close to the number of physical/logical CPU cores. Since every thread consumes a noticeable amount of memory and spends some resources on scheduling, reducing the number of threads also significantly improves overall performance.
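The callback flow described above can be sketched with Python's `asyncio` event loop. This is only an illustration: `call_price_service`, `handle_http_request`, and the `pending`/`responses` dictionaries are hypothetical names, and the external service is simulated with `loop.call_later` instead of a real network call.

```python
import asyncio

pending = {}    # request_id -> request data, held in memory until completed
responses = {}  # request_id -> finished response


def call_price_service(loop, product, callback):
    """Hypothetical non-blocking service call: returns at once, replies later."""
    loop.call_later(0.01, callback, {"product": product, "price": 42})


def handle_http_request(loop, request_id, product):
    """User handler: starts the external call and returns without blocking."""
    pending[request_id] = {"product": product}

    def on_reply(reply):
        # Runs on the event loop once the simulated service has answered.
        request = pending.pop(request_id)
        responses[request_id] = f"{request['product']} costs {reply['price']}"

    call_price_service(loop, product, on_reply)


async def main():
    loop = asyncio.get_running_loop()
    handle_http_request(loop, 1, "book")
    handle_http_request(loop, 2, "pen")
    in_flight = len(pending)   # both requests are parked, nothing has blocked
    await asyncio.sleep(0.05)  # give the loop time to deliver the replies
    return in_flight


in_flight = asyncio.run(main())
print(in_flight, responses)
```

Both requests are accepted before either reply arrives, and a single thread serves them. The `pending` dictionary is one simple form of "the request is stored in memory"; frameworks usually hide this state inside futures or continuation objects.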

I really hope this very brief explanation is helpful. If it is not clear enough - let me know, I'll try to add necessary details.

Wow! Thanks! That's a very good explanation. I've been working with WebFlux (a reactive Java framework) for the last 6 months. It's an amazing concept, but the performance boost is not up to the mark. Also, the code gets uglier day by day. (The common callback problem.)
Lately, I've been working with Python's concurrency. The asyncio event loop is a kind of reactor. But I've found it gets exhausted with a significant number of Futures. So I tried threads with multiprocessing and noticed there should be some message passing among the processes. I am working on a project based on this concept with ZeroMQ. I will soon share the repo with you (hopefully, if I can finish).

I'm also working on a library which implements my views on how Java asynchronous processing should look. I've taken a Promise-based approach because Promises are composable and allow building complex processing pipelines without much effort. Unlike Reactive streams (WebFlux, Project Reactor, RxJava), Promises do not introduce any artificial concepts like "stream of one element", and the whole processing looks very natural and similar to traditional synchronous code. I've also combined Promises with FP-style handling of errors and null values. The resulting combination is extremely powerful and expressive. You can take a look at code examples here and especially here.
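As a rough illustration of what "composable" means here, the sketch below uses Python's `asyncio` rather than the Java library mentioned above, so it shows the general idea and not that library's API. The functions `fetch_user`, `fetch_orders`, and `user_summary` are hypothetical, and the network calls are simulated with `asyncio.sleep`.

```python
import asyncio


async def fetch_user(user_id):
    await asyncio.sleep(0.01)  # stands in for a non-blocking network call
    return {"id": user_id, "name": "alice"}


async def fetch_orders(user_id):
    await asyncio.sleep(0.01)  # stands in for a non-blocking network call
    return ["order-1", "order-2"]


async def user_summary(user_id):
    # Both calls run concurrently; the composed result reads like
    # ordinary sequential code, with no nested callbacks.
    user, orders = await asyncio.gather(fetch_user(user_id), fetch_orders(user_id))
    return f"{user['name']} has {len(orders)} orders"


summary = asyncio.run(user_summary(7))
print(summary)
```

The two results are combined into one value without nested callbacks, which is the property that keeps larger pipelines readable.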

 

Correct me if I'm wrong, but I don't think your question is entirely correct. Concurrency simply means that a process can be split up into tiny calculations that can then be run in any order without affecting the final result. So I suppose that the internal processes could be considered asynchronous. In the end it all depends on whether you call the entire process synchronously or asynchronously. So concurrency as a whole has nothing to do with being synchronous or asynchronous.

So the answer to your second question is yes, you can make this synchronous as a whole, but the internal calculations cannot be synchronous by definition. Otherwise it wouldn't be concurrent any more.

 

Sorry, I meant to say concurrent network calls. Yes, the question seems a little vague at first glance, but I think you got the idea. Can you suggest how to rewrite it properly?

And actually, by the internal process, I meant microservices. For example, think of an e-commerce system where the internal services are an order service, a product service, an auth service, and a client-facing website which is served by a gateway. Should the connections among them always be asynchronous, or can they be synchronous? The latter, it seems to me, would take more time in the gateway and would also cost us concurrency, wouldn't it?

 

I think you mean "distributed" instead of "highly concurrent".

The problem with making microservices highly coupled or even synchronous is that you end up with what they call a distributed monolith: a system made of apps that all depend on each other to work. So instead of calling Object.function() locally and waiting for the result, you end up calling Class.remote_function(), which sends a network call to another service and waits for the result.

The goal of microservices is to have separate and independent apps that can be scaled and deployed separately, without having to deploy other N services.

The same goes if they share a datastore (which might mean that a service can potentially break other services or services have to wait for another service to finish writing in the store).

So, if you shouldn't have them call each other directly and you shouldn't have them share a datastore, how do you pass messages between services? After all, from an abstract point of view, there's no difference between "Object.function()" and "send message from A to node B". It's all message passing. You do it asynchronously.
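A minimal in-process sketch of this kind of asynchronous message passing is shown below. It makes several assumptions: `asyncio.Queue` stands in for a real broker, `order_service` and `shipping_service` are hypothetical names, and a `None` sentinel marks the end of the stream so the example terminates.

```python
import asyncio


async def order_service(queue):
    """Publishes events and moves on without waiting for any consumer."""
    for order_id in (1, 2, 3):
        await queue.put({"type": "order_placed", "order_id": order_id})
    await queue.put(None)  # sentinel: no more events


async def shipping_service(queue, shipped):
    """Consumes events at its own pace, knowing nothing about the producer."""
    while True:
        event = await queue.get()
        if event is None:
            break
        shipped.append(event["order_id"])


async def main():
    queue = asyncio.Queue()  # in-process stand-in for a message broker
    shipped = []
    await asyncio.gather(order_service(queue), shipping_service(queue, shipped))
    return shipped


shipped = asyncio.run(main())
print(shipped)
```

Neither service calls the other directly: the producer only knows the queue, so the consumer can be slow, restarted, or replaced without the producer waiting on it.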

Asynchronicity, though, is not exclusive to a distributed set of services; you can do it with a regular monolithic application as well.

Nice! Thanks. I like your view of microservices, which matches mine. Yes, services should be decoupled as much as possible. But here comes another concept: the 'single point of failure'.

Say in the e-commerce app I've described above, the Order service is fully independent; it can take orders and give the status of an order. Maybe the system is unable to take orders, but we need to show order status at least. How would we design that? Make two services? Then the store is shared between them.

There's always debate about microservices. But I really want to know the debates/ideas; it's important for me to choose how I am going to solve my problem based on others' experience in similar cases. What would be your workaround in the current scenario? Just a rough idea would do.
