DEV Community

Metheny80


Scaling HTTP requests that need responses

I'm trying to understand how to scale requests that need a response.
We have a REST API backend hosted on AWS ECS instances, and of course we can scale out horizontally when needed.
The question is how to deal with peaks, or an increase in requests while the scale-out is still in progress, without losing any requests.
I found many posts about using queues, but mainly for fire-and-forget jobs.
I assume handling scale for HTTP requests where a response to the client is required is a very common problem.
Examples include a user trying to get or update their own info (GET /user or POST /user). These requests need to return information, or sometimes an error code (e.g. the data is invalid, or a DB data conflict occurred).
Of course we can use caching to reduce the processing time per request, but that still doesn't eliminate the need to handle a sudden increase in requests.
Using queues in this case means the backend should enqueue the request, wait for it to be processed, and then return a result or error code to the blocking client.
In other words, these requests would be handled asynchronously.
While that may be technically possible, it complicates the implementation, and I'm wondering if I'm missing something simpler.
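To make the queue-based idea concrete, here is a minimal, hypothetical sketch of it (plain asyncio, not tied to any web framework or to our actual backend): each incoming request is paired with a Future, a worker drains a bounded queue, and the blocked handler waits on the Future so it can still return a real result or an error code to the client. All names here (`handle_request`, `worker`, the payloads) are made up for illustration.

```python
import asyncio

# Bounded queue: when it's full we shed load with a 503 instead of
# silently dropping requests during a spike.
queue: asyncio.Queue = asyncio.Queue(maxsize=100)

async def worker():
    """Drains the queue and resolves each request's Future."""
    while True:
        payload, fut = await queue.get()
        try:
            # Stand-in for the real DB / business logic.
            fut.set_result({"user": payload, "status": "ok"})
        except Exception as exc:
            fut.set_exception(exc)
        finally:
            queue.task_done()

async def handle_request(payload):
    """What an HTTP handler would do: enqueue, then block on the result."""
    fut = asyncio.get_running_loop().create_future()
    try:
        queue.put_nowait((payload, fut))
    except asyncio.QueueFull:
        return {"error": 503}  # overloaded: tell the client to retry
    try:
        # The client connection stays open while we wait for the worker.
        return await asyncio.wait_for(fut, timeout=5.0)
    except asyncio.TimeoutError:
        return {"error": 504}

async def main():
    asyncio.create_task(worker())
    return await handle_request("alice")

print(asyncio.run(main()))
```

This is exactly the extra complexity I mean: timeouts, backpressure, and keeping the client connection open all have to be handled by hand, which is why I'd like to know whether there is a simpler standard practice.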

What is the best practice for this?
