Is that really a monitoring problem?
Sounds like an infrastructure problem, one that monitoring can help solve by raising alerts that trigger adding more resources for peak loads.
It's more of a supply/demand problem, and when there's an issue on the supply side (target server), the demand will go up 3x.
A better solution is to cache the call to the target, and revalidate it in a separate thread.
One easy way to do this is to put a reverse proxy with stale-while-revalidate in front of the target.
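To make the idea concrete, here is a minimal in-process sketch of stale-while-revalidate in Python. It is only an illustration under assumptions: the class name `StaleWhileRevalidateCache`, the `fetch` callable, and the `max_age` parameter are hypothetical names, not part of any proxy or library mentioned in this thread.

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple


class StaleWhileRevalidateCache:
    """Serve cached values immediately and refresh stale ones in the background."""

    def __init__(self, fetch: Callable[[str], Any], max_age: float) -> None:
        self._fetch = fetch        # hypothetical function that calls the target server
        self._max_age = max_age    # seconds before an entry counts as stale
        self._entries: Dict[str, Tuple[Any, float]] = {}
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()

    def get(self, key: str) -> Any:
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, stored_at = entry
                is_stale = time.monotonic() - stored_at > self._max_age
                # Start at most one background refresh per key.
                if is_stale and key not in self._refreshing:
                    self._refreshing.add(key)
                    threading.Thread(
                        target=self._refresh, args=(key,), daemon=True
                    ).start()
                # Return the cached value right away, even if it is stale.
                return value
        # Cache miss: nothing to serve yet, so fetch synchronously.
        value = self._fetch(key)
        with self._lock:
            self._entries[key] = (value, time.monotonic())
        return value

    def _refresh(self, key: str) -> None:
        try:
            value = self._fetch(key)
            with self._lock:
                self._entries[key] = (value, time.monotonic())
        except Exception:
            # Target is unavailable: keep serving the stale value.
            pass
        finally:
            with self._lock:
                self._refreshing.discard(key)
```

With this shape, callers only wait on the target for the very first request per key. When the target is down, the background refresh fails quietly and the demand on it stays at one request per key rather than growing with retries. A reverse proxy that honors the `stale-while-revalidate` Cache-Control extension gives the same behavior at the HTTP layer without application code.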
Retry is one simple feature for handling API downtime, but it's surely not the only thing that will help your app stay up.
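For reference, a retry of this kind might look like the sketch below. It is not Bearer's implementation; the function name `call_with_retry` and its parameters are hypothetical, and the exponential backoff with jitter is one common choice, added here as an assumption so that retries do not all hit the target at the same moment.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Out of attempts: let the last error reach the caller.
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... plus random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Note that with `max_attempts=3` a failing target can receive up to three times the usual number of requests, which is the demand increase discussed above.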
There's a great article on Dev.to on the main mechanisms to improve the resilience of your app. At Bearer, we're starting with retry, but we plan to add them all to our Agent.