It is important for mission-critical applications to run continuously, even during unplanned outages and errors. Microsoft Azure guarantees high availability (99.9%) for Service Bus Queues and Topics to send and receive messages when they are properly configured.
Errors are bound to happen, but because of the way Azure systems are designed, most issues tend to be short-lived. Nevertheless, many enterprises still want to ensure that the Service Bus handling their business-critical data is always up and running. If you are one of them, this article is for you.
This article is intended to explain the ways in which Service Bus may become unavailable due to a component failure, a server failure or a faulty datacentre network switch, rather than disasters like floods or earthquakes where data may be lost permanently.
To handle failures before they affect you, you must first understand what can make your Azure Service Bus unavailable. Below are the common reasons:
- The queue may go into the Disabled / Send disabled / Receive disabled state
- The queue may be accidentally removed from the Service Bus namespace
- The Azure subscription that contains the queue might have expired
- Throttling from an external system on which Service Bus depends
- The quota on the queue might be exceeded
Now, let us dive deep into the individual challenges and analyse the workarounds.
Whenever a temporary outage occurs, for example because of a server error, the queue typically becomes unavailable to client applications in one of the following states:
-> ‘Send Disabled’ - sending messages to the queue is not possible
-> ‘Disabled’ - the queue will not be available for message send or receive operations
-> ‘Receive Disabled’ - receiving messages from the queue, other than peek lock, is not possible
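The three states above can be summarised as a small lookup. The following is a minimal sketch, assuming a hypothetical `QueueStatus` enum and helper functions that are not part of any Azure SDK; in a real application the status would come from the Service Bus management API.

```python
from enum import Enum


class QueueStatus(Enum):
    """Queue states discussed above (hypothetical enum for illustration)."""
    ACTIVE = "Active"
    DISABLED = "Disabled"
    SEND_DISABLED = "SendDisabled"
    RECEIVE_DISABLED = "ReceiveDisabled"


def can_send(status: QueueStatus) -> bool:
    # Sending is blocked when the queue is disabled or send-disabled.
    return status in (QueueStatus.ACTIVE, QueueStatus.RECEIVE_DISABLED)


def can_receive(status: QueueStatus) -> bool:
    # Receiving is blocked when the queue is disabled or receive-disabled.
    return status in (QueueStatus.ACTIVE, QueueStatus.SEND_DISABLED)


if __name__ == "__main__":
    for status in QueueStatus:
        print(f"{status.value}: send={can_send(status)}, receive={can_receive(status)}")
```

A client could call `can_send` before attempting to publish and fall back to an alternative path when it returns `False`.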
Accidental deletion is likely to happen in enterprises where any team member may remove the queue, or even the Service Bus namespace itself, by mistake. This could affect the business if the support or operations team does not notice it promptly.
The status of the queue will then be “Unknown”, and the queue will not be available for any operations in client applications.
Subscription expiry might happen because the subscription was not renewed in time, or because it was disabled by accident while still in use, much like the accidental deletion scenario. Either case can affect any active queue in that subscription. Eventually, the queue will be reported with the status ‘Unknown’.
If you are looking for a single solution to the above-mentioned challenges, we have got your back.
Serverless360 can monitor the state of an Azure Service Bus Queue and notify you when the expected state is not met. A threshold monitor can be configured to send notifications for the above 3 scenarios.
The notification forwarded due to the unavailability of the queue will look similar to the above picture.
Moreover, if the outage is temporary, the threshold monitor in Serverless360 can auto-correct the state of the queue to active. This reduces manual intervention by support staff and helps fix the issue much faster.
Furthermore, you can set the number of retry attempts for auto-correcting to the expected state if the issue persists for a longer period.
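To illustrate the idea of auto-correction with a limited number of retries, here is a minimal sketch. It is not how Serverless360 implements the feature; `get_status` and `set_status` are hypothetical callables standing in for whatever management API reads and updates the queue state.

```python
import time
from typing import Callable


def auto_correct(
    get_status: Callable[[], str],
    set_status: Callable[[str], None],
    expected: str = "Active",
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> bool:
    """Try to bring the queue back to `expected`; return True on success."""
    for attempt in range(1, max_attempts + 1):
        if get_status() == expected:
            return True
        try:
            set_status(expected)  # hypothetical management call
        except Exception as exc:  # broad catch keeps the sketch short
            print(f"attempt {attempt} failed: {exc}")
        time.sleep(delay_seconds)
    # Final check after the last attempt.
    return get_status() == expected
```

If the function returns `False`, the issue has outlasted the configured attempts and should be escalated to a person.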
Microsoft states in its documentation that several thresholds affect the maximum throughput achievable before running into throttling conditions, such as the number of messages per transaction, the message size, and the size of the queue or topic. It is important to ensure your entity is not being throttled.
When the messages already in the queue occupy its total size, no more messages can be sent to it. Any further attempt to send a message to the queue will result in a user error.
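The quota condition can be checked with simple arithmetic before sending. The sketch below assumes the caller already knows the queue's maximum size in megabytes and its current size in bytes; the function names are hypothetical.

```python
def remaining_bytes(max_size_mb: int, current_size_bytes: int) -> int:
    """Free space left in the queue, never negative."""
    return max(max_size_mb * 1024 * 1024 - current_size_bytes, 0)


def would_exceed_quota(max_size_mb: int, current_size_bytes: int, message_bytes: int) -> bool:
    """True if a message of `message_bytes` no longer fits in the queue."""
    return message_bytes > remaining_bytes(max_size_mb, current_size_bytes)


if __name__ == "__main__":
    # A 1 MB queue holding 1,048,000 bytes has 576 bytes of headroom.
    print(remaining_bytes(1, 1_048_000))
    print(would_exceed_quota(1, 1_048_000, 600))
```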
Even the last two challenges can be addressed under the same roof: Serverless360.
To provide an out-of-the-box solution, we have come up with another monitor, called the Data monitor, which helps you keep an eye on the Throttled Requests and User Errors metrics, and in fact on many more properties.
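Conceptually, monitoring these metrics comes down to comparing observed values against configured limits. The following sketch is only an illustration of that idea, not the Data monitor itself; the metric names and limits are assumed for the example.

```python
from typing import Dict, List


def violated_metrics(limits: Dict[str, float], samples: Dict[str, float]) -> List[str]:
    """Return the names of metrics whose observed value exceeds its limit."""
    violated = []
    for metric, max_allowed in limits.items():
        value = samples.get(metric)
        # A metric with no sample is skipped rather than treated as a violation.
        if value is not None and value > max_allowed:
            violated.append(metric)
    return violated


if __name__ == "__main__":
    # Assumed limits: alert on any throttled request, or on more than 5 user errors.
    limits = {"ThrottledRequests": 0, "UserErrors": 5}
    print(violated_metrics(limits, {"ThrottledRequests": 3, "UserErrors": 2}))
```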
Real-time use case
If you are wondering why one should be concerned about the service bus availability given the Microsoft SLA, this real-time use case might help you to understand the significance.
Consider a company, Northwind, that has a simple web application which pushes a message onto a Service Bus queue when a form is submitted.
The form takes approximately 5 minutes of the user’s time to fill out, and the company wants to ensure that the Service Bus is available when the user presses the Submit button. As they care about the user’s time and don’t want to lose the business-critical message, they want the check done before the user fills in the form.
If they are notified that the Service Bus queue is unavailable, they can simply redirect the user to an error page, saving the user’s time, and have the form filled in later.
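The pre-check described in this use case could look roughly like the sketch below. The page paths and the set of states that allow sending are assumptions made for the example, and the status string is expected to come from whatever monitoring or management call the application uses.

```python
# States in which the queue still accepts new messages (assumed for this sketch).
SENDABLE_STATES = {"Active", "ReceiveDisabled"}


def page_for_request(queue_status: str) -> str:
    """Decide which page to serve before the user starts filling in the form."""
    if queue_status in SENDABLE_STATES:
        return "/form"  # hypothetical path of the form page
    return "/service-unavailable"  # hypothetical path of the error page


if __name__ == "__main__":
    for status in ("Active", "SendDisabled", "Disabled", "Unknown"):
        print(status, "->", page_for_request(status))
```

Running the check before rendering the form means the user is turned away immediately instead of after five minutes of typing.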
This is where Serverless360 comes into play: it notifies the stakeholders about the unavailability of the Azure Service Bus through its extensive monitors. It also tries to bring the queue back to the active state via its unique “AutoCorrect” feature.
We hope you now understand the key things you need to keep track of in order to ensure the availability of your Azure Service Bus. You can also use third-party tooling like Serverless360 to make sure that this business-critical Azure Serverless service (Service Bus) is up and running.