In the previous post, I looked into telemetry detail and advanced alerting.
In this article, I optimize the alerts using two methods.
- Sampling: Control how much data is ingested into Application Insights
- Suppression: Avoid false alerts when I already know what's going on
From an analytics point of view, it's ideal to collect every single log for detailed information. However, I need to pay the price in both performance and service cost. I also know that sampling works just fine in many real-world scenarios.
Application Insights provides three types of sampling for now.
- Adaptive sampling: This is enabled by default for .NET SDK and Azure Functions
- Fixed-rate sampling: Available for some SDKs only
- Ingestion sampling: This is a service-side setting.
I won't explain the details as you can read them in the official documentation, but here are several tips from my point of view.
- Start with full collection, especially for small to mid-size applications. The .NET SDK uses sampling by default, so you have to explicitly disable it.
- Use SDK-side sampling if possible. This requires an application deployment. If you don't want to touch the running application, you can use ingestion sampling.
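To make the idea of fixed-rate sampling more concrete, here is a minimal conceptual sketch in Python. It is not the Application Insights SDK's actual implementation. The function names `sampling_score` and `should_keep` are hypothetical, and the choice of SHA-256 is my own assumption for illustration. The point it shows is that the keep/drop decision is derived from a stable hash of the operation ID, so all telemetry items of one operation are kept or dropped together.

```python
import hashlib


def sampling_score(operation_id: str) -> float:
    """Map an operation ID to a stable score in [0, 100).

    Hypothetical helper: the hash function is an assumption for this sketch.
    """
    digest = hashlib.sha256(operation_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer.
    value = int.from_bytes(digest[:8], "big")
    # Reduce to 0..9999, then scale to 0.00..99.99.
    return (value % 10_000) / 100.0


def should_keep(operation_id: str, sampling_percentage: float) -> bool:
    """Keep the item when its score falls below the configured percentage."""
    return sampling_score(operation_id) < sampling_percentage


# Every item that belongs to the same operation gets the same decision.
request_kept = should_keep("operation-123", 25.0)
dependency_kept = should_keep("operation-123", 25.0)
```

Because the decision depends only on the operation ID and the percentage, a request and its related dependency calls and traces stay together in the sampled data, which is what makes sampled telemetry still useful for diagnostics.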
The official documentation also provides good information on when to use which type of sampling. See "When to use sampling".
It's great to have alert rules, but sometimes I want to stop sending alerts depending on the work I'm doing. For example, I may already know that I'm running a stress test in the production environment which may exceed the CPU threshold, or that the environment will reboot due to patch maintenance.
To achieve this, I can use Action Rules.
4. I can add filters to narrow the scope further, but I didn't set any this time.
The official documentation has more example scenarios.
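The logic behind suppression can be sketched in a few lines of Python. This is only a model of the idea, not the Azure Monitor API; action rules themselves are configured in Azure. The names `SuppressionWindow` and `should_notify`, and the dates used, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable


@dataclass(frozen=True)
class SuppressionWindow:
    """A planned period during which notifications should not be sent."""
    start: datetime
    end: datetime
    reason: str

    def covers(self, fired_at: datetime) -> bool:
        # The window is half-open: start is included, end is excluded.
        return self.start <= fired_at < self.end


def should_notify(fired_at: datetime, windows: Iterable[SuppressionWindow]) -> bool:
    """Notify only when no suppression window covers the alert time."""
    return not any(window.covers(fired_at) for window in windows)


# Hypothetical example: a two-hour patch maintenance window in UTC.
maintenance = SuppressionWindow(
    start=datetime(2024, 1, 10, 1, 0, tzinfo=timezone.utc),
    end=datetime(2024, 1, 10, 3, 0, tzinfo=timezone.utc),
    reason="patch maintenance",
)

during = should_notify(datetime(2024, 1, 10, 2, 0, tzinfo=timezone.utc), [maintenance])
after = should_notify(datetime(2024, 1, 10, 4, 0, tzinfo=timezone.utc), [maintenance])
```

In this sketch the alert still fires during the window; only the notification is skipped. That mirrors the intent of suppression: the alert rule stays in place while the noise is filtered out.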
I used suppression because the topic of this article is optimizing alerts. But if you want to optimize operations as well, you can use action groups to bundle actions together so that they are easier to manage.
Usually more is better. At the same time, less is more. The good thing is that Application Insights and Azure Monitor give users the choice.
In the next article, I will look into the "offline" scenario.