Nils Diekmann

Posted on • Originally published at Medium

OpenTelemetry Metrics meets Azure

Tutorial featuring Azure Monitor OpenTelemetry Distro for C# and Grafana

Motivation

OpenTelemetry is the future of observability. It is being pushed by Google, Microsoft and Amazon to provide a vendor-neutral solution for traces, metrics and logs. The OpenTelemetry API is implemented for many programming languages. For this tutorial, I will only use metrics and the C# implementation. Let's see how far apart promise and reality are.

I will use the Azure Monitor OpenTelemetry Distro to publish the metrics to the Azure cloud. The way the metrics are collected in the application is independent of this decision; it only affects how the data is processed by the cloud components. Note that using the OpenTelemetry Collector as an alternative is not officially supported by Azure.

Finally, I want to aggregate, transform, and visualize the raw metrics. I will use Log Analytics, Azure's default log analysis solution, and Grafana to create a beautiful dashboard. Grafana is available as a managed service on Azure.

OPS side - Infrastructure

The application will push its metrics to Application Insights, using the C# SDK to connect to Azure Monitor. The data is stored in the 'customMetrics' table of the Logs database. Log Analytics can query that table, and Grafana queries the data from Log Analytics and displays it.

Azure Components needed

DEV side - Application

My goal is to provide the smallest possible sample to get you started with OpenTelemetry Metrics and Azure. The full source code is also available on GitHub.

GitHub: KinNeko-De/sample-opentelemetry-azure (sample application showing how to publish OpenTelemetry metrics to Azure)
Metrics in C# are instrumented with classes from the System.Diagnostics.Metrics namespace. The namespace provides an abstraction to separate the instrumentation from the way the metrics are later collected. OpenTelemetry uses a generic data model to define these metrics. For this example, I will use a counter as the simplest type. The metric will track the number of items ordered as an integer counter that only ever goes up.

private static readonly Meter Meter = new("restaurant-order-svc", "0.0.1");
private static readonly Counter<int> ItemsOrderedTotal =
    Meter.CreateCounter<int>(
        "restaurant-order-svc-items-ordered",
        "item",
        "Items ordered total");

I define my metrics in a specific class called 'Metric'. It is injected as a singleton. The counter is declared as a static field. I prefer not to access the field directly, but to provide a wrapper method instead. With this approach, the class encapsulates how business data is transferred to generic metric data types. Using another layer of abstraction for a simple counter seems like overkill. The advantage becomes more apparent when I use histograms with buckets.

public void ItemsOrdered(int amount)
{
    ItemsOrderedTotal.Add(amount);
}
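To illustrate the advantage of the wrapper, here is a hypothetical histogram metric with the same encapsulation pattern. The metric name, unit and the TimeSpan conversion are my own assumptions, not part of the sample; note that in OpenTelemetry .NET, bucket boundaries are configured on the collection side (via views), not at instrumentation time.

```csharp
using System;
using System.Diagnostics.Metrics;

public class Metric
{
    private static readonly Meter Meter = new("restaurant-order-svc", "0.0.1");

    // Hypothetical histogram tracking how long an order takes to process.
    private static readonly Histogram<double> OrderDuration =
        Meter.CreateHistogram<double>(
            "restaurant-order-svc-order-duration",
            "s",
            "Time taken to process an order");

    // The wrapper hides the conversion from the business type (a TimeSpan)
    // to the generic metric data type (a double in seconds).
    public void OrderProcessed(TimeSpan duration)
    {
        OrderDuration.Record(duration.TotalSeconds);
    }
}
```

Callers only deal with business types such as TimeSpan; the mapping to metric primitives stays in one place.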

The metric is simulated with an infinite background service. It sends a random number of ordered items at each interval. In a real application, I would instead increase the metric after the order has been successfully stored in the database.

var itemsOrdered = RandomNumberGenerator.GetInt32(1, 9);
Metric.ItemsOrdered(itemsOrdered);
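A rough sketch of such a background service could look like this. The class name and the 15-minute interval are assumptions based on the article, not the exact source from the repository:

```csharp
using System;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public class OrderSimulationService : BackgroundService
{
    private readonly Metric _metric;

    public OrderSimulationService(Metric metric)
    {
        _metric = metric;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Report a random number of ordered items at a fixed interval
        // until the host shuts down.
        while (!stoppingToken.IsCancellationRequested)
        {
            var itemsOrdered = RandomNumberGenerator.GetInt32(1, 9);
            _metric.ItemsOrdered(itemsOrdered);
            await Task.Delay(TimeSpan.FromMinutes(15), stoppingToken);
        }
    }
}
```

The service would be registered with services.AddHostedService<OrderSimulationService>().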

To publish the collected data, I have to configure the application to use OpenTelemetry. I do this within my Program.cs with the service collection extension AddOpenTelemetry(). The extension is available via the NuGet package OpenTelemetry.Extensions.Hosting.

private static void ConfigureServices(IServiceCollection services)
{
    services.AddOpenTelemetry();
}

To publish the data to Azure, I use the UseAzureMonitor() extension method. It is available via the NuGet package Azure.Monitor.OpenTelemetry.AspNetCore. This package has a dependency on OpenTelemetry.Extensions.Hosting, so I only need this one package as a dependency in my application. Within the Azure Monitor configuration, I need to set the ConnectionString of my Application Insights resource.

private static void ConfigureServices(IServiceCollection services)
{
    var applicationInsightsConnectionString =
        "InstrumentationKey=00000000-0000-0000-0000-000000000000";
    services.AddOpenTelemetry()
        .UseAzureMonitor(options =>
        {
            options.ConnectionString = applicationInsightsConnectionString;
        })
        .WithMetrics(metricBuilder => metricBuilder
            .AddMeter(Operations.Metric.ApplicationName)
            .AddConsoleExporter(builder =>
                builder.Targets = ConsoleExporterOutputTargets.Console));
}
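The hardcoded connection string is only for demonstration. As a sketch, it could instead be read from configuration; the configuration key below is my own choice, and the Distro also honors the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable out of the box:

```csharp
using Azure.Monitor.OpenTelemetry.AspNetCore;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

public static class OpenTelemetryConfiguration
{
    public static void ConfigureServices(
        IServiceCollection services,
        IConfiguration configuration)
    {
        services.AddOpenTelemetry().UseAzureMonitor(options =>
        {
            // "ApplicationInsights:ConnectionString" is an assumed key;
            // use whatever section your appsettings.json actually defines.
            options.ConnectionString =
                configuration["ApplicationInsights:ConnectionString"];
        });
    }
}
```

This keeps the secret out of source control and lets each environment supply its own Application Insights resource.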

I also added an OpenTelemetry console exporter to see all the data being generated. It takes some time for the data to appear in Application Insights. Metrics can be lost if the application crashes, so if you want to be sure that the data is pushed, shut down the application gracefully. Locally, you can do this by pressing CTRL+C in the application console.

Configure Azure Infrastructure

Firstly, I need to create an Azure Monitor workspace. I am not sure if I need it for publishing the metrics; as far as I understand the documentation, it is needed for Grafana to access the stored metrics.

I have also created an Application Insights resource; I need its connection string for the C# SDK. When I create the Application Insights resource, an associated Log Analytics workspace is automatically generated.

Creating a new Application Insights resource

I had the problem that most of the time I could not query any data through this generated Log Analytics workspace. The effect was quite random, and I was totally confused. It seemed to be solved by creating a new Log Analytics workspace and linking it to the Application Insights resource. If you encounter the same problem, please leave a comment.

Finally, I created an Azure Managed Grafana resource. I changed the pricing tier from 'Standard' to 'Essential'. This pricing tier is suggested for my use case, but it also has some limitations and a pitfall that I will describe later.

Visualize with Log Analytics

The first challenge I have to overcome is querying the raw data. To do this, I need to go to Application Insights. In the 'Monitor' section, I select the 'Logs' item. The most obvious choice would be 'Metrics', but I cannot find the OpenTelemetry metrics there.

Monitor section

There are several tables within the Logs section. The OpenTelemetry metrics are stored in the 'customMetrics' table. I can now query the data using the Kusto Query Language (KQL).

customMetrics 
| where name == 'restaurant-order-svc-items-ordered'
| order by timestamp asc

Each metric must have a unique name. In this first query, I filter for my metric by name and then sort by timestamp. The result is a table where each row is a reported counter value. Because I only increment the counter once every 15 minutes, each value in the column corresponds to exactly one order.

Query raw data using KQL

It is also possible to create a time chart from the raw data. I simply add a render command to the KQL query. There are various types of charts available to visualize the data.

customMetrics 
| where name == 'restaurant-order-svc-items-ordered'
| order by timestamp asc
| render timechart

The result is a time chart that looks very nice. The reason for this is not my query itself, but the fact that the application has published exactly one data point every 15 minutes.

Time chart with every datapoint

To visualize the data meaningfully, I need to aggregate the values over a specific interval. In this query, I choose a fixed interval of one hour: the timestamp of each data point is rounded down to this interval, and the values of all data points within each hour are summed into one value.

customMetrics
| where name == 'restaurant-order-svc-items-ordered'
| summarize sum(value) by bin(timestamp, 1h)
| order by timestamp asc
| render timechart

The result is a graph showing the total number of items ordered each hour.

Items ordered per hour

Visualize with Grafana

Once I have created the Grafana resource, I can already access it and create new dashboards. Within the Azure portal, I can find the URL to access the Grafana resource.

Grafana URL

In order to give Grafana access, I need to link Grafana to the Azure Monitor workspace. It looked like I could also set the link in Azure Managed Grafana under 'Azure Monitor workspaces', but there I got an error message.

Linking Grafana to Azure Monitor Workspace

After connecting Grafana to Azure Monitor, I have a new data source 'Azure Monitor' in Grafana. This data source is already configured correctly. Azure also generated a lot of additional dashboards.

Generated data source 'Azure Monitor'

Grafana was created with the 'Essential' pricing tier because it costs less. This pricing tier limits the number of dashboards to 20. With the newly generated dashboards, I exceed this limit: whenever I try to save a new dashboard, I get an error message that my quota has been exceeded. I can change the pricing tier to 'Standard', but I can never go back to 'Essential'. I have fallen into this trap. A much cheaper solution would have been to delete all the generated dashboards.

First step: Data source 'Azure Monitor'

I am now ready to start my first visualization. In the 'Query' tab, I first select the correct data source to retrieve data from the 'customMetrics' table. The data source is 'Azure Monitor' (1st in the previous picture). I need to change the Service value to 'Logs'; this value describes the database I want to query (2nd in the previous picture). I then select as the Resource the Application Insights resource (3rd in the previous picture) on which I previously ran my KQL queries (see the following picture).

Second step: Select the Application Insights resource

I want to show the number of items ordered. To do this, I simply summarize all the values. This is not the total number of items ever ordered, as metric data is usually purged after some time. It just gives me an idea of recent orders and shows that the application is running smoothly. If the number of orders is lower than expected, it may be an indication that something is wrong.

customMetrics
| where name == 'restaurant-order-svc-items-ordered'
| summarize by timestamp, value
| order by timestamp asc

Summarising by timestamp and value is required in Grafana; otherwise, every single entry in the table is displayed as a separate point. Ordering by timestamp is not logically necessary, but the 'Stat' graph type I chose displays a nice background graph of the individual data points.

Items ordered in total (Stat)

My second chart should show the aggregated number of orders in a given interval. To do this, I first declare an interval variable for the dashboard with an enumeration of values.

Custom defined variable

The user can select the interval in the frontend. The selected value is passed to the query as a parameter. The query then rounds each timestamp to the selected interval and sums the values per bucket.

customMetrics
| where name == 'restaurant-order-svc-items-ordered'
| summarize sum(value) by bin(timestamp, $interval)
| order by timestamp asc

The graph type I have chosen to display the data is 'Time series'. In the top left-hand corner of the following image, you can see the selectable interval. The legend of this diagram is not aligned with the selected interval: there are 30 minutes between the legend values, but one hour between the data points. I can change this later.

Items ordered per hour (time series with interval 1h)

The variable $interval is defined in Grafana, and the resulting diagram shows that it works, but KQL's syntax highlighting does not recognize the variable; it appears as unknown. A bit of googling suggests I am not alone with this, so I will ignore it for now.

KQL does not detect the variables defined in Grafana

Conclusion

OpenTelemetry is easy to use. I already have experience with metrics and OpenTracing. I am also a bit biased because I really like the concept of metrics. Do you find it easy to use? Please leave a comment below. I really want to know your opinion.

Integrating OpenTelemetry with Azure is fairly straightforward if you know what you need to do. However, which components you need and how they communicate with each other is not very clearly documented. For example, the documentation tells you to use "Azure Monitor", but then the SDK needs a connection string to Application Insights, not to the Azure Monitor workspace. This, and the fact that there are metrics that are separate from 'customMetrics', is confusing.

It is time-consuming to put all the pieces together for the first time. The problem with the non-queryable results in Application Insights alone took me several hours. When things went wrong, I could not find detailed logs to figure out which component was causing the problem. I had previously used an OpenTelemetry exporter on my local host, which at least gave me a log of the data received.

The next pain point is the additional cost of the Azure components. If you already run Prometheus in your Kubernetes cluster, why not store the metrics there? I guess the Azure Monitor OpenTelemetry Distro is aimed at people who don't have Prometheus and just want observability for Azure Container Apps. For them, the Distro gives an easy first impression of the subject. And to make it even easier, I have created this tutorial for you.

Follow me to read more about OpenTelemetry metrics meets Google Cloud later. More claps will motivate me to write it ❤
