Yoav Danieli for Aspecto

Originally published at aspecto.io

OpenTelemetry Operator for Kubernetes: Practical Guide | Part 4

In this article, you will learn how to use the OpenTelemetry Operator. We will explore what a Kubernetes Operator is, how to use it, and tackle common issues you might encounter when setting up the OpenTelemetry Operator in your cluster.

This is the 4th part of our OpenTelemetry on Kubernetes series.

In the previous articles, I explained what the OpenTelemetry Collector is, how to configure it, and how to set up a local Kubernetes cluster using Minikube and deploy the Collector to that cluster. I also described the deployment methods for a Collector and how to use them (Collector as a gateway and as a sidecar agent). Lastly, for each deployment, I showed how to examine the traces generated by our programs and visualize them using Aspecto.

Good to know before reading this article:

  1. OpenTelemetry Collector on Kubernetes Basics | Part 1
  2. OpenTelemetry Collector as an Agent on Kubernetes | Part 2
  3. Opentelemetry Collector on Kubernetes Using Helm | Part 3

What is a Kubernetes Operator?

An operator extends Kubernetes to manage applications and their components using custom resources. Custom resources allow you to define new resources that group API objects with specific settings. 

For example, we can create an application custom resource that accepts an image of a web server application and a port. This resource is composed of a deployment object and a service object that exposes this port.
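
To make this concrete, such a custom resource might look like the sketch below. It is purely hypothetical: it assumes someone has installed a matching Application CRD and a controller that understands it.

apiVersion: example.com/v1alpha1
kind: Application
metadata:
  name: my-web-server
spec:
  image: example-web-server:1.0 # the web server image to run
  port: 8080 # the port the generated Service will expose

An operator watching Application objects would then create the underlying Deployment and Service and keep them in sync with this spec.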

To manage resources, Kubernetes uses the controller pattern, most notably the control loop: the Kubernetes runtime continuously watches the current state of the resources in a cluster, compares it to the desired state, and reconciles to bridge any gap between the two.

Operators are custom controllers and custom resources put together. An operator is software running in your cluster, just like the Kubernetes runtime components, that watches the custom resources and their state. Operators allow complex clusters and systems to run automatically.

In short, Operators extend the cluster's behavior without modifying Kubernetes code.

A diagram displays Kubernetes Operator's role in a system

What is the OpenTelemetry Operator?

The OpenTelemetry Operator is an operator written by the OpenTelemetry community that aims to manage and simplify the way we incorporate observability into our systems. It aims to solve the challenges any developer who wants to add observability to their cluster runs into.

The first challenge is the need to configure and manage OpenTelemetry Collectors that meet the specific requirements of your system. The second challenge is instrumenting your core business logic so it generates telemetry data to observe.

The OpenTelemetry Operator introduces two CRDs as a solution to these challenges:

  1. OpenTelemetryCollector -- This resource simplifies the configuration and maintenance of a Collector.
  2. Instrumentation -- This resource can instrument your application for you. No change to your code is required.

In the remainder of this article, I will demonstrate how to use the Operator to build a working and instrumented web server.

OpenTelemetry Operator: The Practical Part

Setting up our application

First, we will create our application and deploy it to a local development cluster.

The application code is short: a simple Express server that listens on port 3000 and returns a hello world message. The code is written in TypeScript.

Let's create tsconfig.json, package.json, and a Dockerfile to build the image.

Here you will find the source code.

// application/index.ts

import express, { Request, Response } from 'express';

const PORT = process.env.PORT || 3000;

const app = express();
app.get('/', (req: Request, res: Response) => {
  res.send('Hello World!');
});

app.listen(PORT, () => {
  console.log(`Running on ${PORT}`);
});
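
The linked source code contains the full package.json and tsconfig.json; for completeness, here is one possible minimal pair that works with the Dockerfile below (the exact contents and versions are assumptions, so prefer the linked repository).

package.json (versions are illustrative):

{
  "name": "example-express-server",
  "version": "1.0.0",
  "scripts": {
    "build": "tsc"
  },
  "dependencies": {
    "express": "^4.18.2"
  },
  "devDependencies": {
    "@types/express": "^4.17.14",
    "typescript": "^4.8.4"
  }
}

tsconfig.json (the output stays in the project root so the Dockerfile's node index.js finds the compiled file):

{
  "compilerOptions": {
    "target": "ES2019",
    "module": "commonjs",
    "outDir": ".",
    "esModuleInterop": true,
    "strict": true
  },
  "include": ["index.ts"]
}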

Use the following Dockerfile to build the application:

FROM node:16
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
EXPOSE 3000
CMD [ "node", "index.js" ]

Now run:

docker build -t example-express-server ./application

Now it's time to set up our cluster. To set up a development environment, follow the guide in part 1 of this series.

Let's create a new cluster. We will clean up everything we did so far and start fresh:

minikube delete --purge
minikube start

Create a new file to define the resources for our application:

touch application.yml

And a Kubernetes namespace to isolate our application logic:

kubectl create namespace application

And a deployment resource and a service:

apiVersion: v1
kind: Service
metadata:
  name: express-server
  namespace: application
  labels:
    app: application
    component: express-server
spec:
  ports:
    - name: express # The port our Express server listens on.
      port: 3000
      protocol: TCP
      targetPort: 3000
  selector:
    component: express-server
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: express-server
  namespace: application
  labels:
    app: application
    component: express-server
spec:
  selector:
    matchLabels:
      app: application
      component: express-server
  replicas: 1
  template:
    metadata:
      labels:
        app: application
        component: express-server
    spec:
      containers:
        - name: express-server
          ports:
            - containerPort: 3000
          image: example-express-server
          imagePullPolicy: Never
          resources:
            limits:
              memory: '128Mi'
              cpu: '500m'

We created a service of type LoadBalancer to expose the application outside of the cluster.

Note that we are using a locally built image, so we need to configure minikube not to look for this image elsewhere (that is what imagePullPolicy: Never is for) and point our Docker CLI at minikube's Docker daemon. Read about this issue here.

eval $(minikube docker-env)
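
If you built the image before running this command, build it again now so the image ends up inside minikube's Docker daemon (otherwise the pod will not find it):

docker build -t example-express-server ./application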

Create a minikube tunnel to expose our cluster to our machine:

# run in a different shell and keep it open
> minikube tunnel

Now let's apply the changes and call the server to see that it is working:

 

> kubectl apply -f application.yml
> curl localhost:3000
Hello World!%

Collector Gateway with OpenTelemetry Operator

The server is up and running. Now it's time to design and extend our system by adding observability. 

You can read more about deployment strategies in previous articles. For now, our setup will be as follows:

  • An OpenTelemetry Collector gateway deployed as a Kubernetes Deployment. This Collector will receive telemetry data on port 4317 and export it to a local Jaeger instance.
  • A Jaeger 'all-in-one' instance to observe telemetry data (traces)
  • The application we defined above will run with a collector agent as a sidecar. In addition, we will use the OpenTelemetry Operator Instrumentation resource to instrument the application without changing its code.

Create a namespace for our OpenTelemetry resources:

kubectl create namespace opentelemetry

Let's use the following manifest to deploy a Jaeger instance and network services:

# jaeger.yml

apiVersion: v1
kind: Service
metadata:
  name: jaeger-all-in-one
  namespace: opentelemetry
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  ports:
    - name: collector
      port: 14250
      protocol: TCP
      targetPort: 14250
  selector:
    component: otel-collector
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger-all-in-one-ui
  namespace: opentelemetry
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  ports:
    - name: jaeger
      port: 16686
      protocol: TCP
      targetPort: 16686
  selector:
    component: otel-collector
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-all-in-one
  namespace: opentelemetry
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-collector
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-collector
    spec:
      containers:
        - image: jaegertracing/all-in-one:1.35
          name: jaeger
          ports:
            - containerPort: 16686
            - containerPort: 14268
            - containerPort: 14250

Apply the resources in the cluster:

> kubectl apply -f jaeger.yml

Visit http://localhost:16686 and check that the Jaeger UI is running and available. The jaeger-all-in-one-ui service is of type LoadBalancer, so it is reachable through the minikube tunnel we started earlier.

Now we will use the OpenTelemetryCollector CRD to deploy a collector as a gateway.

First, we need to install the OpenTelemetry Operator. Follow the official documentation for more information.

> kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

You will probably receive an error about needing cert-manager installed, so let's install that first and then re-apply the Operator manifest:

> kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.10.0/cert-manager.yaml
> kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

This installs the OpenTelemetry Operator control plane in your cluster under the namespace 'opentelemetry-operator-system'. You can check out all the extra software your cluster is now running with:

> kubectl get all -n opentelemetry-operator-system

Now that we have the Operator installed, we can use the custom resources it offers. Let's create a gateway.yml file with the following config:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: opentelemetry
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    exporters:
      jaeger:
        endpoint: jaeger-all-in-one:14250
        tls:
          insecure: true
      logging:
    processors:
      batch:
      resource:
        attributes:
          - key: test.key
            value: "test-value"
            action: insert
    extensions:
      health_check:
      zpages:
        endpoint: :55679
    service:
      telemetry:
        logs:
          level: "debug"
      extensions: [zpages, health_check]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch, resource]
          exporters: [logging, jaeger]

Apply the changes:

 

> kubectl apply -f gateway.yml

This resource deploys the OpenTelemetry Collector with the defined config as a deployment. I will not get into the configuration file since we explained it in depth in previous articles. What is important to note here is the mode specification. The available values are:

  • Deployment
  • Sidecar
  • DaemonSet

In addition to the Collector Deployment, this resource also creates the Service needed to communicate with the Collector. How simple is that? If you read my previous articles, you saw much larger manifest files; that is not the case here. The Operator simplifies all of this for us.
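
You can see everything the Operator created from this single resource (the Collector Deployment, its pod, and the Services in front of it) with:

> kubectl get all -n opentelemetry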

Collector Agent with OpenTelemetry Operator

Next, we will add an OpenTelemetry Collector agent that runs as a sidecar for our application. Create a new file for the sidecar:

# sidecar.yml

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar
  namespace: application
spec:
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
    exporters:
      logging:
      otlp:
        endpoint: "http://otel-collector-collector.opentelemetry.svc.cluster.local:4317"
        tls:
          insecure: true
    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [logging, otlp]

As you can see, the mode is set to sidecar while the config is set to the agent config we defined in the previous article.

One important detail that can cause issues: the sidecar needs to forward the telemetry data to the gateway Collector, and because the two run in different namespaces, we have to specify the full cluster DNS name of the gateway service.

Get the name of the Gateway service by typing:

> kubectl get svc -n opentelemetry

Choose the service that listens on port 4317.

The full DNS is composed as follows:

[service_name].[namespace].svc.cluster.local:[port] (cluster.local is the default cluster domain)
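
In our setup, the Operator created a service named otel-collector-collector (the OpenTelemetryCollector resource name with a -collector suffix) in the opentelemetry namespace, which gives exactly the endpoint used in sidecar.yml above:

http://otel-collector-collector.opentelemetry.svc.cluster.local:4317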

Let's hold off on applying this (I will explain why after the next section). For now, let's continue and create the instrumentation for our application.

Auto Instrumentation with OpenTelemetry Operator

Create a new manifest file with the following resource:

# instrumentation.yml

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: nodejs-instrumentation
  namespace: application
spec:
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: always_on
  nodejs:

We are using the Instrumentation CRD. It lets us define all kinds of settings regarding how we want to instrument our application. The most important ones are the propagator, sampler, and programming language (or runtime) specification.

We will not get into each of the configuration options available on the Instrumentation resource, since that would require a separate article. If you are interested in reading more about it, visit the official docs.

The propagators tell the injected instrumentation SDK how to propagate trace context. Let's use the propagators defined in the OpenTelemetry repo example.

The sampler specifies the sampling strategy to be executed by this instrumentation. Go here to read more about sampling.

Lastly, we define the nodejs part of the instrumentation and use the default values. The other programming languages supported are Python, .NET, and Java.
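
If you later need to tweak the injected SDK, the Instrumentation spec exposes more fields, such as an exporter endpoint and sampler arguments. Here is a hedged sketch (the values are only illustrative; double-check the field names against the docs of the Operator version you installed):

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: nodejs-instrumentation
  namespace: application
spec:
  exporter:
    endpoint: http://localhost:4317 # e.g. the sidecar collector in the same pod
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: '0.25' # sample 25% of new traces
  nodejs: {}

For this guide, we stick with the simpler resource above.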

OpenTelemetry Operator: Putting it all together

The last thing we need to do is update our application resource and add annotations. This is how the Operator knows to inject the sidecar and the instrumentation into the Deployment's pods.

Update application.yml and add the following annotations:

spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-nodejs: 'nodejs-instrumentation'
        sidecar.opentelemetry.io/inject: 'sidecar'
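
The annotation values here are the names of the resources we created. When there is exactly one Instrumentation or sidecar OpenTelemetryCollector in the namespace, the value 'true' also works, and 'false' disables the injection.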

Notice that the instrumentation only works if it is in place before the application: the Instrumentation resource needs to be configured and applied in the cluster before the application is deployed.

Shut down the application:

> kubectl delete -f application.yml

And apply all of the above configurations:

> kubectl apply -f instrumentation.yml
> kubectl apply -f sidecar.yml
> kubectl apply -f application.yml

Let's see what's going on by examining the application:

> kubectl logs deployments.apps/express-server -n application
Defaulted container "express-server" out of: express-server, otc-container, opentelemetry-auto-instrumentation (init)

We can see two more containers in the pod: otc-container, which is the sidecar Collector, and opentelemetry-auto-instrumentation, which runs as an init container. An init container runs to completion before the other containers start; here it prepares the auto-instrumentation for the application container.
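
If you are curious how the injection works, describe the pod (for example, kubectl describe pod -n application -l component=express-server): the init container copies the Node.js auto-instrumentation into a shared volume, and the Operator sets NODE_OPTIONS on the application container so the SDK is preloaded when Node.js starts, without any change to our code or image.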

Let's invoke the application endpoint one more time:

> curl localhost:3000
Hello World!%

We can see that we got back the hello world message. Check out http://localhost:16686 and verify that the telemetry data reached its final destination.
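
If traces do not show up, a good first step is to check the sidecar and gateway Collector logs (both have a logging exporter and debug telemetry configured). For example (if the gateway Deployment name differs in your cluster, check kubectl get deployments -n opentelemetry):

> kubectl logs deployments.apps/express-server -c otc-container -n application
> kubectl logs deployments.apps/otel-collector-collector -n opentelemetry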

That is it! Using the OpenTelemetry Operator was simple, intuitive, and involved far fewer issues than configuring each Kubernetes object ourselves.

The most important advantage is the ability to instrument applications without changing their code. 

It was a fascinating subject to write about, and I hope you enjoyed it as much as I did. 

If you have any questions or comments, feel free to contact me on LinkedIn. As my wife says, I'm wrong quite a lot, so if you see a mistake, please let me know.

See you next time.
