Ashok Nagaraj


Open-telemetry collector: the powerful recipe for observability pipelines

The OTel Collector is a component used to receive, process and export telemetry data (signals) from sources to observability backends like Elasticsearch, Cassandra, Datadog, New Relic ...


Advantages

  1. Avoids resource contention thanks to its scalable nature and variety of deployment modes
  2. Customizable at multiple levels, with no need for constant reboots/reloads of the pipeline
  3. Tolerant to network partitions for the most part

Use a collector when you

  • need a common ingestion point for a variety of signals like metrics and traces
  • need to collect signals from multiple sources like applications, infrastructure, clusters, frameworks and databases
  • need to apply transformations to the signal data before storing it in the backend
  • need to enrich signals with additional metadata
  • need to filter out signals based on various predefined criteria
  • need to send signal data to multiple observability backends
  • want to build a loosely coupled, scalable pipeline for signal data flow

Installation

  1. Create a configuration file, say config.yaml; details here
  2. Run the collector
$ docker pull otel/opentelemetry-collector:latest
$ docker run [-d] -v $(pwd)/config.yaml:/etc/otelcol/config.yaml [port-config] otel/opentelemetry-collector:latest
# port configuration
      - "1888:1888"   # pprof extension
      - "8888:8888"   # Prometheus metrics exposed by the collector
      - "8889:8889"   # Prometheus exporter metrics
      - "13133:13133" # health_check extension
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP http receiver
      - "55679:55679" # zpages extension
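Putting the two together, a rough sketch of a full invocation (the port mappings are the ones listed above; expose only the ones your config actually uses):
$ docker run -d \
    -v $(pwd)/config.yaml:/etc/otelcol/config.yaml \
    -p 1888:1888 -p 8888:8888 -p 8889:8889 \
    -p 13133:13133 -p 4317:4317 -p 4318:4318 \
    -p 55679:55679 \
    otel/opentelemetry-collector:latest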

Configuration

The configuration has 3 (or 4) kinds of components, all of which need to be enabled in the service section
Note: At least one pipeline under service is mandatory

  1. Receivers - describe how the collector gets data IN and can be PUSH or PULL based (eg: host metrics, application metrics, zipkin traces).
  2. Processors - run on the data being transported and optionally massage, transform and filter out data (eg: filter, batch, samplers)
  3. Exporters - specify how data is sent OUT to one or more configured backends, can be PUSH or PULL based (eg: file, jaeger, prometheus). They generally involve authentication details in production environments.
  4. Extensions (optional) - provide additional capabilities to the collector that do not require direct access to signal data (eg: health_check, pprof)

More info: configuration wiki
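As a rough sketch, here is what a minimal config.yaml wiring these component types together could look like; the otlp receiver, batch processor and otlp exporter are illustrative choices, and the backend endpoint is made up:
# minimal illustrative config.yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp:
    endpoint: my-backend.example.com:4317   # hypothetical backend endpoint

extensions:
  health_check:

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]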

Deployment modes
  1. Agent: A collector instance running on the same node as the application (binary, side-car or daemonset)
  2. Gateway: One or more instances collectively running centrally as a standalone service. It can offer advanced capabilities like simple load-balancing, tail-based sampling and independent scaling, generally acting as a receiver for the agents (see the sketch after this list).
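For instance, an agent-mode collector often does little more than batch locally and forward everything to the gateway over OTLP; a minimal sketch, assuming a made-up gateway hostname:
# agent: receive from local workloads, forward to the central gateway
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:

exporters:
  otlp:
    endpoint: otel-gateway.example.internal:4317   # hypothetical gateway address

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]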

Demo

Gist to working demo
Note: the metrics exporter does not seem to work as described in the official documentation

OTel Deployment patterns

Source of all the information below: CNCF presentation

Basic - instrument and send to a collector
Used when the application is instrumented with the OTel SDK and signals are sent to a predefined collector
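With most OTel SDKs this usually amounts to pointing the standard OTLP exporter variable at the collector; for example (the hostname is illustrative):
$ export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317   # OTLP gRPC port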


Basic - fanout
Used when signals are (processed and) sent to multiple destinations. Useful in situations where multiple views/perspectives of the same data are to be generated (eg: one from Jaeger, one from Datadog); a sketch follows.
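In collector terms, fanning out is just listing more than one exporter in the same pipeline; a minimal sketch with two made-up OTLP destinations (recent Jaeger versions accept OTLP natively; vendor-specific exporters such as datadog live in the contrib distribution):
exporters:
  otlp/jaeger:
    endpoint: jaeger.example.internal:4317      # hypothetical Jaeger instance
  otlp/vendor:
    endpoint: vendor-backend.example.com:4317   # hypothetical second destination

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger, otlp/vendor]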


Normalizer
The collector works as an intermediate proxy and massages the data before passing it on to the destination; used when common processors are to be applied to the incoming signals (see the sketch below).
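A hedged sketch of what that massaging could look like, assuming the contrib distribution (which bundles the attributes processor); the attribute names are made up:
processors:
  attributes:
    actions:
      - key: environment                          # hypothetical attribute to enrich with
        value: production
        action: insert
      - key: http.request.header.authorization    # strip sensitive data before export
        action: delete
  batch:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, batch]
      exporters: [otlp]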


Kubernetes sidecar
Workloads send signals to an OTel collector sidecar, which forwards them to a collector residing in a central namespace (which processes them and sends them to the destination). Advantages of this pattern are a decoupled central collector, an easily customizable side-car and implicit load balancing (a sketch follows).
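The sidecar itself can be a very small collector whose only job is to forward to the central one; a minimal sketch, assuming a made-up central service name:
# sidecar collector: listen locally, forward to the central collector
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 127.0.0.1:4317

exporters:
  otlp:
    endpoint: otel-gateway.observability.svc.cluster.local:4317   # hypothetical central service

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]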


Kubernetes daemonset
The collector is deployed as a daemonset; while this eases management, multi-tenancy and scaling requirements are hard to customize.


Loadbalanced collector
A central load-balancing collector is used to route all signals from a given source to a given backend collector (similar to how session affinity is handled). The idea behind the implementation is that any given backend collector should independently get the full picture of the source application (a sketch follows).
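The contrib distribution ships a loadbalancing exporter for this pattern; a rough sketch with made-up downstream hostnames:
exporters:
  loadbalancing:
    routing_key: traceID      # keep all spans of a trace on the same backend collector
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:            # hypothetical downstream collectors
          - otel-backend-1:4317
          - otel-backend-2:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]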


Multicluster
A common OTel collector is deployed on a central cluster and acts as the final stop before writing to destinations. It is useful in regulatory scenarios where a common point of control needs to be established.


Multi-tenant
Multiple destinations are generally involved, and the OTel collector processes signals and routes them to the appropriate destination based on filtering tags (see the sketch below).
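One way to express this, assuming the contrib distribution's routing processor; the tenant attribute, values and exporter names below are all made up:
processors:
  routing:
    from_attribute: tenant              # hypothetical attribute carrying the tenant id
    default_exporters: [otlp/shared]
    table:
      - value: acme
        exporters: [otlp/acme]
      - value: globex
        exporters: [otlp/globex]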


Per signal
An OTel collector per signal type (eg: one for metrics, one for traces ..). Useful to establish separate observability pipelines per signal; a sketch follows. Note: A PUSH based collector can be scaled easily, while scaling a PULL based one (prometheus) is not straightforward given the idempotency semantics.
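A minimal sketch of how the pipelines split; in this pattern each pipeline below would run in its own collector deployment rather than sharing one process (receiver and exporter choices are illustrative and depend on the distribution):
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/traces-backend]        # hypothetical push-based traces path
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]      # illustrative pull-based metrics path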

