Centralized logging in the OTel era

Centralized logging is, paradoxically, a necessity in the world of distributed systems. So how does one go about doing it?
This post walks through the approaches we tried using the OpenTelemetry Collector, with AppDynamics as the log data store and visualization layer.

1. PULL model - Log aggregation through agents

  • Popularized by the Java ecosystem's way of doing things
  • Suitable for long-running systems such as web servers and daemons
  • The agent needs permissions on the log files, may take read locks, consumes host resources, and can be a network hog
  • Agents can process and filter logs before sending them, thereby reducing the load on the collector
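
A minimal sketch of what a pull agent does on each poll: it remembers a byte offset into the log file, reads only what has been appended since, and filters before shipping. It assumes a plain-text log whose lines start with the level name; poll_log_file and the level table are hypothetical and not taken from any particular agent.

```python
import os
import tempfile

# Hypothetical severity ordering; assumes each log line starts with its level name.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def poll_log_file(path, offset, min_level="WARNING"):
    """Read complete lines appended since `offset` and keep those at or above
    `min_level`. Returns (kept_lines, new_offset) so the agent can resume where
    it stopped instead of re-reading (and re-shipping) the whole file."""
    threshold = LEVELS[min_level]
    kept = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up on the next poll
            offset += len(raw)
            line = raw.decode("utf-8").rstrip("\r\n")
            level = line.split(" ", 1)[0]
            if LEVELS.get(level, 0) >= threshold:
                kept.append(line)  # only these would be sent to the collector
    return kept, offset


# Demo: the application writes to its log, the agent polls twice.
fd, log_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "w") as app_log:
    app_log.write("INFO service started\nERROR db timeout\n")
first_batch, pos = poll_log_file(log_path, 0)
with open(log_path, "a") as app_log:
    app_log.write("WARNING slow request\nDEBUG cache miss\n")
second_batch, pos = poll_log_file(log_path, pos)
os.remove(log_path)
print(first_batch, second_batch)
```

Filtering at the agent is what reduces the collector's load: the INFO and DEBUG lines above never leave the host.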

2. PUSH model

  • Applications ship their own logs using configurable SDKs and libraries
  • Decoupled in nature
  • Suitable for transient, run-anywhere systems such as scripts
  • The intermediary that receives the logs can become a resource hog
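
The "decoupled" property can be illustrated with Python's standard library alone: QueueHandler lets the application hand records to an in-process queue, and QueueListener ships them from a background thread. The ShippingHandler below is a hypothetical stand-in that collects formatted lines in memory instead of sending them over the network.

```python
import logging
import logging.handlers
import queue


class ShippingHandler(logging.Handler):
    """Hypothetical stand-in for a handler that ships records over the network;
    here it only collects formatted lines in memory."""

    def __init__(self):
        super().__init__()
        self.shipped = []

    def emit(self, record):
        self.shipped.append(self.format(record))


# The application only enqueues records; the QueueListener's background
# thread does the (potentially slow) shipping.
log_queue = queue.Queue(-1)  # unbounded
shipper = ShippingHandler()
shipper.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
listener = logging.handlers.QueueListener(log_queue, shipper)

logger = logging.getLogger("batch-job")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))
logger.propagate = False

listener.start()
logger.info("job started")
logger.warning("retrying step %d", 2)
listener.stop()  # drains the queue before a short-lived script exits
print(shipper.shipped)
```

The unbounded queue keeps logging calls cheap for the application, but it moves the cost into memory on whatever side is doing the shipping, which is the same trade-off the intermediary bullet points at.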

We proceeded with the PUSH model, using Kafka as the intermediary because of its simplicity and reliability.
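
The in-house python-kafka-log-handler is not shown in this post, so the following is a hypothetical stand-in for that kind of handler: a logging.Handler that serializes each record to JSON and publishes it through a producer exposing send(topic, value=..., key=...) and flush(), as kafka-python's KafkaProducer does. A FakeProducer replaces the real client so the sketch runs without a broker, and the topic name is a placeholder.

```python
import json
import logging


class KafkaLogHandler(logging.Handler):
    """Hypothetical push-model handler: serializes each record to JSON and
    hands it to a Kafka producer with a kafka-python style send()."""

    def __init__(self, producer, topic):
        super().__init__()
        self.producer = producer
        self.topic = topic

    def emit(self, record):
        try:
            payload = {
                "timestamp": record.created,
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
            self.producer.send(
                self.topic,
                key=record.name.encode("utf-8"),  # one logger -> one partition
                value=json.dumps(payload).encode("utf-8"),
            )
        except Exception:
            self.handleError(record)  # never let log shipping crash the app

    def flush(self):
        self.producer.flush()


class FakeProducer:
    """In-memory stand-in so the sketch runs without a broker."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value=None, key=None):
        self.sent.append((topic, key, value))

    def flush(self):
        pass


producer = FakeProducer()
logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.addHandler(KafkaLogHandler(producer, topic="app-logs"))
logger.propagate = False
logger.info("order %s placed", "A-17")
print(producer.sent[0])
```

Keying by logger name keeps one logger's records in a single partition under the default partitioner, so their order is preserved; calling handleError instead of raising keeps a broker outage from taking the application down.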

First cut

We tried our in-house python-kafka-log-handler, with Promtail as the agent and Loki as the processor

  • This didn't work, as Loki could not EXPORT logs in OTel format

Second try

PUSH logs directly from Loki to the AppDynamics backend

  • This didn't work either, as Loki and AppD don't speak the same language
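
To make "not the same language" concrete: Loki's push API carries streams made of a label set plus [nanosecond-timestamp string, line] pairs, whereas an OTel-compliant backend expects OTLP log records nested as resourceLogs, scopeLogs, logRecords. The sketch below shows the kind of translation that would have had to sit in between; loki_to_otlp is hypothetical, and mapping every label to a resource attribute is an assumption made for illustration.

```python
import json


def loki_to_otlp(loki_push):
    """Translate a Loki push-API body into an OTLP/JSON logs payload.

    Sketch only: every Loki stream label becomes an OTLP resource attribute,
    and each [timestamp_ns, line] pair becomes one log record."""
    resource_logs = []
    for stream in loki_push["streams"]:
        attributes = [
            {"key": k, "value": {"stringValue": v}}
            for k, v in sorted(stream["stream"].items())
        ]
        records = [
            # Any extra trailing elements in a value entry are ignored.
            {"timeUnixNano": ts, "body": {"stringValue": line}}
            for ts, line, *_rest in stream["values"]
        ]
        resource_logs.append({
            "resource": {"attributes": attributes},
            "scopeLogs": [{"scope": {}, "logRecords": records}],
        })
    return {"resourceLogs": resource_logs}


loki_body = {
    "streams": [{
        "stream": {"app": "billing", "env": "prod"},
        "values": [["1700000000000000000", "invoice generated"]],
    }]
}
print(json.dumps(loki_to_otlp(loki_body), indent=2))
```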

One more try - OTel Collector to the rescue

Avoid all intermediaries other than Kafka: the OTel Collector supports Kafka as a receiver and OTLP over HTTP (the otlphttp exporter) as an exporter, so it can push logs straight to the AppD backend, which is fully OTel compliant

  • This worked like a charm!
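
The post does not show how the applications encode what they publish. One detail worth knowing: the collector's Kafka receiver decodes each message according to its encoding setting, which defaults to OTLP protobuf. As a sketch, assuming the receiver is configured for an OTLP JSON encoding (check the kafkareceiver README for your collector-contrib version for the exact option name), an application-side encoder could look like this. to_otlp_json, the service name, and the severity table below are illustrative; the severity numbers follow the OTel logs data model.

```python
import json
import logging

# Python logging levels -> OTel SeverityNumber (OTel logs data model).
SEVERITY_NUMBER = {
    logging.DEBUG: 5,      # DEBUG
    logging.INFO: 9,       # INFO
    logging.WARNING: 13,   # WARN
    logging.ERROR: 17,     # ERROR
    logging.CRITICAL: 21,  # FATAL
}


def to_otlp_json(record, service_name):
    """Encode one LogRecord as an OTLP/JSON logs request
    (resourceLogs -> scopeLogs -> logRecords), returned as bytes."""
    log_record = {
        "timeUnixNano": str(int(record.created * 1_000_000_000)),
        "severityNumber": SEVERITY_NUMBER.get(record.levelno, 0),  # 0 = unspecified
        "severityText": record.levelname,
        "body": {"stringValue": record.getMessage()},
    }
    return json.dumps({
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeLogs": [{
                "scope": {"name": record.name},  # logger name as instrumentation scope
                "logRecords": [log_record],
            }],
        }]
    }).encode("utf-8")


record = logging.LogRecord(
    name="payments", level=logging.ERROR, pathname="demo.py", lineno=1,
    msg="charge failed for %s", args=("A-17",), exc_info=None,
)
payload = to_otlp_json(record, service_name="payments-api")
# With a real client this would be e.g. producer.send("app-logs", value=payload).
print(payload.decode("utf-8"))
```

Encoding on the application side keeps the collector pipeline to a Kafka receiver plus an exporter, with no translation component in between.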

Footnotes:

  • OpenTelemetry has two flavors of the collector: opentelemetry-collector, the minimal stable core, and opentelemetry-collector-contrib, which supports a ton of additional receivers and exporters
  • opentelemetry-collector-contrib supports Loki as both a receiver and an exporter, but we could not find a simpler way to push logs from applications into Loki (as the processor/aggregator)

