
Ankit Anand ✨ for SigNoz

Originally published at signoz.io

DataDog vs Jaeger - key features, differences and alternatives

Both DataDog and Jaeger are tools used to monitor application performance. The difference lies in what they monitor and how they are offered. Jaeger is an open-source tool focused on distributed tracing of requests in a microservice architecture, while DataDog is a SaaS APM vendor covering most of an application's monitoring needs.


Application performance monitoring is the process of keeping your app's health in check. APM tools enable you to be proactive about meeting the demands of your customers.

If you're comparing DataDog and Jaeger, the distributed tracing capabilities of both tools are one of the most important criteria. Before we dive in, let's briefly look at what distributed tracing is.

What is distributed tracing?

In the world of microservices, a user request can travel through many services, sometimes hundreds, before the user gets what they need. To scale, businesses make each engineering team responsible for particular services, which often leaves teams with little insight into how the system performs as a whole. That's where distributed tracing comes into the picture.

Microservices architecture
Microservice architecture of a fictional e-commerce application

Distributed tracing gives you insight into how a particular service is performing as part of the whole in a distributed software system. There are two essential concepts involved in distributed tracing: spans and trace context.

User requests are broken down into spans.

What are spans?

A span represents a single operation within a trace. Typically, it represents work done by a single service, which can be broken down further depending on the use case.

A trace context is passed along when a request travels between services, which makes it possible to track a user request across services. You can then see how a user request performs across services and identify what exactly needs your attention without manually sifting through multiple dashboards.

Trace context is passed to track user requests across services
A trace context is passed when user requests pass from one service to another
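To make the idea concrete, here is a minimal sketch of spans and trace context propagation using only the Python standard library. The `Span` class, the helper functions, and the service and operation names are hypothetical and exist only for illustration. The header layout is loosely modeled on the W3C Trace Context `traceparent` header, and real tracing libraries handle far more than this.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One operation within a trace (hypothetical minimal model)."""
    trace_id: str             # shared by every span of the same user request
    span_id: str              # unique to this operation
    parent_id: Optional[str]  # None for the root span
    name: str


def start_span(name: str, parent: Optional[Span] = None) -> Span:
    # A child reuses the parent's trace_id; a root span starts a new trace.
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return Span(trace_id, secrets.token_hex(8), parent_id, name)


def inject(span: Span, headers: dict) -> None:
    # Serialize the context as version-traceid-spanid-flags.
    headers["traceparent"] = f"00-{span.trace_id}-{span.span_id}-01"


def extract(headers: dict) -> Span:
    # Rebuild the caller's context on the receiving service.
    _version, trace_id, span_id, _flags = headers["traceparent"].split("-")
    return Span(trace_id, span_id, None, "remote-parent")


# A "frontend" service handles the request and calls "checkout".
root = start_span("GET /cart")
outgoing_headers: dict = {}
inject(root, outgoing_headers)

# The "checkout" service continues the same trace from the incoming headers.
remote_parent = extract(outgoing_headers)
child = start_span("charge-card", parent=remote_parent)
```

Because `child` carries the same `trace_id` as `root` and points to it through `parent_id`, a tracing backend can stitch the two spans into one trace even though they were created in different services.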

Key Features of DataDog

DataDog offers an array of services in the monitoring domain. Some of the key areas in monitoring that it covers include:

  • Log Management
  • Application performance monitoring
  • Security monitoring
  • Network monitoring
  • Real user monitoring

Let's focus on DataDog's application performance monitoring features, as these are the most relevant for a comparison with Jaeger.

Some of the key features of DataDog APM include:

  • End-to-end application performance monitoring
    As a full-stack APM tool, using DataDog, you can connect distributed traces to infrastructure metrics, network calls, and live processes.

  • Collection of 100% of traces
    Trace data can be huge. Still, with DataDog you can collect 100% of the traces generated in the last 15 minutes and then retain the ones showing high latency to investigate further.

  • Code-level visibility for root-cause analysis
    DataDog gives code-level visibility to break down slow requests by time spent on CPU, GC, I/O, etc.

  • Covers a wide range of technology stacks
    DataDog provides extensive integrations and libraries to monitor Java, .NET, PHP, Node.js, Ruby, Python, Go, or C++ applications.
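The "collect everything, then keep the slow traces" idea mentioned in the list above can be sketched in a few lines. This is not DataDog's implementation. The `Trace` class, the `select_for_retention` function, and the 500 ms threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    trace_id: str
    duration_ms: float


def select_for_retention(traces, latency_threshold_ms=500.0):
    """Keep only traces at or above the latency threshold; the rest can expire."""
    return [t for t in traces if t.duration_ms >= latency_threshold_ms]


# Traces seen in the recent window (made-up values).
recent = [
    Trace("a1", 42.0),
    Trace("b2", 1250.0),
    Trace("c3", 80.0),
]
retained = select_for_retention(recent)
retained_ids = [t.trace_id for t in retained]
```

In this made-up window only trace `b2` crosses the threshold, so it is the only one kept for further investigation.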

DataDog APM dashboard
DataDog APM tool showing infrastructure, metrics, logs, errors, processes, network and code hotspots under a single dashboard

DataDog provides code level visibility to identify issues quickly
Find code hotspots using DataDog APM tool

Key features of Jaeger

Jaeger was originally built by teams at Uber and then open-sourced. It is used for end-to-end distributed tracing for microservices. Some of the key features of Jaeger include:

  • Distributed context propagation
    One of the challenges of distributed systems is having a standard format for passing context across process boundaries and services. Jaeger provides client libraries that support code instrumentation in multiple languages to propagate context across services.

  • Distributed transaction monitoring
    Jaeger comes with a web UI written in JavaScript. The dashboard can be used to see traces and spans across services.

  • Root Cause Analysis
    Using traces, you can drill down to the services causing latency in a particular user request.

  • Service dependency analysis
    Using Jaeger's web UI, you can see how requests flow through different services and how those services interact while serving user requests.

  • Performance/latency optimization
    Once you have identified which service or query is creating latency, you can use that information to optimize it.
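As a rough illustration of how root cause and dependency analysis fall out of trace data, the sketch below derives service-to-service edges and each span's "self time" from a single trace. It is not how Jaeger computes anything internally. The span tuples and service names are made up, and it assumes child spans run sequentially and that each service appears once in the trace.

```python
from collections import defaultdict

# Each span: (span_id, parent_id, service, duration_ms). Values are made up.
spans = [
    ("s1", None, "frontend", 480),
    ("s2", "s1", "cart", 120),
    ("s3", "s1", "checkout", 330),
    ("s4", "s3", "payments", 290),
]

by_id = {
    span_id: (parent_id, service, duration)
    for span_id, parent_id, service, duration in spans
}

child_time = defaultdict(int)  # total time spent in direct children, per span
edges = set()                  # (caller service, callee service) pairs

for span_id, parent_id, service, duration in spans:
    if parent_id is None:
        continue
    child_time[parent_id] += duration
    parent_service = by_id[parent_id][1]
    if parent_service != service:
        edges.add((parent_service, service))

# Self time = a span's own duration minus the time spent in its children.
self_time = {
    by_id[span_id][1]: by_id[span_id][2] - child_time[span_id]
    for span_id in by_id
}
slowest_service = max(self_time, key=self_time.get)
```

Here `frontend` has the longest span overall, but almost all of that time is spent waiting on its children. Looking at self time points to `payments` as the place to start optimizing.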

Jaeger UI
Jaeger UI showing services and corresponding traces

Comparing DataDog and Jaeger

DataDog is one of the major SaaS vendors in the APM space. On the other hand, Jaeger is a popular open-source distributed tracing tool and a graduated project of the Cloud Native Computing Foundation. The differences between the tools arise from this genesis.

Some of the key differences between DataDog and Jaeger are:

  • Correlation of trace data
    DataDog lets you connect your trace data to many other performance metrics, such as infrastructure and host metrics, as it is not limited to distributed tracing. Jaeger collects only trace data, which can give you insight into request latencies. You can't use Jaeger to collect metrics for hosts, networks, etc.

  • Code Instrumentation
    Instrumentation is the process of generating telemetry data from your application. Jaeger uses OpenTracing APIs for code instrumentation. The telemetry data it generates is in a vendor-neutral format, so you can also use other back-end analysis tools. DataDog provides DataDog agents that run on your hosts to collect events and metrics. With proprietary instrumentation agents, your monitoring stack can quickly become locked into a vendor. DataDog also supports ingestion from open-source standards like OpenTelemetry, but it's not a first-class citizen.

  • Data Storage
    Jaeger supports two popular open-source databases for storing trace data: Cassandra and Elasticsearch. DataDog is a third-party cloud vendor, so your data is stored on DataDog's servers.

  • Web UI
    DataDog is a SaaS tool that offers a much smoother and more elaborate dashboarding experience, including many customizations. Jaeger's web UI is limited, although it can serve the purpose of distributed tracing.

The decision between DataDog and Jaeger comes down to whether your organization has the budget for a paid SaaS tool like DataDog or the engineering bandwidth to run an open-source tool like Jaeger. In addition, as Jaeger is limited to distributed tracing, your decision also needs to account for whether you need to monitor other components of your application.

Open-source tools have long lacked a great user experience. But what if there was an open-source tool that could offer the scope and experience of a great SaaS tool like DataDog?

That's where SigNoz comes into the picture.

Alternative to DataDog and Jaeger - SigNoz

SigNoz is a full-stack open-source application performance monitoring and observability tool which can be used in place of DataDog and Jaeger. It provides advanced distributed tracing capabilities along with metrics under a single dashboard.

SigNoz is built to support OpenTelemetry natively. OpenTelemetry is becoming the world standard for generating and managing telemetry data (logs, metrics, and traces). SigNoz also gives users flexibility in terms of storage: you can choose between ClickHouse or Kafka + Druid as your backend storage while installing SigNoz.

Architecture of SigNoz with OpenTelemetry and ClickHouse
Architecture of SigNoz with ClickHouse as storage backend and OpenTelemetry for code instrumentation

SigNoz comes with out-of-the-box visualization of things like RED metrics (rate, errors, and duration).

SigNoz UI showing the popular RED metrics
SigNoz UI showing application overview metrics like RPS, 50th/90th/99th Percentile latencies, and Error Rate
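For readers new to RED metrics, here is a small sketch of how request rate, error rate, and latency percentiles could be computed from raw request records. It only illustrates the arithmetic and is not how SigNoz computes these values. The sample data, the 10-second window, and the nearest-rank percentile method are assumptions.

```python
import math

# (duration_ms, is_error) for requests seen in a 10-second window; made-up data.
window_seconds = 10
requests = [
    (12, False), (15, False), (20, False), (22, False), (30, False),
    (35, False), (40, True), (55, False), (120, False), (900, True),
]


def percentile(sorted_values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


durations = sorted(duration for duration, _ in requests)

rate_rps = len(requests) / window_seconds                           # Rate
error_rate = sum(1 for _, err in requests if err) / len(requests)   # Errors
p50 = percentile(durations, 50)                                     # Duration
p90 = percentile(durations, 90)
p99 = percentile(durations, 99)
```

Note how a single 900 ms outlier barely moves the 50th percentile but dominates the 99th, which is why dashboards usually show several percentiles side by side rather than an average.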

You can also use flamegraphs to visualize spans from your trace data. All of this comes out of the box with SigNoz.

Flamegraphs used to visualize spans of distributed tracing in SigNoz UI
Flamegraphs showing the exact duration taken by each span - a concept of distributed tracing
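To show what such a view encodes, the sketch below renders the spans of one trace as a plain-text timeline in the spirit of a flamegraph: each bar starts at the span's offset and its length is proportional to the span's duration. The span names, timings, and the `render` helper are hypothetical and unrelated to how the SigNoz UI is built.

```python
# (name, parent, start_ms, duration_ms); hypothetical spans from one trace.
spans = [
    ("GET /checkout", None, 0, 100),
    ("auth", "GET /checkout", 5, 20),
    ("charge-card", "GET /checkout", 30, 60),
    ("db.insert", "charge-card", 40, 30),
]

parents = {name: parent for name, parent, _, _ in spans}


def depth(name):
    # Number of ancestors between this span and the root span.
    levels = 0
    while parents[name] is not None:
        name = parents[name]
        levels += 1
    return levels


def render(spans, ms_per_char=5):
    lines = []
    for name, _parent, start, duration in sorted(spans, key=lambda s: s[2]):
        # Offset the bar by the start time and size it by the duration.
        bar = " " * (start // ms_per_char) + "#" * max(duration // ms_per_char, 1)
        label = "  " * depth(name) + f"{name} ({duration} ms)"
        lines.append(bar.ljust(22) + label)
    return "\n".join(lines)


chart = render(spans)
print(chart)
```

Reading the output top to bottom shows which child spans account for most of the root span's duration.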

Some of the things SigNoz can help you track:

  • Application overview metrics like RPS, 50th/90th/99th Percentile latencies, and Error Rate
  • Slowest endpoints in your application
  • Exact request traces to figure out issues in downstream services, slow DB queries, calls to 3rd party services like payment gateways, etc.
  • Filter traces by service name, operation, latency, error, tags/annotations
  • Run aggregates on trace data
  • Unified UI for both metrics and traces

You can check out SigNoz's GitHub repo here 👇

SigNoz GitHub repo
