DEV Community

Cover image for CockroachDB: trace logging with Datadog
Mark Zlamal
Mark Zlamal

Posted on

CockroachDB: trace logging with Datadog

trace logs

Trace logs are the detailed internal synchronous & asynchronous operations that take place to fulfill a complete transaction or deliver a result.

CockroachDB trace logs

In CockroachDB these operations may include any combination of network interactions, calls to the storage layer, SQL layer, distribution layer, query batching, reads/updates/writes, and various runtime coordinator services.

Normally this is the level of detail is what you’d find inside a CockroachDB debug ZIP bundle, and specifically the jaeger-file that lets you visualize and interact with the execution flow to identify where processing/waiting time is spent.

improving database performance

Datadog provides tools to assist in the tasks of database tuning, query-tuning, and improvements to schemas.

What’s special is that trace-logs are sent to Datadog in real-time during query execution.

In the Datadog portal you can monitor, filter, and explore the same jaeger-style activity on these dissected operations to help guide you with:

  • narrowing down bottle-necks in the cluster network topology
  • capturing transient issues tied to network fluctuations
  • isolating root-causes of poor-performing queries
  • observing application behaviour that send queries
  • tracking down the impact of sub-optimal schemas
  • providing additional context for Cockroach Labs support teams

to the architecture

The trace logging system is divided into 3 areas of integration.

  1. The CockroachDB cluster (source; host machine)
  2. The Datadog agent (OTLP protocol; host machine)
  3. The Datadog web portal (sink; in cloud)

Core Architecture

architecture of CockroachDB and Datadog integration

1) in the cloud: Datadog

Datadog is an integrated cloud-portal offering data-aggregation services, including all the tools needed to specify data-sources, infrastructure observability, build interactive dashboards, explore metrics, define alerts, monitors, and so on.

CockroachDB data source integration (Datadog marketplace)

CockroachDB is one of the many data-source integrations that’s available through their vendor marketplace. This service establishes the core settings for a typical CRDB installation that are required to capture metrics, text-log data, and defines all the back-end monitored properties.

2) on the host machine: Datadog agent

Here we have the Datadog agent. It’s a binary executable that runs as a service in the background, listening to OpenTelemetry data sources. You can find installation instructions on the main portal with download links specific to your target OS.

Datadog agents

They provide agents for many environments, including cloud-based and operator-driven deployments for managed Kubernetes and OpenShift.

3) on the host machine: CockroachDB

CockroachDB relies on the OpenTelemetry protocol as part of the observability instrumentation. this can be enabled by applying the cluster setting property trace.opentelemetry.collector to associate with the Datadog agent.

In this post the agent running locally on my laptop along with CockroachDB so I’m sticking to the default Datadog OLTP workload port for the telemetry connection.


to the installation

create an API key

Datadog provides a multi-tenant view to user-content and data-sources. API keys are used to identify and differentiate who’s data we’re working with, and where the data comes from.

Core Architecture

API keys in Datadog

install the agent

Browse to the Agents Page and download the binary suitable for your OS or platform.

Once installed, configuration files made available so that you can associate your CockroachDB installation with the agent, and associate the agent with your Datadog account.

associate the Agent with Datadog

Using the API key generated above, you must edit the datadog.yaml file and save the key. The OTLP listening port is 4317 by default and can also be changed if you have multiple listeners or agents present.

edit ~/.datadog-agent/datadog-agent/datadog.yaml

...
...
api_key: **4fxxxxxxxxxxxxxxxxxxxxxxxxxxxxe0**
...
...
otlp_config:
  receiver:
    protocols:
      grpc:
        endpoint: localhost:4317
...
...
Enter fullscreen mode Exit fullscreen mode

associate CockroachDB with the agent

Next step (optional): Tag your services and specify CockroachDB endpoints that match your database installation.

Tagging is important where multiple Cockroach DB instances are running on the same server and you need to differentiate between them on your dashboards.

The default prometheus URL setting is localhost:8080/_status/vars but can be changed as follows:

edit ~/.datadog-agent/conf.d/cockroachdb.d/conf.yaml

...
...
## Every instance is scheduled independent of the others.
#
instances:
  - openmetrics_endpoint: http://localhost:8091/_status/vars
    tags:
      - service:mzmz-crdb-8091
      - env:mzmz-env-8091
    service: mzmz-crdb-8091
    ...
    ...
Enter fullscreen mode Exit fullscreen mode

CockroachDB cluster settings

The last setting is to establish the link between CockroachDB and the Datadog agent. This is accomplished by setting the trace-collector to the running Datadog agent as follows:

root@localhost:26257/defaultdb> set cluster setting trace.opentelemetry.collector='localhost:4317';
SET CLUSTER SETTING
Time: 78ms total (execution 78ms / network 0ms)

root@localhost:26257/defaultdb> set cluster setting sql.trace.log_statement_execute=on;
SET CLUSTER SETTING
Time: 41ms total (execution 41ms / network 0ms)

root@localhost:26257/defaultdb> set tracing=on;
SET TRACING
Time: 0ms total (execution 0ms / network 0ms)
Enter fullscreen mode Exit fullscreen mode

Reference

CockroachLabs Docs: Set Tracing

SET TRACING changes the trace recording state of the current session. A trace recording can be inspected with the SHOW TRACE FOR SESSION statement.
SET TRACING=off;  -- Trace recording is disabled.
SET TRACING=cluster;  -- Trace recording is enabled; distributed traces are collected.
SET TRACING=on;  -- Same as cluster.
SET TRACING=kv;  -- Same as cluster except that "kv messages" are collected instead of regular trace messages.
SET TRACING=results; -- Result rows and row counts are copied to the session trace. This must be specified in order for the output of a query to be printed in the session trace.
Enter fullscreen mode Exit fullscreen mode

testing the environment

  1. Launch CockroachDB
  2. Open the Datadog Agent local admin console (default address is http://127.0.0.1:5002
  3. Under Checks --> Checks Summary you should see all green checkmarks.
    Checks Summary for Datadog Agent
    In my installation, I have 2 CockroachDB instances configured, but only one is running.
  4. Sign into your Datadog HQ account
  5. Left hand navigation bar: APM --> Traces
    APM Traces
    Tracing from CockroachDB emits large volumes of data. Searching for sql txn will remove the noise and focus the trace-spans onto your queries for inspection.
  6. Upon running a query, the full trace and spans are sent to this UI and can be dissected.
    APM Traces
    Visualize where (and how) time is spent processing a query, and can be used as a baseline for comparison against related queries. A typical example is to confirm range hot-spots due to long wait-times acquiring a write-latch.

The UI lets you explore the query history and with capabilities such as sorting by duration, isolating long-running statements is a quick process.

By adjusting the filters, you can switch to other CockroachDB run-time activities such as liveness inspections, tracking jobs, or garbage-collector event logs.


conclusion

Tracing capabilities provided by Datadog offers a non-invasive solution for processing and visualizing logs across many deployment options. It's a fully-managed & integrated cloud solution that offers all the back-end instrumentation allowing you to focus on the health of your applications.


terminology & resources

APM

Application Performance Monitoring

OTLP

The OpenTelemetry Protocol specification describes the encoding/transport/delivery mechanism of telemetry data between sources/collectors/backends. What is OpenTelemetry?

Datadog & CockroachDB integration

The Datadog Agent ingests metrics and traces in the OpenTelemetry format (OTLP), which can be produced by OpenTelemetry-instrumented applications such as CockroachDB.

configure CRDB

CockroachDB cluster configuration

configure Datadog

Client-side Datadog configuration (client-side Datadog configuration)
Cloud account Datadog integration (cloud account Datadog integration)

trace

Information about the sub-operations performed as part of a high-level operation (a query or a transaction). This information is internally represented as a tree of "spans", with a special "root span" representing a whole SQL transaction
CockroachLabs Docs: Set Tracing

Top comments (1)

Collapse
 
evanxg852000 profile image
Evance Soumaoro • Edited

Great post on monitoring @world2mark. I just want to suggest another option to Datadog. Datadog and its agent can be replaced by Quickwit. Quickwit is opensource and provides scalability, cost-effectiveness, and more control over very large amounts of logs & tracing data. You also retain the ability to use your existing toolbelt: #Jeager #restapi and the soon-to-come #grafana integration.
checkout the docs