trace logs
Trace logs are the detailed internal synchronous & asynchronous operations that take place to fulfill a complete transaction or deliver a result.
CockroachDB trace logs
In CockroachDB these operations may include any combination of network interactions, calls to the storage layer, SQL layer, distribution layer, query batching, reads/updates/writes, and various runtime coordinator services.
Normally this is the level of detail is what you’d find inside a CockroachDB debug ZIP bundle, and specifically the jaeger-file that lets you visualize and interact with the execution flow to identify where processing/waiting time is spent.
improving database performance
Datadog provides tools to assist in the tasks of database tuning, query-tuning, and improvements to schemas.
What’s special is that trace-logs are sent to Datadog in real-time during query execution.
In the Datadog portal you can monitor, filter, and explore the same jaeger-style activity on these dissected operations to help guide you with:
- narrowing down bottle-necks in the cluster network topology
- capturing transient issues tied to network fluctuations
- isolating root-causes of poor-performing queries
- observing application behaviour that send queries
- tracking down the impact of sub-optimal schemas
- providing additional context for Cockroach Labs support teams
to the architecture
The trace logging system is divided into 3 areas of integration.
- The CockroachDB cluster (source; host machine)
- The Datadog agent (OTLP protocol; host machine)
- The Datadog web portal (sink; in cloud)
1) in the cloud: Datadog
Datadog is an integrated cloud-portal offering data-aggregation services, including all the tools needed to specify data-sources, infrastructure observability, build interactive dashboards, explore metrics, define alerts, monitors, and so on.
CockroachDB data source integration (Datadog marketplace)
CockroachDB is one of the many data-source integrations that’s available through their vendor marketplace. This service establishes the core settings for a typical CRDB installation that are required to capture metrics, text-log data, and defines all the back-end monitored properties.
2) on the host machine: Datadog agent
Here we have the Datadog agent. It’s a binary executable that runs as a service in the background, listening to OpenTelemetry data sources. You can find installation instructions on the main portal with download links specific to your target OS.
They provide agents for many environments, including cloud-based and operator-driven deployments for managed Kubernetes and OpenShift.
3) on the host machine: CockroachDB
CockroachDB relies on the OpenTelemetry protocol as part of the observability instrumentation. this can be enabled by applying the cluster setting property trace.opentelemetry.collector to associate with the Datadog agent.
In this post the agent running locally on my laptop along with CockroachDB so I’m sticking to the default Datadog OLTP workload port for the telemetry connection.
to the installation
create an API key
Datadog provides a multi-tenant view to user-content and data-sources. API keys are used to identify and differentiate who’s data we’re working with, and where the data comes from.
install the agent
Browse to the Agents Page and download the binary suitable for your OS or platform.
Once installed, configuration files made available so that you can associate your CockroachDB installation with the agent, and associate the agent with your Datadog account.
associate the Agent with Datadog
Using the API key generated above, you must edit the datadog.yaml file and save the key. The OTLP listening port is 4317 by default and can also be changed if you have multiple listeners or agents present.
edit ~/.datadog-agent/datadog-agent/datadog.yaml
...
...
api_key: **4fxxxxxxxxxxxxxxxxxxxxxxxxxxxxe0**
...
...
otlp_config:
receiver:
protocols:
grpc:
endpoint: localhost:4317
...
...
associate CockroachDB with the agent
Next step (optional): Tag your services and specify CockroachDB endpoints that match your database installation.
Tagging is important where multiple Cockroach DB instances are running on the same server and you need to differentiate between them on your dashboards.
The default prometheus URL setting is localhost:8080/_status/vars but can be changed as follows:
edit ~/.datadog-agent/conf.d/cockroachdb.d/conf.yaml
...
...
## Every instance is scheduled independent of the others.
#
instances:
- openmetrics_endpoint: http://localhost:8091/_status/vars
tags:
- service:mzmz-crdb-8091
- env:mzmz-env-8091
service: mzmz-crdb-8091
...
...
CockroachDB cluster settings
The last setting is to establish the link between CockroachDB and the Datadog agent. This is accomplished by setting the trace-collector to the running Datadog agent as follows:
root@localhost:26257/defaultdb> set cluster setting trace.opentelemetry.collector='localhost:4317';
SET CLUSTER SETTING
Time: 78ms total (execution 78ms / network 0ms)
root@localhost:26257/defaultdb> set cluster setting sql.trace.log_statement_execute=on;
SET CLUSTER SETTING
Time: 41ms total (execution 41ms / network 0ms)
root@localhost:26257/defaultdb> set tracing=on;
SET TRACING
Time: 0ms total (execution 0ms / network 0ms)
Reference
CockroachLabs Docs: Set Tracing
SET TRACING changes the trace recording state of the current session. A trace recording can be inspected with the SHOW TRACE FOR SESSION statement.
SET TRACING=off; -- Trace recording is disabled.
SET TRACING=cluster; -- Trace recording is enabled; distributed traces are collected.
SET TRACING=on; -- Same as cluster.
SET TRACING=kv; -- Same as cluster except that "kv messages" are collected instead of regular trace messages.
SET TRACING=results; -- Result rows and row counts are copied to the session trace. This must be specified in order for the output of a query to be printed in the session trace.
testing the environment
- Launch CockroachDB
- Open the Datadog Agent local admin console (default address is http://127.0.0.1:5002
- Under Checks --> Checks Summary you should see all green checkmarks.
- Sign into your Datadog HQ account
- Left hand navigation bar: APM --> Traces
- Upon running a query, the full trace and spans are sent to this UI and can be dissected.
The UI lets you explore the query history and with capabilities such as sorting by duration, isolating long-running statements is a quick process.
By adjusting the filters, you can switch to other CockroachDB run-time activities such as liveness inspections, tracking jobs, or garbage-collector event logs.
conclusion
Tracing capabilities provided by Datadog offers a non-invasive solution for processing and visualizing logs across many deployment options. It's a fully-managed & integrated cloud solution that offers all the back-end instrumentation allowing you to focus on the health of your applications.
terminology & resources
APM
Application Performance Monitoring
OTLP
The OpenTelemetry Protocol specification describes the encoding/transport/delivery mechanism of telemetry data between sources/collectors/backends. What is OpenTelemetry?
Datadog & CockroachDB integration
The Datadog Agent ingests metrics and traces in the OpenTelemetry format (OTLP), which can be produced by OpenTelemetry-instrumented applications such as CockroachDB.
configure CRDB
configure Datadog
Client-side Datadog configuration (client-side Datadog configuration)
Cloud account Datadog integration (cloud account Datadog integration)
trace
Information about the sub-operations performed as part of a high-level operation (a query or a transaction). This information is internally represented as a tree of "spans", with a special "root span" representing a whole SQL transaction
CockroachLabs Docs: Set Tracing
Top comments (1)
Great post on monitoring @world2mark. I just want to suggest another option to Datadog. Datadog and its agent can be replaced by Quickwit. Quickwit is opensource and provides scalability, cost-effectiveness, and more control over very large amounts of logs & tracing data. You also retain the ability to use your existing toolbelt: #Jeager #restapi and the soon-to-come #grafana integration.
checkout the docs