DEV Community

Christopher Kujawa (Zell)

Posted on • Originally published at Medium on

Zeebe Debug and Inspection tool

Have you ever faced an incident and not known what the thing you’re running in production was actually doing, or how it ended up in that state?

Crashed airplane

Photo from Jonathan Gallegos on Unsplash

With Zeebe (the process automation engine powering Camunda Platform 8) we let our customers' businesses fly. But what if the thing that makes the business fly breaks? Just as after an airplane crash, you need a way to read the flight recorder on board.

Today I want to introduce you to a tool we created for Zeebe to read this “flight recorder” (the state) and support us during incidents. In the past, if Zeebe ran into processing problems, there was no way to find out the last processing state. If no exporter was configured, or the exporters had not exported for a while, it was even worse, since it was not clear what the last internal engine state was.

To shed some light in the dark, we built a tool called zdb (Zeebe Debugger). zdb is a Java (17) CLI tool to inspect the internal state and log of a Zeebe partition. It was kicked off during the Camunda Summer Hackdays in 2020 (by Nico Korthout, Deepthi Akkoorath, and Christopher Kujawa) and has been maintained and developed by me since then. It has now reached version 1.8.0, which brings new features such as printing and filtering the log in a nicer way.

zdb allows us to find the root cause, create fixes, and be prepared for the next failure (since failures always happen eventually). We use it in many of our incidents when we need to take a look at the current state of Zeebe, and also when investigating certain bugs. With zdb, we finally know what Zeebe was doing and how it got into that state.

In the end, the goal is always to get our customers flying again and keep them there.


In this blog post, I want to show you some examples of how we have used zdb in the past, to give you some inspiration for how it might help you.

Note: The output of zdb is always JSON, which allows us to pipe it into jq to get nicer, filterable output. We do this in the examples below as well.

General statistics

Often when you start working on an incident you need to get a first overview or understanding of what the state generally contains (depending on the problems of course). Here zdb can show you statistics of how many key-value pairs are stored in the internal state (in different column families).

$ zdb state --path <path-to-runtime-or-snapshot> | jq
{
  "DEFAULT": 1,
  "KEY": 1,
  "PROCESS_VERSION": 3,
  "PROCESS_CACHE": 3,
  "PROCESS_CACHE_BY_ID_AND_VERSION": 3,
  "PROCESS_CACHE_DIGEST_BY_ID": 3,
  "ELEMENT_INSTANCE_PARENT_CHILD": 6,
  "ELEMENT_INSTANCE_KEY": 6,
  "ELEMENT_INSTANCE_CHILD_PARENT": 6,
  "VARIABLES": 12,
  "TIMERS": 2,
  "TIMER_DUE_DATES": 2,
  "JOBS": 1,
  "JOB_STATES": 1,
  "JOB_DEADLINES": 1,
  "MESSAGE_START_EVENT_SUBSCRIPTION_BY_NAME_AND_KEY": 1,
  "MESSAGE_START_EVENT_SUBSCRIPTION_BY_KEY_AND_NAME": 1,
  "EVENT_SCOPE": 3,
  "EXPORTER": 2
}

An experienced Zeebe engineer or power user can see here already how many processes have been deployed, how many instances, jobs, timers, messages, etc. have been created and are currently in the state. This often helps to determine where to look next.

For example, if the statistics show incidents in the state and the reported failure (the ongoing incident) is about process instances that are not progressing, we would next check the open incidents in the state.
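To see at a glance which column families dominate the state, the statistics JSON can be post-processed. The following is a minimal Python sketch, not part of zdb: `largest_column_families` is a hypothetical helper name, and the sample data is an excerpt of the statistics output shown above.

```python
import json

# Excerpt of the `zdb state` statistics output shown above.
SAMPLE = """
{
  "PROCESS_VERSION": 3,
  "ELEMENT_INSTANCE_KEY": 6,
  "VARIABLES": 12,
  "TIMERS": 2,
  "JOBS": 1
}
"""


def largest_column_families(stats_json, top=3):
    """Return the `top` (name, count) pairs with the highest counts."""
    stats = json.loads(stats_json)
    # Sort by count (descending), then by name so ties have a stable order.
    ranked = sorted(stats.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


for name, count in largest_column_families(SAMPLE):
    print(f"{name}: {count}")
```

In practice you would redirect the output of `zdb state --path <path-to-runtime-or-snapshot>` into a file and pass its contents to the helper instead of `SAMPLE`.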

Restoring BPMN models

There are cases where you might lose your models, or you just want to find out which model is currently deployed or actually being executed. Here zdb can help.

First, you can print all deployed process model metadata (it will show information like process definition key, version, and name).

zdb process list --path <path-to-runtime-or-snapshot> | jq
[
  {
    "bpmnProcessId": "benchmark",
    "resourceName": "bpmn/one_task.bpmn",
    "processDefinitionKey": 2251799813685363,
    "version": 1
  },
  {
    "bpmnProcessId": "timerProcess",
    "resourceName": "bpmn/timerProcess.bpmn",
    "processDefinitionKey": 2251799813685249,
    "version": 1
  },
  {
    "bpmnProcessId": "msg_one_task",
    "resourceName": "bpmn/msg_one_task.bpmn",
    "processDefinitionKey": 2251799813685581,
    "version": 1
  }
]
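If many processes are deployed, picking the right process definition key by eye gets tedious. As a rough sketch (not part of zdb), the list output can be searched with a few lines of Python. `latest_definition_key` is a hypothetical helper name; the field names are the ones visible in the output above, and the sample is an excerpt of that output.

```python
import json

# Excerpt of the `zdb process list` output shown above.
SAMPLE = """
[
  {"bpmnProcessId": "benchmark", "resourceName": "bpmn/one_task.bpmn",
   "processDefinitionKey": 2251799813685363, "version": 1},
  {"bpmnProcessId": "timerProcess", "resourceName": "bpmn/timerProcess.bpmn",
   "processDefinitionKey": 2251799813685249, "version": 1}
]
"""


def latest_definition_key(process_list_json, bpmn_process_id):
    """Return the key of the highest deployed version, or None if not found."""
    processes = json.loads(process_list_json)
    matches = [p for p in processes if p["bpmnProcessId"] == bpmn_process_id]
    if not matches:
        return None
    # Several versions of one process may be deployed; take the newest.
    return max(matches, key=lambda p: p["version"])["processDefinitionKey"]


print(latest_definition_key(SAMPLE, "benchmark"))
```

The returned key can then be used wherever a process definition key is expected.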

With a specific process definition key, we can print the complete process entity. Piping it to jq allows us to filter for the resource, and the --raw-output option returns the resource string without quotes. We can then redirect the output to a file and have the model restored (you can open it with, for example, the Camunda Modeler).

zdb process entity 2251799813686656 --path <path-to-runtime-or-snapshot> \
| jq --raw-output '.resource' > model.bpmn

BPMN model

Restored Model

Instances for a specific model

Sometimes you’re interested in process instances of a specific process model.

You might have deployed a broken model and want to cancel all of the existing instances (that happened to us), but first, you need to find out all the keys of such instances.

You can use the following to print all instances for a certain process definition.

zdb process instances 2251799813685363 --path <path-to-runtime-or-snapshot> | jq

Printing the log

One of our most used zdb features is printing the entire log (default: as JSON).

zdb log print --path <path-to-log>

With the newest version (v1.8.0), zdb supports some built-in filters, like filtering for the process instance key. This means only records that correspond to a certain process instance are printed. Furthermore, we can now limit the output with --fromPosition and --toPosition. You can read more about it here.

JSON is not the only supported output format. zdb can print the log in dot format as well, which allows tracing commands.

zdb log print --format dot --path <path-to-log> > output.dot

With Graphviz you can easily visualize such dot files:

dot -Tsvg -o output.svg output.dot

Trace

Trace of log

Printing and investigating the log is interesting because not all commands can be applied, and those that cannot are not reflected in the state. There can be many reasons: some commands are rejected due to invalid user input, a wrong process instance state, etc. These commands and their rejections are still part of the log (if compaction hasn’t happened yet) and can give you some interesting insights.
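To illustrate how rejections could be pulled out of a printed log, here is a small Python sketch. It rests on assumptions: the records below are hypothetical and made up for illustration, the field names (`recordType`, `intent`, `rejectionType`, `rejectionReason`, `position`) follow Zeebe's record JSON format, and the exact shape of zdb's JSON log output may differ, so check it before relying on this. `rejections` is a hypothetical helper name.

```python
# Hypothetical records for illustration only. Field names follow Zeebe's
# record JSON format; the shape of zdb's log output is an assumption.
RECORDS = [
    {"position": 1, "recordType": "COMMAND", "valueType": "JOB",
     "intent": "COMPLETE", "rejectionType": "NULL_VAL", "rejectionReason": ""},
    {"position": 2, "recordType": "COMMAND_REJECTION", "valueType": "JOB",
     "intent": "COMPLETE", "rejectionType": "NOT_FOUND",
     "rejectionReason": "job not found"},
]


def rejections(records):
    """Return (position, intent, rejectionType, rejectionReason) per rejection."""
    return [
        (r["position"], r["intent"], r["rejectionType"], r["rejectionReason"])
        for r in records
        if r.get("recordType") == "COMMAND_REJECTION"
    ]


for position, intent, rejection_type, reason in rejections(RECORDS):
    print(f"{position}: {intent} rejected ({rejection_type}): {reason}")
```

Listing rejections together with their positions makes it easier to find the matching commands in the log and see why they never showed up in the state.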

I hope this small introduction and examples gave you some inspiration on how you can use zdb on your next potential incident or investigation related to Zeebe. If you want to know more, check out the GitHub repository.
