Say this five times fast: strace, ptrace, dtrace, dtruss

#linux #strace #dtrace #macos

This blog post originally started off as another one of my Unix command deep dives (remember those?), where I dig into the internals of a common Linux command. I was about to run strace to see which system calls the command I was exploring invoked when I remembered that I was on macOS. macOS doesn't have strace. Instead, it has a tool called dtruss for tracking the syscalls a program invokes as it runs.

Now, until today I'd been relatively ignorant about the distinction between strace and dtruss. All I knew was that dtruss did what I needed it to do, and I never bothered looking into the details of how it worked or what it was.

But today is the day to shed the cloak of ignorance, friends!

What is strace?

strace is a system call tracer and also one of the few things in tech with a name that reasonably matches what it does. You might be familiar with strace from Julia Evans' strace zine. I think Julia's zine is a great way to learn about strace, but here's my two-point summary of what strace is.

1. System calls are an interface that allows a program to request some functionality from the operating system. These system calls do things like changing the current working directory, changing the permissions on files, and so on. You can view a full list of system calls here.
2. strace lists out the system calls that a program invokes as it executes.
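To make the first point a bit more concrete, here's a small Python sketch of my own (the function name and the file name are made up for illustration). On a Unix-like system, each os call below is a thin wrapper around the system call named in its comment, though the exact syscall the interpreter ends up using can vary by platform.

```python
import os
import stat
import tempfile


def touch_some_syscalls():
    """Change directory, create a file, and change its permissions."""
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as workdir:
        try:
            os.chdir(workdir)                       # chdir
            fd = os.open("notes.txt",
                         os.O_CREAT | os.O_WRONLY,
                         0o644)                     # open (or openat)
            os.write(fd, b"hello\n")                # write
            os.close(fd)                            # close
            os.chmod("notes.txt", 0o600)            # chmod (or fchmodat)
            mode = stat.S_IMODE(os.stat("notes.txt").st_mode)
        finally:
            os.chdir(start)                         # leave before cleanup
    return mode


print(oct(touch_some_syscalls()))
```

If you run a script like this under strace on Linux, you should be able to spot these calls among the many others that the Python interpreter makes on its own.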

One thing that the zine doesn't go into is how strace works under the hood, so I'll dive into that here. strace leverages ptrace, which stands for process trace, a system call that allows one process (the tracer, usually the parent) to watch and control the execution of another process (the traced, usually its child). It's used in strace, but it also enables things like the gdb debugger.

The ptrace system call uses some internal Linux data structures to establish a relationship between the tracer and the traced process. Whenever the traced process invokes a system call, it is temporarily stopped and the tracer is notified. At this point, whatever program is using ptrace, whether it is strace or gdb, processes the information about the system call and then returns control to the traced process.

This jumping back and forth between the traced process, the kernel, and the tracer highlights one of the downsides of strace. Because the operating system has to switch contexts between the processes on every system call, strace is not that fast.

In summary, ptrace acts as a mediator between the running process and a higher-level tool such as gdb or strace.
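Here's a rough, Linux-only sketch of that stop, notify, resume loop. It is not how strace itself is written. It calls ptrace through ctypes, and the function name count_syscall_stops is my own. The request numbers are the values from the Linux headers as far as I know. To keep it portable across CPU architectures it only counts the stops instead of decoding which syscall was made, and it returns None anywhere ptrace isn't available.

```python
import ctypes
import os
import signal
import sys

# Request numbers from <sys/ptrace.h> on Linux.
PTRACE_TRACEME = 0
PTRACE_SYSCALL = 24
PTRACE_SETOPTIONS = 0x4200
PTRACE_O_TRACESYSGOOD = 1


def count_syscall_stops(work):
    """Run work() in a traced child and count its syscall stops.

    Returns None if Linux ptrace is not usable here.
    """
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.ptrace.restype = ctypes.c_long
    libc.ptrace.argtypes = [ctypes.c_long, ctypes.c_long,
                            ctypes.c_void_p, ctypes.c_void_p]

    pid = os.fork()
    if pid == 0:
        # Traced process: ask to be traced by the parent, then pause
        # until the parent is ready to watch.
        if libc.ptrace(PTRACE_TRACEME, 0, None, None) != 0:
            os._exit(99)
        os.kill(os.getpid(), signal.SIGSTOP)
        try:
            work()
        finally:
            os._exit(0)

    # Tracer: wait for the child's initial stop.
    _, status = os.waitpid(pid, 0)
    if not os.WIFSTOPPED(status):
        return None  # the child could not enable tracing
    # Report syscall stops as SIGTRAP | 0x80 so they are easy to recognise.
    libc.ptrace(PTRACE_SETOPTIONS, pid, None, PTRACE_O_TRACESYSGOOD)

    stops = 0
    while True:
        # Resume the child until its next syscall entry or exit.
        if libc.ptrace(PTRACE_SYSCALL, pid, None, None) != 0:
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)
            return None
        _, status = os.waitpid(pid, 0)  # the kernel notifies the tracer
        if os.WIFEXITED(status) or os.WIFSIGNALED(status):
            return stops
        if (os.WIFSTOPPED(status)
                and os.WSTOPSIG(status) == (signal.SIGTRAP | 0x80)):
            stops += 1


stops = count_syscall_stops(lambda: [os.getpid() for _ in range(3)])
print("syscall stops observed:", stops)
```

Each system call normally produces two stops, one on entry and one on exit, and every one of them means a round trip between the traced process, the kernel, and the tracer. That round trip is where the slowdown comes from.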

What is dtrace?

Now, this is where I had to do a little bit of research. The first definition I found of dtrace was on Brendan Gregg's website, which defines dtrace (or I guess I can call it DTrace) as "an implementation of dynamic tracing." What is dynamic tracing? I had to do quite a bit of digging to find a resource that explained this well. In the end, I came across this article, which helped me grok what was going on.

Whereas strace relies on ptrace to introspect processes, dtrace goes about things a little bit differently. With dtrace, the programmer writes probes in D, a language with a C-like syntax. Each probe defines what dtrace should do when a particular event occurs, such as the traced program invoking a system call or exiting a function. These probes are stored in a script file that looks something like this.

/* Fires each time the read system call is entered. */
syscall::read:entry {
    printf("read has been called.\n");
}

This script states that whenever the read system call is invoked, the tracer should print out the string "read has been called." The script file is then invoked with dtrace like so.

$ dtrace -s my_probe.d

dtrace then runs the logic within the probe whenever it runs into the event outlined in that probe (entering a certain system call, exiting a function, and so on). This flexibility earns DTrace its title as a dynamic tracer.
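If you don't have dtrace handy, here's a loose analogy in Python that helped me picture the idea. This is not DTrace and shares no code with it. It uses the standard library's sys.setprofile hook to attach "probes" to function entry and return at run time, without touching the program being watched. The names run_with_probes, read, and program are all made up for this sketch.

```python
import sys


def run_with_probes(probes, target):
    """Call target() with probes attached.

    probes maps (function name, "entry" or "return") to an action.
    """
    def dispatch(frame, event, arg):
        # The interpreter reports "call" and "return" for Python functions.
        if event == "call":
            key = (frame.f_code.co_name, "entry")
        elif event == "return":
            key = (frame.f_code.co_name, "return")
        else:
            return
        action = probes.get(key)
        if action is not None:
            action()

    previous = sys.getprofile()
    sys.setprofile(dispatch)      # attach the probes at run time
    try:
        return target()
    finally:
        sys.setprofile(previous)  # detach them again


def read(path):
    return "contents of " + path


def program():
    read("a.txt")
    read("b.txt")


fired = []
probes = {("read", "entry"): lambda: fired.append("read has been called.")}
run_with_probes(probes, program)
print(fired)
```

The probe fires once for each of the two calls to read, and program itself never had to change. That's the part that maps onto the D script above: you describe an event and an action, and the tracer takes care of running the action when the event happens.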

What is dtruss?

The next thing I set out to uncover was what dtruss was. The first definition I ran into was from the dtruss manpage, which defines dtruss as "a DTrace version of truss." Well, I guess I'd better figure out what truss is first, then. As it turns out, truss is a command found on Unix systems such as Solaris and FreeBSD that prints out the system calls made by a program. It's essentially the counterpart of the strace tool that exists on Linux. Knowing this, I think the best way to describe dtruss is with an analogy: strace is to ptrace as dtruss is to DTrace. Each one prints a program's system calls and is built on top of a lower-level tracing mechanism.

What other tracing tools exist?

Now, as it turns out, strace and dtrace aren't the only tools in our toolkit of tracers. My investigation eventually led me to explore the wider world of tracers. Julia comes to the rescue once again here, and Brendan Gregg has another blog post with a list of different Linux tracers, how they work, and when you can use them. Brendan seems like quite the authority in this space, having published several books on tracing and written many nice blog posts. If you're interested in diving deeper into this, I would recommend checking out some of his blog posts.

Conclusion

Well, wasn't that a fun slide down the iceberg? It's always pretty fun when you start by posing a simple question (what is the difference between strace and DTrace?) and end up discovering something much bigger (a whole new world of tracers).

What tracer do you use on a regular basis? Is there a particular tracing tool that you prefer over others?