perf on Linux is a great way to understand your CPU consumption. On modern kernels it is mostly harmless, especially when run for a short time, but try it in a test environment first if you can.
Here is an example where I gather 99 samples per second, for 10 seconds, from the PostgreSQL and YugabyteDB processes and threads:
```shell
ssh firstname.lastname@example.org 'sudo bash -c "
 perf record -o - --call-graph fp -F99 -e cpu-cycles \
  -p $(pgrep -d, postgres\|yb-tserver\|yb-master) sleep 10 |
 perf script -F +pid
"' > $( date +%Y%m%d%H%M%S | tee /dev/stderr ).perf
```
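A side note on that one-liner: the output filename comes from a small shell trick. The timestamp from `date` goes through `tee /dev/stderr`, so the same value is shown on the terminal and captured by the command substitution that names the file. A minimal local sketch (no ssh, no perf needed):

```shell
# The timestamp is printed to stderr (visible on the terminal) and, at the
# same time, captured by the command substitution to build the filename.
outfile="$( date +%Y%m%d%H%M%S | tee /dev/stderr ).perf"
echo "$outfile"
```

This is convenient when you run several captures in a row: each file gets a unique, sortable name, and you see it immediately.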
This generates a file that the Firefox Profiler can read.
Go to https://profiler.firefox.com/ and load the generated file.
In the upper right, you can upload the profile to share it. I did, which means you can see mine here: https://share.firefox.dev/4576kKY
I gathered the perf samples for 10 seconds while running this on YugabyteDB:
```sql
drop table if exists demo;
create table demo ( id bigint );
insert into demo select generate_series(1,300000);
\watch
```
Maybe you already see what's wrong here: I didn't define a primary key. But how does that impact my CPU usage?
In YugabyteDB, the `postgres` process runs the query layer, with additional threads (like `pggate`, to talk to the storage layer):
This shows the call tree, which I can navigate as with `perf top` or `perf report`, but with a graphical interface.
My query was not active during the whole 10 seconds of sampling: it ran for a few seconds, and then `\watch` was waiting.
Here, for example, I see that I spend some time reading from a virtual filesystem. I focus on this part:
The stack gives a lot of information:

```
_extract_crng
urandom_read_nowarn.isra.34
urandom_read
__vfs_read
vfs_read
ksys_read
__arm64_sys_read
el0_svc_common
el0_svc_handler
el0_svc
__read
boost::uuids::detail::random_provider_base::get_random_bytes
yb::GenerateObjectId
yb::pggate::PgApiImpl::TupleIdBuilder::Build
YBCPgBuildYBTupleId
YBCGetYBTupleIdFromTuple
```
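If you prefer the command line to the graphical view, the same `perf script -F +pid` text can be aggregated with standard tools. A sketch on hand-written sample lines (illustrative and simplified, not from the real capture — each sample is a header line followed by its stack, innermost frame first):

```shell
# Hand-written, simplified "perf script"-style output (illustrative only):
# one header line per sample, then the frames, innermost (on-CPU) first.
cat > sample.perf.txt <<'EOF'
postgres 12345/12345 60958.001: 10101010 cpu-cycles:
        ffffab1200 _extract_crng+0x33 ([kernel.kallsyms])
        ffffab1300 urandom_read+0x1f ([kernel.kallsyms])
        0000aaaa00 __read+0x10 (/usr/lib64/libc.so.6)

postgres 12345/12345 60958.002: 10101010 cpu-cycles:
        0000bbbb00 memcpy+0x40 (/usr/lib64/libc.so.6)
        0000aaaa00 __read+0x10 (/usr/lib64/libc.so.6)
EOF
# Count samples per innermost frame: the first indented line after each
# header is the function that was actually on CPU when the sample fired.
awk '/^[^ \t]/ { want=1; next }
     want && NF { sub(/\+.*/, "", $2); print $2; want=0 }' sample.perf.txt |
  sort | uniq -c | sort -rn
```

This gives a quick flat profile by on-CPU function; the Firefox Profiler does the same aggregation for you, plus the inverted and call-tree views.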
Of course, I can guess from the function names, but YugabyteDB is open source and I can search for them. What happens here is that I didn't declare a primary key for my table, so an internal one (`ybctid`) is generated, because secondary indexes need a key to address the table row. This ID generation reads from `/dev/urandom`. I made this simple example to show that low-level traces can give a clue about high-level data model problems.
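To make the cost pattern concrete, here is a rough shell analogy (an assumption for illustration, not YugabyteDB's actual code): generating one random identifier per inserted row means one read of random bytes from `/dev/urandom` per row.

```shell
# Rough analogy only: each row inserted without a user-defined primary key
# needs a generated id, which translates into a read of random bytes per row.
# (This is not YugabyteDB code; it just mimics the /dev/urandom reads.)
for row in 1 2 3; do
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
done
```

With 300,000 rows per insert statement, those reads add up, which is exactly what the kernel `urandom_read` frames in the stack show. Declaring a primary key avoids this work entirely.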