An introduction to connection-profiler

#postgres #performance #yugabyte

Understanding postgres performance fundamentally relies on the communication between the client and the database side process called 'backend' in postgresql. I created a small utility to profile the postgres database connection network communication: connection-profiler

The first question to answer is why create a connection-profiler tool while there is pgbench. The answer to that is that pgbench is for running benchmarks, which means run SQL against postgres, and not so much for examining individual requests. So it serves a different function than the connection-profiler. It is important to use the right tool for the job!

The connection-profiler takes a connection URL, such as host=192.168.66.80 port=5433 sslmode=disable user=yugabyte password=yugabyte, and connects to the database using it, and optionally can execute a SQL statement.

It allows to specify the connection protocol (simple, extended or prepared). The simple setting uses the postgres simple query protocol, the extended setting uses the postgres extended protocol, and the prepared setting uses the postgres extended protocol and the usage of a prepared statement.

For a single execution of SQL, which connection-profiler by default does, using a prepared statement does not make sense: the purpose of a prepared statement is to have a statement in parsed state so that the parse step (which includes the syntactic and semantic parsing and rewriting) can be skipped after the prepared statement is created (and thus parsed). It additionally enables the postgres planner to decide to use a generic plan (which requires at least 5 executions of the default "custom plan").

So why provide this option? This is because the purpose of the tool is to measure what is happening inside the connection, and using a prepared statement is a distinct pattern.

Optionally, connection-profiler can be configured to execute the specified SQL multiple times inside the same connection, or terminate the connection and connect again to execute the specified SQL. This is done with the -r/--repeat-sql switch, which takes a number, and the -n/--repeat-connect switch, which also takes a number.

One of the things for which this can reveal the impact, is the impact of the recreation of the pg_internal.init file for the first backend to logon to a database after startup, and also the latency of parsing, which in YugabyteDB could require performing RPC calls, for which both the time is not visible in the pg_stat_statements.

If you are intested in the Rust executable itself, and the way it is using the rust-postgres crate (for people not familiar with Rust crates: a crate is essentially a library): Cargo allows you to use a crate in 'patch mode', which means you download the source of the crate, and point Cargo.toml to it, saying you want to use it in patch mode:

[patch.crates-io]
postgres-protocol = { path = "rust-postgres/postgres-protocol"}
tokio-postgres = { path = "rust-postgres/tokio-postgres"}

Now you can make changes to the files in the crate, and Cargo will compile the executable with the changes made to it.

If anyone finds any other parts in the crates which provide more insight into what is happening, please reach out to me, and I'll add it to the repo.

DEV Community

An introduction to connection-profiler

Top comments (0)

Read next

Setup PostgreSQL w/ pgvector in a docker container

Adding new columns - lowCalAlt_update5

Aurora Limitless - Collocation (and porting PgBench)

How to Write Effective Test Scripts for Shopify Apps: A Simple Guide