With YugabyteDB you can scale out your PostgreSQL application, which can connect to any node for consistent reads and writes. Here is an example using the basic load balancing provided by the PostgreSQL client library. I'll run PgBench with connections to multiple nodes and without an additional proxy or driver.
For this example, I've installed the latest client, in beta, as in the previous post, and I will connect to a 3 nodes YugabyteDB cluster that I've started, for this lab, in a single VM:
psql (16beta1, server 11.2-YB-2.17.3.0-b0)
Type "help" for help.
yugabyte=# select public_ip, port, region, node_type from yb_servers();
public_ip | port | region | node_type
------------+------+--------+-----------
10.0.0.142 | 5433 | north | primary
10.0.0.143 | 5433 | south | primary
10.0.0.141 | 5433 | west | primary
(3 rows)
yugabyte=#
I set the PostgreSQL environment variables to connect to my YugabyteDB cluster:
export PGHOST=10.0.0.141,10.0.0.142,10.0.0.143
export PGPORT=5433
export PGUSER=yugabyte
export PGDATABASE=yugabyte
The list in PGHOST is used for High Availability. The hosts are take in order so that if the first one is down, it can connect to the next one. In PostgreSQL, this is used to connect to read replicas. In YugabyteDB, as you can connect to any node for consistent reads and writes. Note that I could have used a single hostname with multiple IP addresses in the DNS.
PgBench
I create the pgbench
tables:
$ pgbench -iIdtp
dropping old tables...
creating tables...
WARNING: storage parameter fillfactor is unsupported, ignoring
WARNING: storage parameter fillfactor is unsupported, ignoring
WARNING: storage parameter fillfactor is unsupported, ignoring
creating primary keys...
done in 3.01 s (drop tables 0.77 s, create tables 0.70 s, primary keys 1.54 s).
$ pgbench -iIGf
generating data (server-side)...
creating foreign keys...
done in 4.17 s (server-side generate 2.24 s, foreign keys 1.94 s).
The warning can be ignored (YugabyteDB has no need for FILLFACTOR as it is not using Heap Tables). For the same reason (tables being stored in their primary key) I create the empty tables and the primary key immediately (tp
initialization steps) and then ingest data and create the foreign keys (Gf
steps).
host load balancing disabled
I run pgbench
workload from 12 clients (-c 12
) with the settings above (a list in PGHOST
but PGLOADBALANCEHOST
is not set and defaults to disable
):
$ pgbench -T60 -c 12 -nN
transaction type: <builtin: simple update>
scaling factor: 1
query mode: simple
number of clients: 12
number of threads: 1
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 19596
number of failed transactions: 0 (0.000%)
latency average = 36.682 ms
initial connection time = 134.319 ms
tps = 327.133784 (without initial connection time)
I can see the all 12 connections connected to 10.0.0.141, the first one in PGHOST:
host load balancing random
I run the same but setting PGLOADBALANCEHOSTS
to random
:
$ export PGLOADBALANCEHOSTS=random
$ pgbench -T30 -c 12 -nN
pgbench (16beta1, server 11.2-YB-2.17.3.0-b0)
transaction type: <builtin: simple update>
scaling factor: 1
query mode: simple
number of clients: 12
number of threads: 1
maximum number of tries: 1
duration: 30 s
number of transactions actually processed: 9619
number of failed transactions: 0 (0.000%)
latency average = 37.310 ms
initial connection time = 130.484 ms
tps = 321.626355 (without initial connection time)
Now, the connections are randomly distributed:
This is a very simple way to balance the connections to a YugabyteDB cluster. Note that, even if all were connected to one node, the data is sharded and balanced with remote read and write operations to the DocDB layer of the tablet servers. However, the SQL processing itself happens in YSQL, the PostgreSQL backend on the node you are connected to. Fur the best scalability, connections should also be balanced.
Host discovery and Smart Driver
Because yb_servers()
is updated when nodes are added or removed in the YugabyteDB cluster, you can set PHGOST from it. You can even restrict to a specific cloud, region or zone:
export PGHOST=$(psql -tAc "
select public_ip from yb_servers()
where region in ('north','west')
" | paste -sd,)
The same can be done in the application where the host information of the connection pool could be refreshed regularly. This is actually what is done automatically by the YugabyteDB Smart Drivers and is the preferred method for client-side load-balancing.
The PGLOADBALANCEHOSTS=random
solution presented here is convenient when there is no smart driver, as it works with any LibPQ client, like PgBench here.
Top comments (0)