After attending the Global AWS Heroes Summit in Seattle, I'm traveling through the US to present and demonstrate YugabyteDB at the following locations:
- Palo Alto, July 22, 2024 - Location: AWS Office, 2100 University Ave C
- Austin, July 24, 2024 - Location: AWS Office, 11815 Alterra Pkwy Suite 900
- Boston, July 25, 2024 - Location: SPIN Boston, 30 Melcher St
I would be glad to meet you in those locations, and I can share the slides with you afterward. I will also do a live demonstration of YugabyteDB's elasticity and resilience using a great platform for a cloud-native database: Amazon Elastic Kubernetes Service (Amazon EKS). Here is what I will most likely present during the demonstration.
I use an AWS user with the minimum IAM policies required by eksctl. I'll create an Amazon EKS cluster using the eksctl command-line tool, configure it with specific node types, sizes, and regions, and then update my .kube/config to manage the cluster with kubectl.
eksctl create cluster --name yb-demo1 --tags eks=yb-demo1 \
--version 1.27 \
--node-type c5.2xlarge --nodes 4 --nodes-min 0 --nodes-max 4 \
--region eu-west-1 --zones eu-west-1a,eu-west-1b,eu-west-1c \
--nodegroup-name standard-workers --managed --dumpLogs
aws eks update-kubeconfig --region eu-west-1 --name yb-demo1
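Before deploying anything, I can quickly check that the four worker nodes are ready and spread across the three availability zones (plain kubectl, using the standard topology label):
# worker nodes should be Ready, one zone label per AZ
kubectl get nodes -L topology.kubernetes.io/zone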
I'll set up a single-region, highly available YugabyteDB cluster on Amazon EKS. I'll automate the creation of storage classes and StatefulSets for each availability zone within a specified region, eu-west-1, ensuring the proper configuration of a multi-AZ deployment. This process involves generating the necessary Kubernetes manifests, applying them, and using Helm to deploy the YugabyteDB database.
cd
# one YB-Master per AZ: build the list of master addresses and the placement info
region=eu-west-1
masters=$(for az in {a..c} ; do echo "yb-master-0.yb-masters.yb-demo-${region}${az}.svc.cluster.local:7100" ; done | paste -sd,)
placement=$(for az in {a..c} ; do echo "aws.${region}.${region}${az}" ; done | paste -sd,)
echo
echo "$masters"
echo "$region"
echo "$placement"
# create per-AZ StorageClass and StatefulSet
{
for az in {a..c} ; do
{ cat <<CAT
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard-${region}${az}
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: ${region}${az}
CAT
} > ${region}${az}.storageclass.yaml
echo "kubectl apply -f ${region}${az}.storageclass.yaml"
{ cat <<CAT
isMultiAz: True
AZ: ${region}${az}
masterAddresses: "$masters"
storage:
  ephemeral: true
  master:
    storageClass: "standard-${region}${az}"
  tserver:
    storageClass: "standard-${region}${az}"
replicas:
  master: 1
  tserver: 2
  totalMasters: 3
resource:
  master:
    requests:
      cpu: 0.5
      memory: 1Gi
  tserver:
    requests:
      cpu: 1
      memory: 2Gi
gflags:
  master:
    placement_cloud: "aws"
    placement_region: "${region}"
    placement_zone: "${region}${az}"
    ysql_num_shards_per_tserver: 2
    tserver_unresponsive_timeout_ms: 10000
    hide_dead_node_threshold_mins: 15
  tserver:
    placement_cloud: "aws"
    placement_region: "${region}"
    placement_zone: "${region}${az}"
    ysql_num_shards_per_tserver: 2
    ysql_enable_auth: true
    yb_enable_read_committed_isolation: true
    ysql_suppress_unsafe_alter_notice: true
    ysql_beta_features: true
    allowed_preview_flags_csv: ysql_yb_ash_enable_infra,ysql_yb_enable_ash
    ysql_yb_ash_enable_infra: true
    ysql_yb_enable_ash: true
    transaction_rpc_timeout_ms: 15000
    transaction_max_missed_heartbeat_periods: 40
CAT
} > ${region}${az}.overrides.yaml
echo "kubectl create namespace yb-demo-${region}${az}"
echo "helm install yb-demo yugabytedb/yugabyte --namespace yb-demo-${region}${az} -f ${region}${az}.overrides.yaml --wait \$* ; sleep 5"
done
echo "kubectl exec -it -n yb-demo-${region}a yb-master-0 -- yb-admin --master_addresses $masters modify_placement_info $placement 3"
} | tee yb-demo-install.sh
This has generated the install script. I'll start by adding the YugabyteDB Helm repository and updating it, then proceed to deploy YugabyteDB using the script.
# install yugabyte Helm chart
helm repo add yugabytedb https://charts.yugabyte.com
helm repo update
helm search repo yugabytedb/yugabyte | cut -c-75
export PATH="$HOME/cloud/aws-cli:$PATH"
. yb-demo-install.sh
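Once the script has run, I check that each namespace has its yb-master pod and its two yb-tserver pods before going further. A minimal loop following the same naming convention:
# expect one yb-master and two yb-tserver pods per AZ at this point
for az in {a..c} ; do kubectl get pods -n yb-demo-eu-west-1${az} ; done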
I'll share the internal web console, which is open to the public web, with the attendees. This will allow them to view the deployment and tables, and perform read/write operations on their own. Here's how I generate a QR code for the URL of this UI service:
for i in 0 ; do
 # grab the load balancer hostname and http-ui port of the yb-master-ui service
 url=$(kubectl get svc -n yb-demo-eu-west-1a --field-selector "metadata.name=yb-master-ui" \
  -o jsonpath='http://{.items['$i'].status.loadBalancer.ingress[0].hostname}:{.items[0].spec.ports[?(@.name=="http-ui")].port}')
 echo "yb-master-ui $i"
 qrencode -t ANSIUTF8 -s 1 -m 0 "$url"
 echo "$url"
done
PgBench
I'll use kubectl to start a PostgreSQL pod in eu-west-1a, set the environment variables needed to connect to the load balancer in this availability zone, and run the pgbench tool to create the tables with primary and foreign keys. Then I fill the tables at scale 150, using YugabyteDB non-transactional writes for a faster bulk load.
kubectl -n yb-demo-eu-west-1a run -it eu-west-1a --image=postgres:17beta2 --rm=true -- bash
export PGPASSWORD=yugabyte PGUSER=yugabyte PGDATABASE=yugabyte PGPORT=5433 PGHOST=yb-tservers
# init steps: d=drop, t=create tables, p=primary keys, f=foreign keys (no data yet)
pgbench -iIdtpf
# g=generate data, with non-transactional writes for a faster bulk load
PGOPTIONS="-c yb_disable_transactional_writes=on -c yb_default_copy_from_rows_per_transaction=0" pgbench -iIg --scale 150
psql -c "select tablename, pg_size_pretty(pg_table_size(tablename::regclass)) from pg_tables where schemaname='public'"
I'll run the Simple Update benchmark (pgbench -N, which skips the updates on the tellers and branches tables) from 20 clients.
pgbench --client 20 -nN --progress 5 --max-tries 10 -T 3600
Elasticity
To demonstrate how we can scale without downtime, I'll double the number of tablet servers in each availability zone by simply scaling the StatefulSets:
kubectl scale -n yb-demo-eu-west-1a statefulset yb-tserver --replicas=4
kubectl scale -n yb-demo-eu-west-1b statefulset yb-tserver --replicas=4
kubectl scale -n yb-demo-eu-west-1c statefulset yb-tserver --replicas=4
The PgBench application will continue to run, and the throughput will increase. However, because PgBench uses a static set of connections established at startup, it will not fully benefit from the new nodes: only the reads and writes are rebalanced. Restarting PgBench also balances the connections over the new nodes, and the throughput doubles. Of course, with a connection pool, you wouldn't need to restart it.
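To watch the rebalancing while this happens, yb-admin can list the tablet servers with their tablet counts. Here is a sketch, assuming the $masters variable from the install script is still set in the shell:
# list all tablet servers with their tablet counts to observe the rebalancing
kubectl exec -it -n yb-demo-eu-west-1a yb-master-0 -- yb-admin --master_addresses $masters list_all_tablet_servers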
Resilience
To demonstrate resilience to failure, one node can be removed by scaling down one StatefulSet. I will even scale an AZ down to zero to simulate a zone failure:
# remove one node from eu-west-1c
kubectl scale -n yb-demo-eu-west-1c statefulset yb-tserver --replicas=3
# simulate a full zone failure
kubectl scale -n yb-demo-eu-west-1c statefulset yb-tserver --replicas=0
# bring the zone back
kubectl scale -n yb-demo-eu-west-1c statefulset yb-tserver --replicas=4
The PgBench application should continue without any errors. The only consequence of the failure is a few seconds of decreased throughput due to various TCP/IP timeouts for connections to the failed nodes.
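To follow the simulated zone failure from the Kubernetes side, watching the pods of the scaled-down namespace is enough:
# watch the yb-tserver pods of the failed zone terminate and come back
kubectl get pods -n yb-demo-eu-west-1c -w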
Cleanup
You can experiment with this lab as much as you want. Don't forget to terminate the EKS cluster:
aws cloudformation delete-stack --stack-name eksctl-yb-demo1
aws cloudformation delete-stack --stack-name eksctl-yb-demo1-cluster
aws cloudformation delete-stack --stack-name eksctl-yb-demo1-nodegroup-standard-workers
eksctl delete cluster --region=eu-west-1 --name=yb-demo1
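To confirm that the cluster is gone, this should no longer list yb-demo1:
# verify that no EKS cluster remains in the region
eksctl get cluster --region eu-west-1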
At those locations, I won't be presenting alone. Colleagues from YugabyteDB, AWS partners, and speakers from Snowflake and Striim will also present. It will be an excellent opportunity to network, learn, and share.
Some parameter explanations
I default to two tablets per node because I intend to double the number of nodes. This is set by ysql_num_shards_per_tserver=2.
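The resulting tablet count per table can be checked from YSQL with the YugabyteDB-specific yb_table_properties() function, for example after the PgBench initialization:
# show the number of tablets created for the pgbench_accounts table
psql -c "select num_tablets from yb_table_properties('pgbench_accounts'::regclass)"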
PgBench creates the tables (t) and adds the primary keys (p) with an ALTER TABLE. This raises a notice in YugabyteDB (I'm running version 2024.1) because it is not an online operation, so I suppress the notice for clarity (ysql_suppress_unsafe_alter_notice=true). The PgBench initialization still shows the following when creating the tables:
WARNING: storage parameter fillfactor is unsupported, ignoring
This message is misleading: it doesn't mean the feature is broken, it just means that fillfactor doesn't apply to the LSM trees that YugabyteDB uses instead of PostgreSQL heap tables.
In the version I'm running, ANALYZE is still in beta, so I set ysql_beta_features=true to avoid being notified about it.
I enabled Active Session History, which is still in preview, with the following flags:
--allowed_preview_flags_csv=ysql_yb_ash_enable_infra,ysql_yb_enable_ash
--ysql_yb_ash_enable_infra=true
--ysql_yb_enable_ash=true
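With those flags set, the ASH samples are queryable from YSQL. For example, a quick look at the top wait events during the benchmark (assuming the yb_active_session_history view of this version):
# top wait events sampled by Active Session History
psql -c "select wait_event_component, wait_event, count(*) from yb_active_session_history group by 1,2 order by 3 desc limit 10"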
I enable Read Committed isolation, which is not the default in this version and is the isolation level expected by PgBench: yb_enable_read_committed_isolation=true. This mode also reduces the probability of errors (on node failure or clock skew) because it allows transparent retries of statements within a transaction.
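This can be verified from a session with the read-only yb_effective_transaction_isolation_level parameter, which should report read committed when the flag is set:
# show the isolation level actually applied by YugabyteDB
psql -c "show yb_effective_transaction_isolation_level"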
By default, the transaction timeout is 5 seconds, defined by transaction_rpc_timeout_ms=5000 for the remote call and by transaction_max_missed_heartbeat_periods=10 for the transaction coordinator (heartbeats are sent every transaction_heartbeat_usec=500000, so 10 missed periods of 500 ms also add up to 5 seconds). I bumped this to transaction_rpc_timeout_ms=15000 (15 seconds) and transaction_max_missed_heartbeat_periods=40 (40 × 500 ms = 20 seconds) to avoid the following error for long transactions waiting on the 15-second TCP timeout:
pgbench: error: client 6 script 0 aborted in command 8 query 0:
ERROR: UpdateTransaction RPC (request call id 2745679) to 192.168.87.205:9100 timed out after 4.972s
(If you were at the demo in Palo Alto or Austin, I hadn't set those parameters and got this error; I've added them for the Boston demo.)
It can still happen that, during a crash or on an unstable network, a write operation completes in the tablet server's DocDB layer, but the response to the YSQL layer is lost. The YSQL layer then transparently retries it, with a safeguard to avoid duplicate operations. In this case, the following error can be raised:
pgbench: error: client 10 script 0 aborted in command 7 query 0:
ERROR: Duplicate request 126674 from client 08b09407-d78e-4c10-982f-be446d9a4e87 (min running 126674)
A non-accessible tablet server is marked as "DEAD" by the YB-Master after one minute. To visualize this quickly during the demo, I reduce this time to 10 seconds with the tserver_unresponsive_timeout_ms=10000 flag. Additionally, I set hide_dead_node_threshold_mins=15 so that the tablet server disappears from the list after 15 minutes instead of the default 24 hours. This helps illustrate that if the node comes back after 15 minutes, it is considered a new node because it has passed the 15-minute threshold defined by follower_unavailable_considered_failed_sec.