DEV Community

Cover image for Running QuestDB on GKE Autopilot
Yitaek Hwang
Yitaek Hwang

Posted on • Originally published at questdb.io

Running QuestDB on GKE Autopilot

Extending the QuestDB Helm chart with monitoring and automated backups for a production-ready setup.

Recently, I’ve been experimenting with QuestDB as the primary time-series database to stream and analyze IoT/financial data:

While I was able to validate the power of QuestDB in storing massive amounts of data and querying them quickly in those two projects, I was mostly running them on my laptop via Docker. In order to scale my experiments, I wanted to create a more production-ready setup, including monitoring and disaster recovery on Kubernetes. So in this guide, we’ll walk through setting up QuestDB on GKE with Prometheus and Velero.

Prerequisites

Setting up GKE Autopilot

As a DevOps engineer/SRE, I’m a huge fan of GKE since it provides a lot of features out of the box such as cluster autoscaling, network policy plugins, and managed Istio compared to other managed Kubernetes options available. Recently Google Cloud announced GKE Autopilot, a new mode that further automates Kubernetes operations, including node management, security/hardening, and resource optimization. It brings together the serverless experience of Cloud Run with the flexibility and features of GKE. In practice, this means that you are now charged for pod usage rather than paying for the compute and storage of the underlying Kubernetes nodes, making it a great choice for projects with unknown resource utilization.

Creating a new GKE Autopilot cluster is also extremely simple. There’s no need to set up Terraform or VPCs/autoscalers/node groups. Simply create a Google Cloud account, navigate to Google Kubernetes Engine, enable the Kubernetes Engine API, and click on “Create Cluster”:

Click on the “Configure” button under “Autopilot mode”, give the cluster a name, and wait for a few minutes for the cluster to be ready. I went with the default name autopilot-cluster-1 in us-east1 region (feel free to skip the other sections for now).

To configure your Kubernetes context to interact with the cluster via Helm, fetch the credentials using the following gcloud command:

$ gcloud container clusters get-credentials autopilot-cluster-1 --region us-east1
Enter fullscreen mode Exit fullscreen mode

Deploying QuestDB

QuestDB provides an official Helm chart that deploys a single StatefulSet pod with 50Gi PVC by default. To install the chart, add the QuestDB Helm repo and deploy it by running the following:

$ helm repo add questdb [https://helm.questdb.io/](https://helm.questdb.io/)
$ helm upgrade --install questdb questdb/questdb
Enter fullscreen mode Exit fullscreen mode

Since GKE Autopilot does not pre-provision node capacity, the pod will be initially marked as unschedulable. Give it a few seconds until cluster autoscaler triggers a scaling up event and schedules the questdb pod.

For a quick smoke test, port-forward the HTTP endpoint and interact with the web console UI:

$ kubectl port-forward questdb-0 9000:9000
Enter fullscreen mode Exit fullscreen mode

Monitoring Postgres Metrics

At this point, we have a healthy, running instance of QuestDB. However, without inspecting the application logs, there is no easy way to determine the overall system health and grab metrics from QuestDB. For performance reasons, QuestDB team decided to decouple the REST endpoint from its min HTTP server that holds a simple health status on port 9003. This port is currently not exposed via the Helm chart, so the StatefulSet and Service sections need to be changed to allow probes to periodically check this endpoint.

Unfortunately, QuestDB currently does not provide a comprehensive metrics endpoint that natively integrates with Prometheus or Postgres. Open Github issues for this support is listed below:

Still, I wanted to deploy Prometheus Postgres Exporter to at least track QuestDB up/down status and set up the framework to monitor Postgres metrics once these features became available.

Expose Postgres Port

First, we need to modify the default QuestDB chart to expose port 8812 (Postgres endpoint). Create a new YAML file called questdb-values.yaml :

service:
  expose:
    postgresql:
      enabled: true
      port: 8812
Enter fullscreen mode Exit fullscreen mode

Update the existing deployment:

$ helm upgrade --install questdb questdb/questdb -f questdb-values.yaml
Enter fullscreen mode Exit fullscreen mode

Install Prometheus Postgres Exporter

Prometheus Helm charts are managed by the prometheus-community :

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
Enter fullscreen mode Exit fullscreen mode

Next, we need to modify the default values to scrape QuestDB and also disable default metrics since QuestDB doesn’t currently store pg_stat_database or pg_stat_activity . We also want Prometheus to scrape our metrics, so we need to add annotations specify our scrape port.

Create another YAML file called exporter-values.yaml :

config:
  datasource:
    host: questdb-0.questdb-headless
    user: admin
    password: quest
    port: '8812'
    database: qdb
    sslmode: disable
  disableDefaultMetrics: true

annotations: 
  prometheus.io/scrape: 'true'
  prometheus.io/port: '9187'
Enter fullscreen mode Exit fullscreen mode

Now we can install Postgres Exporter:

$ helm upgrade -i postgres-exporter prometheus-community/prometheus-postgres-exporter -f exporter-values.yaml
Enter fullscreen mode Exit fullscreen mode

Finally, we can install Prometheus to collect these metrics. Create a new YAML file to hold our Prometheus configurations, prometheus-values.yaml :

alertmanager:
  enabled: false
nodeExporter:
  enabled: false
pushgateway:
  enabled: false
Enter fullscreen mode Exit fullscreen mode

Since we don’t have real metrics from QuestDB yet, we will only deploy the Prometheus server and accept the default configurations.

$ helm upgrade -i prometheus prometheus-community/prometheus -f prometheus-values.yaml
Enter fullscreen mode Exit fullscreen mode

Access the dashboard to run our queries by port-forwarding:

$ export POD_NAME=$(kubectl get pods --namespace default -l \ "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace default port-forward $POD_NAME 9090
Enter fullscreen mode Exit fullscreen mode

Open up localhost:9090 and check that we’re able to scrape pg_up :

(Note: If you are looking for a production-ready monitoring setup, please refer to the four-part series “Practical Monitoring with Prometheus and Grafana”.)

Automated Backups with Velero

Velero is an open-source tool to back up and restore Kubernetes resources and persistent volumes. This is useful for disaster recovery (taking snapshots) or for data migration. Velero runs inside the Kubernetes cluster and integrates with various storage providers (e.g. AWS S3, GCP Storage, Minio) as well as restic to take snapshots either on-demand or on a schedule.

Installation

Velero can be installed via Helm or via the CLI tool. In general, it seems like the CLI gets the latest updates, and the Helm chart lags behind slightly with compatible Docker images.

# for MacOS
$ brew install velero

# for Windows
$ choco install velero
Enter fullscreen mode Exit fullscreen mode

Configuring Server Components

To set up Velero on GCP, we need to:

Create an GCS Bucket

Give a unique bucket name and use the gsutil tool to create the bucket (replace with the name of your bucket):

PROJECT_ID=$(gcloud config get-value project)
BUCKET=<YOUR_BUCKET>

gsutil mb gs://$BUCKET/
Enter fullscreen mode Exit fullscreen mode

Set GCP Permissions

Create a service account:

gcloud iam service-accounts create velero \
 — display-name “Velero service account”
Enter fullscreen mode Exit fullscreen mode

Attach the compute permissions to the service account:

SERVICE_ACCOUNT_EMAIL=$(gcloud iam service-accounts list \
  --filter="displayName:Velero service account" \
  --format 'value(email)')

ROLE_PERMISSIONS=(
    compute.disks.get
    compute.disks.create
    compute.disks.createSnapshot
    compute.snapshots.get
    compute.snapshots.create
    compute.snapshots.useReadOnly
    compute.snapshots.delete
    compute.zones.get
)

gcloud iam roles create velero.server \
    --project $PROJECT_ID \
    --title "Velero Server" \
    --permissions "$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
    --role projects/$PROJECT_ID/roles/velero.server

gsutil iam ch serviceAccount:$SERVICE_ACCOUNT_EMAIL:objectAdmin gs://${BUCKET}
Enter fullscreen mode Exit fullscreen mode

Download the service account key and save it as credential-velero :

gcloud iam service-accounts keys create credentials-velero \
    --iam-account $SERVICE_ACCOUNT_EMAIL
Enter fullscreen mode Exit fullscreen mode

Now we can install Velero with the GCS plugin enabled:

velero install \
    --provider gcp \
    --plugins velero/velero-plugin-for-gcp:v1.1.0 \
    --bucket $BUCKET \
    --secret-file ./credentials-velero
Enter fullscreen mode Exit fullscreen mode

Finally, we can create a schedule using cron string:

velero schedule create questdb --schedule "0 7 * * *" -l "app.kubernetes.io/instance=questdb" --include-namespaces default
Enter fullscreen mode Exit fullscreen mode

We can verify backups being made in our bucket:

(Note: For a deep-dive on Velero, refer to “Disaster Recovery on Kubernetes.”)

Next Steps

To optimize for performance, QuestDB opted to build a time-series database from scratch with ANSI SQL compatibility, as opposed to building on top of Postgres like TimescaleDB. Also, as a newer product, QuestDB is missing some enterprise features (e.g. replication, high-availability, recovery from snapshot) and integrations to other popular projects. Still, with Prometheus Postgres exporter and Velero, we can configure a production-ready QuestDB on Kubernetes. I look forward to enhancing this setup in future releases when monitoring and ops features will be supported.

Top comments (0)