DEV Community

richamishra006
richamishra006

Posted on • Updated on

Postgres data migration from VM server to Kubernetes

PostgreSQL is a powerful, open-source object-relational database system that safely stores and scales data workloads. It uses SQL and is widely used due to its reliability, data integrity, robustness, and many more features. While the world is moving towards Kubernetes to run their entire application ecosystem, migrating the stateful side of the ecosystem to Kubernetes is still not as straightforward. One has to be very sure of how without any data loss and downtime, we can migrate the databases to Kubernetes.

Being one of the widely used and popular relational databases, PostgreSQL DB can run on various platforms such as macOS, Windows, Linux, etc. We can set up PostgreSQL on virtual machines, on cloud or physical machines or use containerized databases .

In this blog post, we will be discussing a scenario, where we will learn how to migrate PostgreSQL data from a virtual machine to Kubernetes setup . While there are various ways to install postgres on Kubernetes, the best approach is to always look for an operator which can take away most of the operational and administration burden. One such operator is Kubergres, which we are going to use due to its simplicity and features. To know more about Kubegres, check out their official documentation

What is Kubegres? [Postgres operator for k8s]

Kubegres is a Kubernetes operator which helps us to deploy clusters of PostgreSQL instances with data replication and failover enabled out of the box. It brings simplicity when using PostgreSQL with Kubernetes.

Features:

It provides data replication, and replicates data from primary PostgreSQL instance to replica instances in real time
It manages failover by promoting the replica instance as primary PostgreSQL in case of failure
It has option to schedule backup of database and dump the data in a separate persistent volume

Having learnt about the features of Kubgres, let's proceed with the installation.

Prerequisites

One needs to have a Postgres database installed and running on a virtual machine. Also, access to a Kubernetes cluster will be required where we'll be migrating the database to.

Installation of Kubegres

Kubegres is an open source tool. We can deploy it on Kubernetes by using the Kubegres operator

Install Kubegres operator

Run the following command in a Kubernetes cluster:

kubectl apply -f https://raw.githubusercontent.com/reactive-tech/kubegres/v1.15/kubegres.yaml
Enter fullscreen mode Exit fullscreen mode

After executing the above command, a kubegres-system namespace will be created, where the controller will be installed. we will see a deployment running in the kubegres-system namespace.

We will see the below output

namespace/kubegres-system created
customresourcedefinition.apiextensions.k8s.io/kubegres.kubegres.reactive-tech.io created
serviceaccount/kubegres-controller-manager created
role.rbac.authorization.k8s.io/kubegres-leader-election-role created
clusterrole.rbac.authorization.k8s.io/kubegres-manager-role created
clusterrole.rbac.authorization.k8s.io/kubegres-metrics-reader created
clusterrole.rbac.authorization.k8s.io/kubegres-proxy-role created
rolebinding.rbac.authorization.k8s.io/kubegres-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/kubegres-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/kubegres-proxy-rolebinding created
configmap/kubegres-manager-config created
service/kubegres-controller-manager-metrics-service created
deployment.apps/kubegres-controller-manager created

Enter fullscreen mode Exit fullscreen mode

Create a Secret resource

Before creating a cluster for PostgreSQL, we will create a secret with the password of Postgres user, replication user and production database user. We will be using this later in the PostgreSQL configuration

Create a file:

vi postgresql-secret.yaml
Enter fullscreen mode Exit fullscreen mode

Add the following contents:

apiVersion: v1
kind: Secret
metadata:
  name: postgres-production-secret
  namespace: default
type: Opaque
data:
  superUserPassword: S2p4dVhpUmlLVWNBVj0=
  replicationUserPassword: cmVwbGljYXRpb24=
  password: cHJvZHVjdGlvbg==
Enter fullscreen mode Exit fullscreen mode

Apply the changes:

kubectl apply -f postgresql-secret.yaml
Enter fullscreen mode Exit fullscreen mode

Create a ConfigMap

Next we will create a configmap,as Kubegres allows us to override its default configurations and bash scripts based on our requirements. The base configmap contains the default configs of all Kubegres resources in that namespace.
We can override the following configurations by creating our configmap:
postgres.conf- the official PostgreSQL configs used for both primary and replica servers.
pg_hba.conf- the host-based authentication configs for both primary and replica servers.
primary_init_script.sh- a bash script which runs for the first time when a primary PostgreSql container is created. Here we can add instructions to create custom databases.
backup_database.sh- this bash script defines the actions to perform during a backup. It is executed regularly by a dedicated cronjob

Create a file:

vi postgres-configmap.yaml
Enter fullscreen mode Exit fullscreen mode

Add the following contents:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-configmap
  namespace: default
data:

  postgres.conf: |2
    # Replication configs
    listen_addresses = '*'
    max_wal_senders = 10
    max_connections = 5000
    shared_buffers = 128MB
    wal_level= logical
    log_min_duration_statement= 5000


    # Logging
    log_destination = 'stderr,csvlog'
    logging_collector = on
    log_directory = 'pg_log'
    log_filename= 'postgresql-%Y-%m-%d_%H%M%S.log'


  pg_hba.conf: |

    # TYPE  DATABASE        USER            ADDRESS                 METHOD
    # Replication connections by a user with the replication privilege
    host    replication     replication     all                     md5
    # As long as it is authenticated, all connections allowed except from "0.0.0.0/0"
    local   all             all                                     md5
    host    all             all             all                     md5
    host    all             all             0.0.0.0/0               reject
    host production production 127.0.0.1/32 trust
    host production production ::1/128 trust
    host production production 0.0.0.0/0 trust
    host production postgres 127.0.0.1/32 trust

  primary_init_script.sh: |
    #!/bin/bash
    set -e

    # This script assumes that the env-var $POSTGRES_MY_DB_PASSWORD contains the password of the custom user to create.
    # You can add any env-var in your Kubegres resource config YAML.

    dt=$(date '+%d/%m/%Y %H:%M:%S');
    echo "$dt - Running init script the 1st time Primary PostgreSql container is created...";

    customDatabaseName="production"
    customUserName="production"

    echo "$dt - Running: psql -v ON_ERROR_STOP=1 --username $POSTGRES_USER --dbname $POSTGRES_DB ...";

    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
    CREATE DATABASE $customDatabaseName;
    CREATE USER $customUserName WITH PASSWORD '$POSTGRES_PRODUCTION_PASSWORD';
    GRANT ALL PRIVILEGES ON DATABASE "$customDatabaseName" to $customUserName;
    EOSQL

    echo "$dt - Init script is completed";

  backup_database.sh: |
    #!/bin/bash
    set -e

    dt=$(date '+%d/%m/%Y %H:%M:%S');
    fileDt=$(date '+%d_%m_%Y_%H_%M_%S');
    backUpFileName="$KUBEGRES_RESOURCE_NAME-backup-$fileDt.gz"
    backUpFilePath="$BACKUP_DESTINATION_FOLDER/$backUpFileName"

    echo "$dt - Starting DB backup of Kubegres resource $KUBEGRES_RESOURCE_NAME into file: $backUpFilePath";
    echo "$dt - Running: pg_dumpall -h $BACKUP_SOURCE_DB_HOST_NAME -U postgres -c | gzip > $backUpFilePath"

    pg_dumpall -h $BACKUP_SOURCE_DB_HOST_NAME -U postgres -c | gzip > $backUpFilePath

    if [ $? -ne 0 ]; then
      rm $backUpFilePath
      echo "Unable to execute a BackUp. Please check DB connection settings"
      exit 1
    fi

    echo "$dt - DB backup completed for Kubegres resource $KUBEGRES_RESOURCE_NAME into file: $backUpFilePath";

Enter fullscreen mode Exit fullscreen mode

Apply the changes:

kubectl apply -f postgres-configmap.yaml
Enter fullscreen mode Exit fullscreen mode

In order to create a cluster of PostgreSQL, we need to create a Kubegres resource. We will be creating a file “kubegres.yaml” with the below content. This yaml file will create one primary PostgreSql pod and two replica PostgreSql pods. The data will be replicated in real time from the Primary pod to the 2 Replica pods.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgresql-backup-pvc
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard-lrs
  resources:
    requests:
      storage: 35Gi

---
apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: production-postgresql
  namespace: default
  annotations:
    imageregistry: "https://hub.docker.com/"
    nginx.ingress.kubernetes.io/affinity: cookie

spec:
  replicas: 3
  image: postgres:13.2
  port: 5432

  database:
     size: 50G
     storageClassName: standard-lrs
     volumeMount: /var/lib/postgresql/data

  customConfig: postgres-configmap

  failover:
     isDisabled: false

  backup:
     schedule: "0 */2 * * *"
     pvcName: postgresql-backup-pvc
     volumeMount: /var/lib/backup

  env:
     - name: POSTGRES_PRODUCTION_PASSWORD
       valueFrom:
         secretKeyRef:
           name: postgres-production-secret
           key: password
     - name: POSTGRES_PASSWORD
       valueFrom:
          secretKeyRef:
             name: postgres-production-secret
             key: superUserPassword

     - name: POSTGRES_REPLICATION_PASSWORD
       valueFrom:
          secretKeyRef:
             name: postgres-production-secret
             key: replicationUserPassword


Enter fullscreen mode Exit fullscreen mode

In the above file, for creating a persistent volume claim, use the storage class as per your setup ( replace standard-lrs with the storage class available in your cluster). Also, you can update the image version of Postgres with the latest image.

Apply the changes:

kubectl apply -f kubegres.yaml
Enter fullscreen mode Exit fullscreen mode

Once we run the above command, three replicas of PostgreSQL will spin up. In our file, we have also scheduled a database backup of the database, which will be stored in a separate persistent volume.

kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
production-postgresql-1-0   1/1     Running   0          2m31s
production-postgresql-2-0   1/1     Running   0          2m20s
production-postgresql-3-0   1/1     Running   0          2m6s
Enter fullscreen mode Exit fullscreen mode

Once we have our Kubegres pods up and running, we will create a sample database with few tables in the PostgreSQL VM and try to migrate it to the Kubegres setup in Kubernetes.

Sample Database and data on Postgres(VM) server

Considering PostgreSQL server is already running on a virtual machine, we will create a database “production”, insert some data into it, and then verify the same after migration is done.

First, we will check if the Postgres service is up and running:

postgres@richa-mishra:~$ systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
     Active: active (exited) since Mon 2022-07-25 08:51:59 IST; 9h ago
    Process: 1233 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 1233 (code=exited, status=0/SUCCESS)
Enter fullscreen mode Exit fullscreen mode

Now we will create a database called “production”. Switch to user postgres and run:

psql
Enter fullscreen mode Exit fullscreen mode

We will log in into the database with postgres user and \create a database:

postgres=# CREATE DATABASE production;
Enter fullscreen mode Exit fullscreen mode

Let's connect to this production database and insert some data:

postgres=# \c production
You are now connected to database "production" as user "postgres".
Enter fullscreen mode Exit fullscreen mode

create a table and add some data into it:

production=#CREATE TABLE employee (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR (255) NOT NULL,
    last_name VARCHAR (255) NOT NULL,
    manager_id INT,
    FOREIGN KEY (manager_id) 
    REFERENCES employee (employee_id) 
    ON DELETE CASCADE
);

production=#INSERT INTO employee (
    employee_id,
    first_name,
    last_name,
    manager_id
)
VALUES
    (1, 'Sandeep', 'Jain', NULL),
    (2, 'Abhishek ', 'Kelenia', 1),
    (3, 'Harsh', 'Aggarwal', 1),
    (4, 'Raju', 'Kumar', 2),
    (5, 'Nikhil', 'Aggarwal', 2),
    (6, 'Anshul', 'Aggarwal', 2),
    (7, 'Virat', 'Kohli', 3),
    (8, 'Rohit', 'Sharma', 3);
Enter fullscreen mode Exit fullscreen mode

Output:

production=# SELECT * FROM employee;
 employee_id | first_name | last_name | manager_id 
-------------+------------+-----------+------------
           1 | Sandeep    | Jain      |           
           2 | Abhishek   | Kelenia   |          1
           3 | Harsh      | Aggarwal  |          1
           4 | Raju       | Kumar     |          2
           5 | Nikhil     | Aggarwal  |          2
           6 | Anshul     | Aggarwal  |          2
           7 | Virat      | Kohli     |          3
           8 | Rohit      | Sharma    |          3
(8 rows)

Enter fullscreen mode Exit fullscreen mode

Now we have a database called production, a table and some data in it. After migration we will verify if the same data exists on Kubegres. For performing data migration, we will be performing a few steps for establishing network connectivity between PostgreSQL on VM and Kubegres on Kubernetes. Once the connectivity is established, we will be running a job which will dump the data from the VM and restore it to the database on Kubernetes. Lets perform this activity step by step as explained below.

Migration of data from Postgres Server to Kubegres:

We will edit pg_hba.conf file of PostgreSQL(VM) and add an entry to allow connections from Kubegres, basically will try to establish an authentication between kubegres and Postgres on VM
So switch to postgres user and run the below command:

vim /etc/postgresql/13/main/pg_hba.conf
Enter fullscreen mode Exit fullscreen mode

In my case the version of PostgreSQL(VM) is 13, so the folder name is 13.

We will add the following line having IP address of Kubegres:

host    production     all    172.18.0.2/32         trust

Enter fullscreen mode Exit fullscreen mode

After making changes to pg_hba.conf, make sure to restart the postgresql service

systemctl restart postgresql
Enter fullscreen mode Exit fullscreen mode

Next we will create a Service and an endpoint to establish connectivity between Postgres setup on VM and Kubernetes.
Services when used with a corresponding Endpoints object and without a selector can help us to connect the Postgres on VM ( backends outside the cluster). So we will create a service and an endpoint object manually.
When you create an Endpoints object for a Service, you set the name of the endpoint to be the same as that of the Service.
In the Endpoint, under addresses we will mention the IP address of Postgres(VirtualMachine)

vi external-service.yaml
Enter fullscreen mode Exit fullscreen mode

Add the below contents:

apiVersion: v1
kind: Service
metadata:
  name: pg
spec:
  clusterIP: None
  ports:
  - name: postgres
    port: 5432
    targetPort: 5432
  selector: null
---
apiVersion: v1
kind: Endpoints
metadata:
  name: pg
subsets:
- addresses:
  - ip: 192.168.1.41
  ports:
  - name: postgres
    port: 5432
    protocol: TCP
Enter fullscreen mode Exit fullscreen mode
kubectl apply -f external-service.yaml
Enter fullscreen mode Exit fullscreen mode

Now, we will create a configmap, which will contain all details of our setup as follows:
DB: Database name which needs to be migrated. We have already created a new database on Kubernetes setup with the same name “production” at the time of installation.
USERNAME: username/role having access of the DB
PASSWORD: password of the username
SRC: It refers to the host name of Postgres(VM), so here we will add the external service name and namespace which we created in above step
DST: It refers to the hostname of Kubegres setup

vi migration-cm.yaml
Enter fullscreen mode Exit fullscreen mode
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-migration-cm
  labels:
    app: production-postgresql
data:
    DB: production
    USER: postgres
    PGPASSWORD: KjxuXiRiKUcAV=
    #     svcname.namespace
    SRC: "pg.default"
    #     podname.servicename[.namespace]
    DST: "production-postgresql-1-0.production-postgresql"

Enter fullscreen mode Exit fullscreen mode
kubectl apply -f migration-cm.yaml
Enter fullscreen mode Exit fullscreen mode

Since we have all the details of source and destination database in the configmap which we created above, we will run the migration job
Create a file:

vim job.yaml
Enter fullscreen mode Exit fullscreen mode

Add the below contents:

apiVersion: batch/v1
kind: Job
metadata:
  name: populate-db
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pg
        image: crunchydata/crunchy-postgres:centos8-13.5-4.7.4
        command: ["/bin/sh"]
        args:
        - -c
        - "pg_dump -h $SRC -U $USER --format c $DB | pg_restore --verbose -h $DST -U $USER --format c --dbname $DB"
        envFrom:
        - configMapRef:
            name: postgres-migration-cm

Enter fullscreen mode Exit fullscreen mode
kubectl apply -f job.yaml
Enter fullscreen mode Exit fullscreen mode

Once we run the above command, a job will be created and a container will start spinning, We will see the output as shown below:

kubectl get pods | grep populate
NAME                        READY   STATUS      RESTARTS      AGE
populate-db-txgkt           0/1     Completed   0             2s
Enter fullscreen mode Exit fullscreen mode

we will check the logs of this container. If there is some issue with the connectivity or any errors, you will be able to see them in the logs.

kubectl logs populate-db-txgkt
pg_restore: connecting to database for restore
pg_restore: creating TABLE "public.employee"
pg_restore: processing data for table "public.employee"
pg_restore: creating CONSTRAINT "public.employee employee_pkey"
pg_restore: creating FK CONSTRAINT "public.employee employee_manager_id_fkey"
Enter fullscreen mode Exit fullscreen mode

Here we can see in the logs that database connectivity and restore is successful.

To verify if the same table and data has been created in the Kubegres pod, we will exec into the pod and verify the data.

kubectl exec -it production-postgresql-1-0 bash
root@production-postgresql-1-0:/# su postgres
postgres@production-postgresql-1-0:/$ cd
postgres@production-postgresql-1-0:~$ psql -d production
Password for user postgres: 
psql (14.1 (Debian 14.1-1.pgdg110+1))
Type "help" for help.
production=# \dt
          List of relations
 Schema |   Name   | Type  |  Owner   
--------+----------+-------+----------
 public | employee | table | postgres
(1 row)

production=#  SELECT * FROM employee;
 employee_id | first_name | last_name | manager_id 
-------------+------------+-----------+------------
           1 | Sandeep    | Jain      |           
           2 | Abhishek   | Kelenia   |          1
           3 | Harsh      | Aggarwal  |          1
           4 | Raju       | Kumar     |          2
           5 | Nikhil     | Aggarwal  |          2
           6 | Anshul     | Aggarwal  |          2
           7 | Virat      | Kohli     |          3
           8 | Rohit      | Sharma    |          3
(8 rows)
Enter fullscreen mode Exit fullscreen mode

We will execute the above commands, step by step.
First we will exec into the pod by running:

kubectl exec -it production-postgresql-1-0 bash
Enter fullscreen mode Exit fullscreen mode

Then we will switch to user postgres and type psql -d production to connect to the production database. Once connected after providing the right credentials, we will be running the same query which we ran in the sample database section, to verify the consistency of data.
SELECT * FROM employee;
This query lists the same data which we have inserted in Postgres VM.

So the data has been transferred successfully. Now based on your setup, you can switch the application connectivity from Postgres VM to Kubegres setup which will basically require a merge request with few changes, once the MR is merged the application will start connecting to the kubegres setup, without any downtime and without any data loss.

Conclusion

These were the few steps to transfer the data of a database from a Virtual Machine to Kubernetes setup. Using these steps, we could migrate a production database in VM to kubernetes setup with zero downtime and data loss.
This blog post can be helpful even if you use any other database instead of PostgreSQL and are looking for a migration solution from a VM to Kubernetes setup.
To understand more about Kubegres, Please refer to their official doc and Github page.

That’s all for this post. If you are working on a similar scenario and need some assistance, feel free to reach out to me via LinkedIn, you can also read other blog posts that I’ve written on InfraCloud. I’m always excited to hear thoughts and feedback from my readers!

Top comments (0)