In the previous article, we explored ways to handle pod updates without affecting availability, using Deployments.
This article covers stateful applications in Kubernetes and how StatefulSets fit into such scenarios. Moreover, you'll have the chance to understand how volumes work in Kubernetes and how they relate to Pods, and therefore to Deployments and StatefulSets.
Let's start the journey.
When working with containers, or more precisely Pods, it's known that their data is ephemeral, which means everything written to the Pod lives only as long as the Pod does.
Once the Pod is terminated, all its data is lost.
That's the essence of stateless applications.
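To make the ephemeral nature concrete, here's a minimal, illustrative Pod sketch using an emptyDir volume (the Pod name and command are hypothetical). An emptyDir shares the Pod's lifetime: its contents survive container restarts within the Pod, but are deleted for good when the Pod itself is removed:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo   # hypothetical example
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "date > /scratch/started; sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}   # tied to the Pod: removed when the Pod is deleted
```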
🔵 Stateless applications are the default
By default, all applications in Kubernetes are stateless, meaning that data within the Pod is ephemeral and will be permanently lost during an application rollout update.
For instance, suppose we have a PostgreSQL Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pg
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
        - name: postgresql
          image: postgres:14
          env:
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              value: postgres
```
Once it's running, we can create a table called `users` in the database:
```shell
$ kubectl exec pg-79d96fb7b7-zg9kl -- \
    psql -U postgres -c \
    "CREATE TABLE users (id SERIAL, name VARCHAR);"
```
And run a query afterwards:
```shell
$ kubectl exec pg-79d96fb7b7-zg9kl -- \
    psql -U postgres -c "SELECT * FROM users;"
 id | name
----+------
(0 rows)
```
👉 Rolling out the application
It's not rare that we have to update the application Pod, whether to fix some bug, update the database version or do some maintenance.
```shell
$ kubectl rollout restart deploy/pg
deployment.apps/pg restarted
```
Notice that the Pod name has changed. That's because it's a Deployment, and Deployments give Pods no ordering or identity for differentiation.
Let's perform the query on this new Pod:
```shell
$ kubectl exec pg-8486b4f555-5dqz8 -- \
    psql -U postgres -c "SELECT * FROM users;"
ERROR:  relation "users" does not exist
LINE 1: SELECT * FROM users;
```
Uh-oh... the table is gone. Pods are stateless, remember?
🔵 Stateful applications
If we want to build a stateful application in Kubernetes, we need a common persistent storage structure that can be mounted across different Pods of the same ReplicaSet.
Enter Persistent Volumes.
👉 VolumeMounts and Volumes
In order to use persistent volumes, we have to mount a volume in the Pod container spec:
```yaml
kind: Deployment
...  # more
spec:
  template:
    spec:
      containers:
        - name: postgresql
          image: postgres:14
          env:
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              value: postgres
          volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql/data
```
Here, the volume is named `pgdata` and will be mounted at the path `/var/lib/postgresql/data` in the container. This path is exactly where the PostgreSQL data is located.
However, the volume `pgdata` can't come from nowhere. We need to request a persistent volume from the underlying infrastructure storage.

By infrastructure, think of our host machine in development, a server in the production environment, or even a storage product offered by the underlying cloud provider, if that's the case.
Still in the `template.spec` section, we add the `volumes` attribute:
```yaml
...
spec:
  template:
    spec:
      containers:
        ...
      volumes:
        - name: pgdata
          persistentVolumeClaim:
            claimName: my-pvc
```
A Persistent Volume Claim, or PVC, is a request by the user for some piece of storage. The example above assumes a PVC called `my-pvc` exists, so let's create it:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: my-sc
  volumeName: my-pv
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```
The PVC requires some attributes:
- storageClassName: a class of storage defined by the cluster administrator. A StorageClass holds traits about provisioning policies and other storage services of the cluster. We'll create it soon.
- volumeName: the Persistent Volume, which is a piece of storage that can be statically or dynamically provisioned in the cluster.
- accessModes (e.g. ReadWriteOnce), resources, among others...
First, we have to create the StorageClass:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: my-sc
provisioner: kubernetes.io/no-provisioner
parameters:
  type: local
```
The provisioner determines the plugin used to control storage provisioning in the cluster.
In development, we can use `kubernetes.io/no-provisioner`, which does not provision storage dynamically, so we have to declare the Persistent Volume manually.
The Persistent Volume is a piece of storage in the underlying infrastructure.
By defining capacity, storageClassName, accessModes and hostPath, we declare such a piece of storage, ready to be used by a PVC in a Pod.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  storageClassName: my-sc
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/volumes/my-pv
```
Once we've applied the SC, PV and PVC, we can apply the Deployment that uses the PVC:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pg
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
        - name: postgresql
          image: postgres:14
          env:
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              value: postgres
          volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: pgdata
          persistentVolumeClaim:
            claimName: my-pvc
```
```shell
$ kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
pg-7744b4d548-nxf8v   1/1     Running   0          3s
```
Now, time to check if the volumes are working properly across rollout updates:
```shell
### CREATE TABLE
$ kubectl exec pg-7744b4d548-nxf8v -- \
    psql -U postgres -c "CREATE TABLE users (id SERIAL, name VARCHAR);"

### QUERY
$ kubectl exec pg-7744b4d548-nxf8v -- \
    psql -U postgres -c "SELECT * FROM users"

### ROLLOUT
$ kubectl rollout restart deploy/pg
```
And then, performing the query against the new Pod:
```shell
$ kubectl exec pg-558d58c54-n9zb2 -- \
    psql -U postgres -c "SELECT * FROM users"
 id | name
----+------
(0 rows)
```
Yay! We just created a stateful application using Deployment and Persistent Volumes!
🔵 Scaling up stateful applications
At this moment, our Deployment has only 1 replica, but if we want to achieve high availability, we have to configure it to support more replicas.
Let's scale up to 3 replicas as we learned in the previous article. It's as easy as:
```shell
$ kubectl scale deploy/pg --replicas=3
deployment.apps/pg scaled

$ kubectl get pods
NAME                 READY   STATUS    RESTARTS     AGE
pg-9668885c9-rt9fd   1/1     Running   0            64s
pg-9668885c9-dqwcc   1/1     Running   0            63s
pg-9668885c9-kt7dg   1/1     Running   1 (5s ago)   66s
```
After several rollout updates, we may end up with the following state:
```shell
$ kubectl get pods
NAME                  READY   STATUS             RESTARTS      AGE
pg-55488bc8b6-wvr66   0/1     CrashLoopBackOff   1 (5s ago)    9s
pg-55488bc8b6-x4hh2   0/1     CrashLoopBackOff   1 (5s ago)    9s
pg-55488bc8b6-hvsdb   0/1     Error              2 (16s ago)   19s
```
💥 Oh my...the application has gone away.💥
There's no healthy Pod left. The entire Deployment is broken. What happened here?
👉 Deployment replicas share the same PVC
All Pod replicas in the Deployment share the same PVC. When two Pods write to the same storage location concurrently, this can lead to data loss or corruption.
After several rollouts, it's not rare that our deployment will end up in a broken state like above.
- Deployments don't guarantee ordering during updates, which can lead to data inconsistency
- Deployments don't provide any kind of identity, like a stable hostname or IP address for the Pods, which can cause reference issues
Hence, although it's possible, Deployments are not a good fit for stateful applications.

Thankfully, Kubernetes addresses those problems by providing another workload object called StatefulSet.
The StatefulSet object brings a StatefulSet Controller that acts like the Deployment Controller, but with some differences:
- Pods in a StatefulSet have a stable identity, addressing reference issues
- StatefulSets guarantee ordering of updates, thus avoiding data inconsistency
- Pod replicas in a StatefulSet do not share the same PVC. Each replica has its own PVC
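On the identity point: a StatefulSet is typically paired with a headless Service, referenced by the StatefulSet's `spec.serviceName`, so that each Pod gets a stable DNS name. A minimal sketch, with a hypothetical name `pg-headless`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pg-headless   # hypothetical name, referenced by the StatefulSet's spec.serviceName
spec:
  clusterIP: None     # headless: no virtual IP, DNS resolves to individual Pods
  selector:
    app: pg
  ports:
    - port: 5432
```

With this in place, a Pod like `pg-0` would be reachable at a stable address of the form `pg-0.pg-headless.<namespace>.svc.cluster.local`, regardless of restarts.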
We'll follow the same process as for the Deployment, but using `kind: StatefulSet` instead:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
        - name: postgresql
          image: postgres:14
          env:
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              value: postgres
          volumeMounts:
            - name: pvc
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: pvc
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "local-path"
        resources:
          requests:
            storage: 1Gi
```
Note that `containers.volumeMounts` stays the same, as it needs to reference the volume declared in the template.
But the Persistent Volume will be created dynamically through the `volumeClaimTemplates` attribute, where we just have to define the storageClassName and the storage request.
Wait... why are we using `local-path` as the storageClassName?
👉 Dynamic Provisioning
In order to create Persistent Volumes dynamically, we can't use the StorageClass we created previously, because its `kubernetes.io/no-provisioner` provisioner does not allow provisioning volumes dynamically.
Instead, we can use another StorageClass. Chances are you already have a default StorageClass on your cluster.

In my example, I created the k8s cluster using colima, which ships with a default StorageClass that allows dynamic provisioning.
Go check your cluster and choose the default storageClass created by it.
```shell
$ kubectl get sc
NAME                   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  37d
```
`local-path` is the name of the default StorageClass, which allows dynamic provisioning.
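On managed cloud clusters, the default StorageClass usually comes from a CSI driver instead. As an illustrative sketch, a dynamic StorageClass on AWS might look like this, assuming the AWS EBS CSI driver is installed (the class name here is hypothetical):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3            # hypothetical name
provisioner: ebs.csi.aws.com   # requires the AWS EBS CSI driver
parameters:
  type: gp3                # EBS volume type
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```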
After applying the StatefulSet, we can check that we have 3 replicas running. This time, the Pod names follow an ordinal number:
```shell
$ kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
pg-0   1/1     Running   0          62s
pg-1   1/1     Running   0          32s
pg-2   1/1     Running   0          25s
```
Also, confirm that we have 3 different PVCs:
```shell
$ kubectl get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-pg-0   Bound    pvc-f69c6af0-fc14-4e19-9c98-68ddbd69cbb5   1Gi        RWO            local-path     86s
pvc-pg-1   Bound    pvc-ebee5b7f-2568-4c30-8e89-f34099036d0d   1Gi        RWO            local-path     56s
pvc-pg-2   Bound    pvc-08039a2b-d6a8-4777-a9d3-72c7b8860eea   1Gi        RWO            local-path     49s
```
And lastly, that 3 Persistent Volumes were dynamically provisioned, one for each replica:
```shell
$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
pvc-f69c6af0-fc14-4e19-9c98-68ddbd69cbb5   1Gi        RWO            Delete           Bound    default/pvc-pg-0   local-path              116s
pvc-ebee5b7f-2568-4c30-8e89-f34099036d0d   1Gi        RWO            Delete           Bound    default/pvc-pg-1   local-path              87s
pvc-08039a2b-d6a8-4777-a9d3-72c7b8860eea   1Gi        RWO            Delete           Bound    default/pvc-pg-2   local-path              81s
```
Such a big Yay! 🚀
Now we can scale up, scale down or perform rollout updates as many times as we want; the scaling issues with stateful apps are gone!
🚀 Wrapping Up
Today we learned how to build stateful applications in Kubernetes using persistent volumes and how Deployments can lead to issues while scaling stateful applications.
We've seen how StatefulSets are the best solution for this problem, by keeping identity and ordering during updates, avoiding data inconsistency.
Stay tuned, as the upcoming posts will continue to cover more workload resources in Kubernetes, such as DaemonSets, Jobs and CronJobs.