This article will contain a series of notes I made while preparing for the CKAD exam.
I intend to use these notes to refresh my knowledge on the topic periodically and hope fellow engineers find them useful during their foray into Kubernetes.
Please come back to read them periodically as I will add new notes as I work through the preparation material.
Let's dive straight in!
systemd - system startup and service management (replaces service package)
journald - manages system logs (this is a systemd service that collects and stores logging data - it creates and maintains structured, indexed journals based on logging information that is received from a variety of sources).
firewalld - firewall management daemon (provides a dynamically managed firewall with support for network / firewall zones to define the trust level of network connections or interfaces - it has support for IPv4, IPv6 firewall settings and Ethernet bridges).
ip - network display and configuration tool (this is part of the iproute2 package, and is designed to be a replacement for the ifconfig command from net-tools). The ip command will show or manipulate routing, network devices, interfaces and tunnels.
...an open source system for automating deployment, scaling and management of containerised applications.
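- builds on about 15 years of Google's experience running containers at scale (its internal Borg system); Kubernetes itself was open-sourced in 2014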
- containers provide a fine-grained solution for packing clusters efficiently
Kubernetes in Greek means 'the helmsman' or pilot of the ship
continuous integration - a consistent way to build and test software. Deploying new packages with code written each day or every hour instead of quarterly. Tools like Helm or Jenkins are often part of this with Kubernetes.
continuous delivery - an automated way to test and deploy software into various environments. Kubernetes handles the lifecycle of containers and connection of infrastructure resources to make rolling upgrades and rollbacks easy.
Kubernetes approaches these software issues by deploying a large number of small web services called microservices. One way of visualising this is - instead of deploying a large Apache web server running httpd daemons responding to page requests, there would be many smaller nginx servers handling traffic and requests.
- kubectl - is used to talk to the API server (or we can write our own client)
- kube-scheduler - sees the requests for running containers coming to the API and finds a suitable node to run that container on
- each node in a cluster runs two processes - a kubelet and kube-proxy. A kubelet receives requests to run containers, manages any necessary resources and watches over them on the local node. A kube-proxy creates and manages networking rules to expose containers on the network.
Orchestration is managed through a set of watch-loops known as operators or controllers.
Each controller interrogates the kube-apiserver for a particular object state, modifying the object until the current state matches the declared state.
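The watch-loop idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not how any real controller is implemented: `FakeCluster`, `reconcile` and `run_until_synced` are hypothetical names, and the "cluster" is just an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Hypothetical stand-in for the state a controller reads via the API server."""
    desired_replicas: int
    running: list = field(default_factory=list)


def reconcile(cluster: FakeCluster) -> str:
    """Take one step that moves the current state towards the declared state."""
    current = len(cluster.running)
    if current < cluster.desired_replicas:
        cluster.running.append(f"pod-{current}")
        return "created"
    if current > cluster.desired_replicas:
        cluster.running.pop()
        return "deleted"
    return "in-sync"


def run_until_synced(cluster: FakeCluster, max_steps: int = 100) -> list:
    """Repeat the loop until there is nothing left to change."""
    actions = []
    for _ in range(max_steps):
        action = reconcile(cluster)
        actions.append(action)
        if action == "in-sync":
            break
    return actions


cluster = FakeCluster(desired_replicas=3)
actions = run_until_synced(cluster)
print(actions)          # ['created', 'created', 'created', 'in-sync']
print(cluster.running)  # ['pod-0', 'pod-1', 'pod-2']
```

The loop never needs to know how the state drifted; it only compares what is declared with what exists and acts on the difference.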
a Pod is a group of one or more containers (such as Docker containers), with shared storage/network, and a specification for how to run the containers. A Pod’s contents are always co-located and co-scheduled, and run in a shared context
a Deployment provides declarative updates for Pods and ReplicaSets. You describe a desired state in a Deployment, and the Deployment Controller changes the actual state to the desired state at a controlled rate
a ReplicaSet is a controller which creates and replaces Pods until the requested number of replicas is running
a DaemonSet ensures that a single pod is deployed on every node (often used for logging and metrics)
a StatefulSet can be used to deploy pods in a particular order, such that following pods are only deployed if previous pods report a ready status
we can use labels which are part of object metadata
nodes can have taints to ensure pods are not scheduled on inappropriate nodes unless the pod has a toleration in its spec
kubectl describe nodes | grep -i taint
Taints: node-role.kubernetes.io/master:NoSchedule
This means that no pod will be scheduled onto this node unless it has a toleration for this taint.
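A rough Python model of how a toleration is matched against a taint may help here. The function names are hypothetical and the sketch only looks at the NoSchedule effect; the matching rules (operator defaults to Equal, an empty effect matches any effect, Exists with an empty key matches any taint) follow the documented behaviour as I understand it.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if one toleration matches one taint."""
    # A toleration without an effect matches taints with any effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # Exists ignores the value; with no key it matches every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def can_schedule(pod_tolerations: list, node_taints: list) -> bool:
    """A pod fits only if every NoSchedule taint on the node is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint["effect"] == "NoSchedule"
    )


master_taints = [{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}]
plain_pod = []
tolerant_pod = [{"key": "node-role.kubernetes.io/master",
                 "operator": "Exists", "effect": "NoSchedule"}]

print(can_schedule(plain_pod, master_taints))     # False
print(can_schedule(tolerant_pod, master_taints))  # True
```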
- Annotations are used by third party agents and other tools but not by Kubernetes
runs various server and manager processes for the cluster
kube-apiserver, kube-scheduler and etcd database are all components of the master node
Kube-apiserver is central to the operations of a Kubernetes cluster.
All calls are handled via this agent. All actions are accepted and validated by this agent and it is the only agent that connects to the etcd database. It acts as a master process for the cluster. Each API call goes through authentication, authorisation and several admission controllers.
Kube-scheduler uses an algorithm to determine which node will host a Pod of containers. It views available resources, such as volumes, and then deploys based on availability and success.
Etcd database stores the state of the cluster, networking and other persistent info. Values are always appended to the end. Previous copies of the data are marked for future removal by a compaction process. There is a leader database along with several followers. Starting from v1.15.1 kubeadm allows easy deployment of multi master clusters with stacked etcd clusters.
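The append-and-compact behaviour described for etcd can be pictured with a toy key-value log. This is a simplified sketch for intuition only; `RevisionStore` is a hypothetical class and does not mirror etcd's actual storage engine or API.

```python
class RevisionStore:
    """Toy append-only key-value log with compaction."""

    def __init__(self):
        self.revision = 0
        self.log = []  # (revision, key, value) tuples, oldest first

    def put(self, key, value):
        # Writes never overwrite: each one is appended under a new revision.
        self.revision += 1
        self.log.append((self.revision, key, value))
        return self.revision

    def get(self, key):
        # The newest entry for a key is its current value.
        for _, k, value in reversed(self.log):
            if k == key:
                return value
        return None

    def compact(self, revision):
        # Drop superseded entries up to `revision`, keeping each key's newest one.
        newest = {}
        for rev, key, _ in self.log:
            if rev <= revision:
                newest[key] = rev
        self.log = [e for e in self.log
                    if e[0] > revision or newest[e[1]] == e[0]]


store = RevisionStore()
store.put("replicas", 1)
store.put("replicas", 3)
store.put("image", "nginx")
entries_before = len(store.log)
store.compact(store.revision)
entries_after = len(store.log)
print(entries_before, entries_after, store.get("replicas"))  # 3 2 3
```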
Kube-controller-manager is a core control loop daemon which interacts with the kube-apiserver to determine the state of the cluster. If it does not match, the manager will contact the necessary controllers to match the desired state. There are several controllers to use such as endpoints, namespaces and replication.
all worker nodes run a kubelet and kube-proxy as well as a container engine such as Docker
kubelet interacts with the underlying Docker engine to ensure containers that need to run are actually running - it works to configure the local node until the specifications (in PODSPEC) have been met. It also sends status back to kube-apiserver for persistence
kube-proxy manages network connectivity to the containers using IPTABLES entries. Also has userspace mode, in which it monitors Services and Endpoints using a random high number port to proxy traffic
smallest unit of work - we do not (or rarely) directly interact with containers
represents a group of co-located containers with some associated data volumes
containers in a pod share the same network namespace
sidecar containers perform helper tasks like logging
Each pod is made up of:
- one or more containers
- one or more volumes
- a pause container
The pause container is used to get an IP address and then ensure all containers in a pod use its network namespace - containers can die and come back and they still use the same network namespace.
A Service is an abstraction which defines a logical set of Pods and a policy by which to access them (sometimes this pattern is called a micro-service)
In Kubernetes, controllers are control loops that watch the state of a cluster, then make or request changes where needed. Each controller tries to move the current cluster state closer to the desired state. Endpoints, namespace and serviceaccounts are examples of controllers.
Every pod gets its own IP address (Pods can be treated much like VMs or physical hosts from the perspectives of port allocation, naming, service discovery, load balancing, application configuration, and migration).
All containers within a pod behave as if they are on the same host with regard to networking. They can all reach each other's ports on localhost.
Pods are assigned an IP address before their containers are started. The Service object is used to connect pods within the cluster using CLUSTERIP addresses. Pods are reached from outside the cluster using NODEPORT addresses, or through a load balancer if a LOADBALANCER service is configured.
- ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default ServiceType.
- NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You’ll be able to contact the NodePort Service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
- LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Traffic routing is controlled by rules defined on the Ingress resource. An Ingress can be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name based virtual hosting. An Ingress controller is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic.
    internet
        |
   [ Ingress ]
   --|-----|--
   [ Services ]
kubeadm - Kubernetes cluster bootstrapping tool - uses CNI (container network interface) as the default network interface mechanism.
CNI is an emerging specification for plugins that configure container networking and remove allocated resources when the container is deleted.
kubectl create -f <service.yaml>
Services and Pods are "linked" as follows:
pods use labels (key:value pairs) as metadata i.e. labels -> type -> webserver
the service uses a selector (with same key:value pairs) to match those pods i.e. spec -> selector -> webserver
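The label-to-selector link above boils down to a subset check, sketched here in Python. The pod names and the extra `tier` label are made up for illustration, and the sketch covers only simple equality-based selectors.

```python
def selects(selector: dict, labels: dict) -> bool:
    """True if every key:value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical pods; only their metadata labels matter here.
pods = [
    {"name": "web-1", "labels": {"type": "webserver", "tier": "frontend"}},
    {"name": "web-2", "labels": {"type": "webserver"}},
    {"name": "db-1", "labels": {"type": "database"}},
]

service_selector = {"type": "webserver"}
endpoints = [p["name"] for p in pods if selects(service_selector, p["labels"])]
print(endpoints)  # ['web-1', 'web-2']
```

Extra labels on a pod do not prevent a match; the pod only has to carry every pair the selector asks for.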
Using a single container per pod allows for the most granularity and decoupling. There are still some reasons to deploy multiple containers, sometimes called composite containers, in a single pod. The secondary containers can handle logging or enhance the primary (the sidecar concept), act as a proxy to the outside (the ambassador concept), or modify data to meet an external format (the adapter concept). In all three concepts, the secondary containers perform a function the primary container does not.
kubectl create deployment <deployment_name> --image=<image_name>
kubectl create deployment example --image=busybox
Creating a pod does not take advantage of orchestrating abilities of Kubernetes. Deployments on the other hand give us scalability, reliability and updates.
A container runtime is the component which runs the containerised application upon request. When these notes were written, Docker Engine was the default for Kubernetes, with cri-o, rkt and others gaining community support. Since Kubernetes v1.24 removed the dockershim, containerd and cri-o are the usual choices, and rkt has been retired.
The Open Container Initiative (OCI) was created to help with portability across systems and environments. Docker donated their libcontainer project to form a new codebase called runC to support these goals.
Here is how to create a Dockerfile and build an image from it
- Start Dockerfile with FROM i.e.
- Then build it with
docker build -t myapp .
- Check that image built successfully with
- Then run the container with
docker run myapp
- Then push to the registry with
We can also build a local repo to push images to - this adds privacy and reduces bandwidth
Once it is configured, we can docker build, tag and push
To create a deployment with image version
kubectl create deployment <name> --image=<repo>/<app_name>:<version>
kubectl create deployment test-img --image=18.104.22.168:5001/myapp:v2.2
LIVENESS PROBE - when to restart a container (deadlocked etc) (if under a controller, it is restarted, else terminated)
READINESS PROBE - when it is available to accept traffic
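How a run of probe results turns into a healthy or unhealthy verdict can be sketched as below. This is a simplified model, not the kubelet's code: `evaluate` is a hypothetical function, and the defaults of 3 consecutive failures and 1 success mirror the documented `failureThreshold` and `successThreshold` defaults.

```python
def evaluate(results, initially_healthy, failure_threshold=3, success_threshold=1):
    """Return the health verdict after each probe result in `results`."""
    healthy = initially_healthy
    streak_value, streak = None, 0
    history = []
    for ok in results:
        # Count consecutive identical results.
        if ok == streak_value:
            streak += 1
        else:
            streak_value, streak = ok, 1
        if ok and streak >= success_threshold:
            healthy = True
        elif not ok and streak >= failure_threshold:
            healthy = False
        history.append(healthy)
    return history


# Liveness: the container starts out healthy and is only flagged for restart
# after three failures in a row.
liveness = evaluate([True, False, False, True, False, False, False],
                    initially_healthy=True)
print(liveness)  # [True, True, True, True, True, True, False]

# Readiness: the pod receives no traffic until its first successful probe.
readiness = evaluate([False, True], initially_healthy=False)
print(readiness)  # [False, True]
```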
decoupled resources - instead of hard coding a resource in an application, an intermediary, such as a Service, enables connection and reconnection of other resources, providing flexibility.
transience - each object should be developed with the expectation that other components will die and be rebuilt. (This allows us to terminate and deploy new versions easily)
flexible framework - many independent resources work together, but decoupled from each other, and without expectations of individual or permanent relationship. This allows for greater flexibility, higher availability and easy scalability.
Pods usually use as much CPU and memory as the workload requires. By using resource requests the scheduler will only schedule a pod to a node if the resources exist to meet all requests on that node.
- CPU: spec.containers[].resources.[limits | requests].cpu
- If a pod uses more CPU than allowed, it is throttled; it WON’T be evicted
- Memory: spec.containers[].resources.[limits | requests].memory
- If a pod uses more memory than allowed, it may be restarted or evicted.
Ephemeral Storage - when a container crashes, kubelet will restart it and all the files will be lost. (Container starts with a clean state). When running containers in a pod, it is often necessary to share files between containers (this is where VOLUMES come in).
- If it uses more storage than specified, it will be EVICTED.
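The way resource requests feed into scheduling can be illustrated with a small filter, sketched here in Python. The node names and figures are invented, and the real scheduler weighs many more factors; this only shows the "do the requests fit" check.

```python
def fits(allocatable: dict, requested: dict, pod_requests: dict) -> bool:
    """True if the node still has room for every resource the pod requests."""
    return all(
        requested.get(name, 0) + amount <= allocatable.get(name, 0)
        for name, amount in pod_requests.items()
    )


def schedulable_nodes(nodes: dict, pod_requests: dict) -> list:
    """Names of the nodes that pass the resource filter, in sorted order."""
    return sorted(
        name for name, node in nodes.items()
        if fits(node["allocatable"], node["requested"], pod_requests)
    )


# CPU in millicores, memory in MiB; all figures are made up for illustration.
nodes = {
    "node-a": {"allocatable": {"cpu": 2000, "memory": 4096},
               "requested": {"cpu": 1800, "memory": 1024}},
    "node-b": {"allocatable": {"cpu": 2000, "memory": 4096},
               "requested": {"cpu": 500, "memory": 3900}},
    "node-c": {"allocatable": {"cpu": 4000, "memory": 8192},
               "requested": {"cpu": 1000, "memory": 2048}},
}
pod_requests = {"cpu": 500, "memory": 256}
candidates = schedulable_nodes(nodes, pod_requests)
print(candidates)  # ['node-c']
```

Note that the check uses what other pods have requested, not what they are actually consuming at that moment.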
See https://kubernetes.io/blog/2015/06/the-distributed-system-toolkit-patterns/ for more details on this topic.
Sidecar Container - Adds some functionality not present in the main container such as logging - this allows for a decoupled and scalable approach rather than bloating main container code. Prometheus and Fluentd logging use sidecar containers to collect data
Adapter Container - Used to modify (adapt) the data either on ingress or egress to match some other need. An adapter would be an efficient way to standardise the output of the main container to be ingested by the monitoring tool without having to modify the monitor or the containerised application. An adapter container transforms the output of multiple applications into a single, uniform view
Ambassador - an ambassador allows for access to the outside world without having to implement a service or another entry in an ingress controller:
- Proxy local connection
- Reverse proxy
- Limit HTTP requests
- Re-route from main container to the outside world. (Ambassador is also the name of an Open Source Kubernetes-Native API Gateway built on the Envoy Proxy)
Jobs are part of the batch API group. They are used to run a set number of pods to completion. If a pod fails, it will be restarted until the required number of completions is reached.
A job spec has a PARALLELISM (# of pods to run at the same time) and a COMPLETIONS (# of pods that must finish successfully for the job to be considered done).
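The interplay of the two settings can be simulated with a short Python sketch. It assumes pods run in tidy "waves", which is a simplification of what the Job controller really does; `run_job` and the scripted outcomes are hypothetical.

```python
def run_job(completions: int, parallelism: int, outcomes) -> dict:
    """Simulate a Job; `outcomes` yields True (success) or False per pod run."""
    outcomes = iter(outcomes)
    succeeded = 0
    pods_started = 0
    waves = 0
    while succeeded < completions:
        # Never start more pods than there are completions still outstanding.
        batch = min(parallelism, completions - succeeded)
        results = [next(outcomes) for _ in range(batch)]
        pods_started += batch
        succeeded += sum(results)
        waves += 1
    return {"waves": waves, "pods_started": pods_started, "succeeded": succeeded}


# Five completions, two pods at a time, every pod succeeds.
clean_run = run_job(completions=5, parallelism=2, outcomes=[True] * 5)
print(clean_run)  # {'waves': 3, 'pods_started': 5, 'succeeded': 5}

# Same Job, but the second pod fails and has to be replaced.
flaky_run = run_job(completions=5, parallelism=2,
                    outcomes=[True, False, True, True, True, True])
print(flaky_run)  # {'waves': 3, 'pods_started': 6, 'succeeded': 5}
```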
a volume is a directory (possibly pre-populated) made available to containers in a Pod
a Kubernetes volume shares at least the Pod lifetime, not the container within
if you want your storage lifetime to be distinct from a Pod, use Persistent Volumes. These allow for volumes to be claimed by a Pod using PersistentVolumeClaims.
a volume can persist longer than a pod, and can be accessed by multiple pods using PersistentVolumeClaims. This allows for state persistency
a volume can be made available to multiple pods, with each given an access mode to write
we can pass encoded data to a pod using Secrets and non-encoded data using a ConfigMap - we can pass things like SSH keys, passwords etc...
there is no concurrency checking, which means data corruption is probable unless outside locking takes place
examples of volume types:
There are three access modes for volumes:
- RWO(ReadWriteOnce) - allows read-write by a single node
- ROX(ReadOnlyMany) - allows read-only by multiple nodes
- RWX(ReadWriteMany) - allows read-write by multiple nodes
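Access modes matter most when a PersistentVolumeClaim is matched to a volume. The sketch below models that matching on access mode and size only. The volume names and sizes are invented, and picking the smallest fitting volume is an assumption of this sketch.

```python
def satisfies(volume: dict, claim: dict) -> bool:
    """A volume fits if it offers every requested mode and enough capacity."""
    return (set(claim["access_modes"]) <= set(volume["access_modes"])
            and volume["capacity_gi"] >= claim["request_gi"])


def find_volume(volumes: list, claim: dict):
    """Name of the smallest volume that satisfies the claim, or None."""
    matches = [v for v in volumes if satisfies(v, claim)]
    if not matches:
        return None
    return min(matches, key=lambda v: v["capacity_gi"])["name"]


# Hypothetical volumes, capacities in GiB.
volumes = [
    {"name": "pv-small", "capacity_gi": 5, "access_modes": ["ReadWriteOnce"]},
    {"name": "pv-shared", "capacity_gi": 20,
     "access_modes": ["ReadOnlyMany", "ReadWriteMany"]},
    {"name": "pv-large", "capacity_gi": 50,
     "access_modes": ["ReadWriteOnce", "ReadOnlyMany"]},
]

rwo_claim = {"request_gi": 10, "access_modes": ["ReadWriteOnce"]}
rwx_claim = {"request_gi": 10, "access_modes": ["ReadWriteMany"]}
print(find_volume(volumes, rwo_claim))  # pv-large
print(find_volume(volumes, rwx_claim))  # pv-shared
```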
When a volume is requested, the local kubelet uses the code in kubelet_pods.go to:
- map raw devices
- determine and make mount point for container
- create symbolic link on the host filesystem to associate the storage with the container
If no particular StorageClass was requested, only access_mode and size are used as params to create a volume.
There are many different types of volumes such as gcePersistentDisk or awsElasticBlockStore (GCE and EBS respectively)
An emptyDir volume is first created when a Pod is assigned to a Node, and exists as long as that Pod is running on that node. As the name says, it is initially empty. Containers in the Pod can all read and write the same files in the emptyDir volume, though that volume can be mounted at the same or different paths in each Container. When a Pod is removed from a node for any reason, the data in the emptyDir is deleted forever.
A hostPath volume mounts a file or directory from the host node’s filesystem into your Pod. This is not something that most Pods will need, but it offers a powerful escape hatch for some applications.