Data types in Kubernetes: PriorityQueue

#kubernetes #datatypes #go #queue

Kubernetes Version: v1.13.2

The Kubernetes scheduler (kube-scheduler) is the component responsible for assigning pods to nodes. This process is called scheduling, hence the name kube-scheduler. One of the kube-scheduler's features is pod priority and preemption.

Pod priority and preemption allows pods to be assigned a priority value in the pod spec. During scheduling, Kubernetes takes that priority into account: higher-priority pods are scheduled ahead of lower-priority pods. Additionally, when resources are scarce, lower-priority pods can be evicted in favor of a higher-priority pod.

First, let's look at how the kube-scheduler works. The kube-scheduler has access to a queue of pods that need to be scheduled. Whenever a pod is created or modified, it is added to this pod queue. The kube-scheduler waits until the queue contains a pod, dequeues it, and schedules it.
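The wait-then-dequeue loop described above can be sketched with a small blocking queue. This is a minimal illustration, not the scheduler's actual code: `podQueue`, `Add`, and `Pop` are hypothetical names, and pods are reduced to plain strings.

```go
package main

import (
	"fmt"
	"sync"
)

// podQueue is a hypothetical, simplified stand-in for the scheduler's pod queue.
type podQueue struct {
	mu    sync.Mutex
	cond  *sync.Cond
	items []string
}

func newPodQueue() *podQueue {
	q := &podQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Add appends a pod and wakes one waiting consumer.
func (q *podQueue) Add(pod string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, pod)
	q.cond.Signal()
}

// Pop blocks until a pod is available, then removes and returns the oldest one.
func (q *podQueue) Pop() string {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 {
		q.cond.Wait() // releases the lock while waiting
	}
	pod := q.items[0]
	q.items = q.items[1:]
	return pod
}

func main() {
	q := newPodQueue()

	// Producer: pods get created and land on the queue.
	go func() {
		for _, p := range []string{"web", "db", "cache"} {
			q.Add(p)
		}
	}()

	// Consumer: the "scheduler" waits for a pod, dequeues it, and schedules it.
	for i := 0; i < 3; i++ {
		fmt.Println("scheduling", q.Pop())
	}
}
```

The consumer never polls: `Pop` sleeps on the condition variable until `Add` signals that a pod has arrived.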

The kube-scheduler's pod queue can return pods either in the order in which they joined the queue or based on their priority. In the first-in-first-out case, if 5 pods are created and need to be scheduled, the kube-scheduler may schedule the first 4 and then run out of resources for the last pod. If that last pod was the most important one, you are out of luck: you'll have to delete some pods until enough resources are free for the scheduler to place it.
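To make the first-in-first-out problem concrete, here is a toy simulation. It assumes a single node with 4 free CPUs and five pods that each request 1 CPU; the `pod` type and `scheduleFIFO` function are made up for this example.

```go
package main

import "fmt"

// pod is a hypothetical, simplified pod: a name and the CPUs it requests.
type pod struct {
	name string
	cpu  int
}

// scheduleFIFO places pods strictly in arrival order. A pod that does not
// fit in the remaining capacity stays pending.
func scheduleFIFO(pods []pod, freeCPU int) (scheduled, pending []string) {
	for _, p := range pods {
		if p.cpu <= freeCPU {
			freeCPU -= p.cpu
			scheduled = append(scheduled, p.name)
		} else {
			pending = append(pending, p.name)
		}
	}
	return scheduled, pending
}

func main() {
	// The most important pod happens to arrive last.
	pods := []pod{{"a", 1}, {"b", 1}, {"c", 1}, {"d", 1}, {"critical", 1}}

	scheduled, pending := scheduleFIFO(pods, 4)
	fmt.Println("scheduled:", scheduled)
	fmt.Println("pending:", pending)
}
```

The first four pods consume all the capacity, so `critical` is the one left pending, purely because of its position in the queue.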

If the kube-scheduler's pod queue is based on priority, you can assign priorities to pods. If the same 5 pods need to be scheduled but each has a different priority, the highest-priority pod is scheduled first. If it instead arrives after the other 4 have already been scheduled and no resources are left, the kube-scheduler will try to preempt (evict) lower-priority pods until a node has enough resources for the highest-priority pod.
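The preemption idea can be sketched as "evict the least important pods until the new pod fits". The real preemption logic considers far more than this; the version below assumes a single node, CPU as the only resource, and a hypothetical `pickVictims` helper.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical, simplified pod.
type pod struct {
	name     string
	priority int
	cpu      int
}

// pickVictims returns the running pods to evict so that incoming fits on the
// node. ok is false when evicting every lower-priority pod would still not
// free enough CPU.
func pickVictims(running []pod, freeCPU int, incoming pod) (victims []string, ok bool) {
	// Only pods with a strictly lower priority may be preempted.
	var candidates []pod
	for _, p := range running {
		if p.priority < incoming.priority {
			candidates = append(candidates, p)
		}
	}

	// Evict the least important pods first.
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].priority < candidates[j].priority
	})

	for _, p := range candidates {
		if freeCPU >= incoming.cpu {
			break
		}
		freeCPU += p.cpu
		victims = append(victims, p.name)
	}
	return victims, freeCPU >= incoming.cpu
}

func main() {
	// Four pods already fill the node; no CPU is free.
	running := []pod{{"a", 10, 1}, {"b", 20, 1}, {"c", 30, 1}, {"d", 40, 1}}

	victims, ok := pickVictims(running, 0, pod{"critical", 100, 1})
	fmt.Println("can schedule:", ok, "victims:", victims)
}
```

Here evicting pod `a`, the lowest-priority pod, is enough to make room for `critical`.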

The kube-scheduler is able to swap out the implementation of the pod queue because it is abstracted behind an interface. If pod priority is disabled, the kube-scheduler uses a plain first-in-first-out queue to satisfy the scheduling queue interface. However, in the default case where pod priority is enabled, the kube-scheduler uses a priority queue to implement the scheduling queue.
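A rough sketch of that swap is shown below. The `schedulingQueue` interface, both queue types, and the `newSchedulingQueue` constructor are simplified, hypothetical stand-ins rather than the scheduler's real definitions, and the priority variant uses a linear scan only to keep the example short.

```go
package main

import "fmt"

// pod is a hypothetical, simplified pod.
type pod struct {
	name     string
	priority int
}

// schedulingQueue is a trimmed-down, hypothetical version of the interface
// the scheduler depends on.
type schedulingQueue interface {
	Add(p pod)
	Pop() (pod, bool)
}

// fifoQueue returns pods in arrival order.
type fifoQueue struct{ pods []pod }

func (q *fifoQueue) Add(p pod) { q.pods = append(q.pods, p) }

func (q *fifoQueue) Pop() (pod, bool) {
	if len(q.pods) == 0 {
		return pod{}, false
	}
	p := q.pods[0]
	q.pods = q.pods[1:]
	return p, true
}

// priorityQueue returns the highest-priority pod first. It scans linearly
// for the maximum; a heap would be the efficient choice.
type priorityQueue struct{ pods []pod }

func (q *priorityQueue) Add(p pod) { q.pods = append(q.pods, p) }

func (q *priorityQueue) Pop() (pod, bool) {
	if len(q.pods) == 0 {
		return pod{}, false
	}
	best := 0
	for i, p := range q.pods {
		if p.priority > q.pods[best].priority {
			best = i
		}
	}
	p := q.pods[best]
	q.pods = append(q.pods[:best], q.pods[best+1:]...)
	return p, true
}

// newSchedulingQueue picks the implementation; callers only see the interface.
func newSchedulingQueue(podPriorityEnabled bool) schedulingQueue {
	if podPriorityEnabled {
		return &priorityQueue{}
	}
	return &fifoQueue{}
}

func main() {
	for _, enabled := range []bool{false, true} {
		q := newSchedulingQueue(enabled)
		q.Add(pod{"batch", 1})
		q.Add(pod{"critical", 100})
		first, _ := q.Pop()
		fmt.Printf("podPriorityEnabled=%v -> first pod: %s\n", enabled, first.name)
	}
}
```

The calling code is identical in both cases; only the constructor decides whether `batch` or `critical` comes out first.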

A priority queue has the same interface as a regular queue:

type Queue interface {
    // Enqueue puts an item on the queue.
    Enqueue(interface{})

    // Dequeue removes an item from the queue and returns it.
    Dequeue() interface{}
}

The difference is entirely in the underlying implementation.

In Go it's common to see a first-in-first-out queue implemented with a slice or a channel as the underlying data structure.
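As an illustration, both flavors can satisfy the `Queue` interface from above. The type names `sliceQueue` and `chanQueue` are made up for this sketch, and the choice to return `nil` from an empty slice-backed queue is an assumption, not a convention the scheduler follows.

```go
package main

import "fmt"

// Queue is the interface shown earlier in the article.
type Queue interface {
	Enqueue(interface{})
	Dequeue() interface{}
}

// sliceQueue is a FIFO backed by a slice. Dequeue returns nil when empty.
type sliceQueue struct{ items []interface{} }

func (q *sliceQueue) Enqueue(v interface{}) { q.items = append(q.items, v) }

func (q *sliceQueue) Dequeue() interface{} {
	if len(q.items) == 0 {
		return nil
	}
	v := q.items[0]
	q.items[0] = nil // drop the reference so the item can be garbage collected
	q.items = q.items[1:]
	return v
}

// chanQueue is a FIFO backed by a buffered channel. Enqueue blocks when the
// buffer is full and Dequeue blocks when it is empty.
type chanQueue chan interface{}

func (q chanQueue) Enqueue(v interface{}) { q <- v }
func (q chanQueue) Dequeue() interface{}  { return <-q }

func main() {
	queues := []Queue{&sliceQueue{}, make(chanQueue, 8)}
	for _, q := range queues {
		q.Enqueue("pod-1")
		q.Enqueue("pod-2")
		fmt.Println(q.Dequeue(), q.Dequeue())
	}
}
```

The slice version is simple but not safe for concurrent use without a lock, while the channel version is safe for concurrent use and blocks instead of reporting "empty".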

For a priority queue, a heap is the usual underlying data structure. A heap keeps the highest-priority item at its root, so enqueueing and dequeueing each take O(log n) time.

Go's standard library provides a heap in the container/heap package. You implement the five methods of heap.Interface (Len, Less, and Swap from sort.Interface, plus Push and Pop), and the package supplies efficient heap operations such as heap.Init, heap.Push, heap.Pop, and heap.Fix.
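Here is a minimal sketch of a priority queue built on container/heap. It is not the scheduler's actual implementation: the `pod`, `podHeap`, and `PriorityQueue` types are invented for this example, and a larger `priority` number is assumed to mean "more important".

```go
package main

import (
	"container/heap"
	"fmt"
)

// pod is a hypothetical, simplified pod.
type pod struct {
	name     string
	priority int
}

// podHeap implements heap.Interface over a slice of pods.
type podHeap []pod

func (h podHeap) Len() int { return len(h) }

// Less puts higher priorities first, which turns the heap into a max-heap.
func (h podHeap) Less(i, j int) bool { return h[i].priority > h[j].priority }

func (h podHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }

// Push and Pop are called by the heap package, not directly by users.
func (h *podHeap) Push(x interface{}) { *h = append(*h, x.(pod)) }

func (h *podHeap) Pop() interface{} {
	old := *h
	n := len(old)
	p := old[n-1]
	*h = old[:n-1]
	return p
}

// PriorityQueue exposes Enqueue and Dequeue on top of podHeap.
type PriorityQueue struct{ h podHeap }

// Enqueue adds a pod. It panics if v is not a pod.
func (q *PriorityQueue) Enqueue(v interface{}) { heap.Push(&q.h, v) }

// Dequeue removes and returns the highest-priority pod, or nil when empty.
func (q *PriorityQueue) Dequeue() interface{} {
	if q.h.Len() == 0 {
		return nil
	}
	return heap.Pop(&q.h)
}

func main() {
	q := &PriorityQueue{}
	q.Enqueue(pod{"batch", 1})
	q.Enqueue(pod{"critical", 100})
	q.Enqueue(pod{"web", 50})

	for v := q.Dequeue(); v != nil; v = q.Dequeue() {
		fmt.Println(v.(pod).name)
	}
}
```

Regardless of insertion order, the pods come out as `critical`, `web`, `batch`. The only priority-specific code is the `Less` method; the heap package does the rest.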

Both queue implementations can be found in the kube-scheduler's source code.

To recap from an outside-in perspective:

The kube-scheduler depends on a SchedulingQueue to get the next pod to schedule.
The SchedulingQueue can be implemented in a variety of ways, but by default, with pod priority enabled, it is a priority queue.
The priority queue uses a heap to keep a priority-based order of the pods it knows about. This means that when the kube-scheduler gets the next pod to schedule, the pod with the highest priority is returned first.

Please give me feedback if you feel the desire! I'm always looking to improve my technical writing skill.