How to handle long-running tasks in a Kubernetes reconciliation loop

#kubernetes #tutorial #showdev #controller

Controllers are the core of Kubernetes.

It’s a controller’s job to ensure that, for any given object, the actual state matches the desired state defined in the object spec. Each controller focuses on one Kind, but may interact with other Kinds.

This process is called reconciling.

Every controller has a Reconciler object with a Reconcile() method that implements the reconcile loop. The reconcile loop is passed a Request argument, which is a Namespace/Name key used to look up the primary resource object.

For instance, this is the signature of the Reconcile method of the Sveltos ClusterProfileReconciler:

func (r *ClusterProfileReconciler) Reconcile(ctx context.Context, req ctrl.Request) (_ ctrl.Result, reterr error)

For more details, check the Reconcile godoc.

Ideally, no Reconcile should ever take more than a few milliseconds. So what if you need to handle long-running tasks from the reconcile loop?

Consider the following ClusterProfile instance:

Any time Sveltos detects a new cluster matching Spec.ClusterSelector, it deploys the nginx and kyverno Helm charts in that cluster.
Deploying these Helm charts can be considered a long-running task. So how does Sveltos handle that?
Inspired by the Kubernetes reconciliation logic, we implemented a similar concept.

We have defined the following interface:

The Deploy method is invoked by any Sveltos reconciler that needs a long-running task to be executed;
GetResult is then used to fetch the result (essentially, to know whether the task succeeded or an error occurred).
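
The interface itself is not reproduced above. As a rough illustration only, the sketch below shows how a reconciler might use an interface of this shape without ever blocking on the task. Everything except the method names Deploy and GetResult is an assumption made for the example: the signatures, the Status values, the Task type, the stubDeployer test double (which stands in for the background workers) and reconcileStep are hypothetical and are not Sveltos's actual API.

```go
package main

import (
	"context"
	"fmt"
)

// Status is a hypothetical task state; the zero value means "never submitted".
type Status int

const (
	Unknown Status = iota
	InProgress
	Succeeded
	Failed
)

// Result is what GetResult reports for a key (assumed shape).
type Result struct {
	Status Status
	Err    error
}

// Task is the long-running work, for example installing a Helm chart.
type Task func(ctx context.Context) error

// Deployer has the two methods named in the text; the signatures are assumptions.
type Deployer interface {
	Deploy(ctx context.Context, key string, task Task) error
	GetResult(ctx context.Context, key string) Result
}

// stubDeployer is a test double: it records tasks and runs them only when
// runPending is called, standing in for the background workers.
type stubDeployer struct {
	pending map[string]Task
	results map[string]Result
}

func (s *stubDeployer) Deploy(_ context.Context, key string, task Task) error {
	s.pending[key] = task
	s.results[key] = Result{Status: InProgress}
	return nil
}

func (s *stubDeployer) GetResult(_ context.Context, key string) Result {
	return s.results[key] // a missing key yields the zero Result (Unknown)
}

func (s *stubDeployer) runPending(ctx context.Context) {
	for key, task := range s.pending {
		if err := task(ctx); err != nil {
			s.results[key] = Result{Status: Failed, Err: err}
		} else {
			s.results[key] = Result{Status: Succeeded}
		}
		delete(s.pending, key)
	}
}

// reconcileStep never waits for the task: it submits or checks, then returns.
// requeue tells the caller to reconcile the same object again later.
func reconcileStep(ctx context.Context, d Deployer, key string, task Task) (requeue bool, err error) {
	switch res := d.GetResult(ctx, key); res.Status {
	case Succeeded:
		return false, nil
	case InProgress:
		return true, nil // check again on the next reconcile
	case Failed:
		if deployErr := d.Deploy(ctx, key, task); deployErr != nil {
			return true, deployErr
		}
		return true, res.Err // surface the failure; the task was resubmitted
	default: // Unknown: first time this key is seen
		return true, d.Deploy(ctx, key, task)
	}
}

// runDemo drives three reconcile passes and reports whether each asked to requeue.
func runDemo() (first, second, third bool) {
	ctx := context.Background()
	d := &stubDeployer{pending: map[string]Task{}, results: map[string]Result{}}
	task := func(context.Context) error { return nil } // stands in for a Helm install
	first, _ = reconcileStep(ctx, d, "cluster-1/nginx", task)  // submits the task
	second, _ = reconcileStep(ctx, d, "cluster-1/nginx", task) // still in progress
	d.runPending(ctx)                                          // the "workers" finish
	third, _ = reconcileStep(ctx, d, "cluster-1/nginx", task)  // result is available
	return first, second, third
}

func main() {
	first, second, third := runDemo()
	fmt.Println(first, second, third) // true true false
}
```

In a real controller, the requeue flag would presumably map onto the ctrl.Result returned by Reconcile, so each pass stays short while the task itself runs elsewhere.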

We also have a client with a set of workers that run long-running tasks in the background.

The client internally uses:

Dirty set: contains all requests which have yet to be processed;
Job queue: contains all requests any worker can start executing;
InProgress set: contains all requests currently being processed by workers.

When any Sveltos reconciler needs to execute a long-running job, it sends a request for it (by invoking Deploy).

The request is first added to the Dirty set, or dropped if it is already present. Dropping a request that is already in the Dirty set is important: such a request has not been executed yet, and when it is executed it will use the state at that time, so the duplicate adds nothing.

From there, it is pushed to the Job queue only if it is not present in the InProgress set.

When a worker is ready to serve a request, it gets the request from the front of the Job queue. The request is also added to the InProgress set and removed from the Dirty set.
If a new request arrives for an entry that is still in the InProgress set, the request is only added to the Dirty set, not to the Job queue. This guarantees that the same request is never handled by multiple workers in parallel.

When a worker is done, the request is removed from the InProgress set. If the same request is also present in the Dirty set, it is added to the back of the Job queue (it can now be processed).
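
The flow described above can be sketched in a few dozen lines of Go. This is a simplified illustration, not the Sveltos implementation: the client type, its field names, the string request key and the handler callback are all assumptions made for the example, and shutdown and result tracking are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical, simplified take on the structure described above.
type client struct {
	mu         sync.Mutex
	cond       *sync.Cond          // wakes idle workers
	dirty      map[string]struct{} // requested, not yet picked up
	inProgress map[string]struct{} // being handled by a worker
	jobQueue   []string            // keys a worker may start, FIFO
	handler    func(key string)    // the long-running task
}

func newClient(workers int, handler func(key string)) *client {
	c := &client{dirty: map[string]struct{}{}, inProgress: map[string]struct{}{}, handler: handler}
	c.cond = sync.NewCond(&c.mu)
	for i := 0; i < workers; i++ {
		go c.worker() // workers run until the program exits in this sketch
	}
	return c
}

// Deploy registers a request and returns immediately.
func (c *client) Deploy(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.dirty[key]; ok {
		return // already waiting: drop the duplicate
	}
	c.dirty[key] = struct{}{}
	if _, ok := c.inProgress[key]; ok {
		return // running now: re-queued when the worker finishes
	}
	c.jobQueue = append(c.jobQueue, key)
	c.cond.Signal()
}

func (c *client) worker() {
	for {
		c.mu.Lock()
		for len(c.jobQueue) == 0 {
			c.cond.Wait()
		}
		key := c.jobQueue[0] // take from the front
		c.jobQueue = c.jobQueue[1:]
		c.inProgress[key] = struct{}{}
		delete(c.dirty, key)
		c.mu.Unlock()

		c.handler(key) // long-running; the lock is not held here

		c.mu.Lock()
		delete(c.inProgress, key)
		if _, ok := c.dirty[key]; ok { // requested again in the meantime
			c.jobQueue = append(c.jobQueue, key)
			c.cond.Signal()
		}
		c.mu.Unlock()
	}
}

// runDemo sends three requests for one key, two of them while the first runs,
// and returns how many times the handler actually ran.
func runDemo() int {
	var mu sync.Mutex
	runs := 0
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{}, 2)
	c := newClient(3, func(string) {
		mu.Lock()
		runs++
		first := runs == 1
		mu.Unlock()
		if first {
			started <- struct{}{} // the first run is in progress
			<-release             // hold it while more requests arrive
		}
		done <- struct{}{}
	})
	c.Deploy("cluster-1")
	<-started
	c.Deploy("cluster-1") // in progress: added to dirty only
	c.Deploy("cluster-1") // already dirty: dropped
	close(release)
	<-done
	<-done // the re-queued run
	return runs
}

func main() {
	fmt.Println(runDemo()) // 2
}
```

Three requests for the same key result in two runs: the first one, plus a single follow-up that covers both requests received while the first was in progress. The two runs never overlap, because the key is only pushed back to the queue after it has left the inProgress set.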

Sveltos is an open source project. Contributions are more than welcome.

If you would like to know more about it, or would like to see a new feature added to Sveltos, please reach out to us on Slack. Any feedback is very much appreciated.
