DEV Community

Gianluca


Managed cluster health checks

Sveltos is a project for deploying Kubernetes add-ons across tens of managed clusters.

We have already covered a few aspects of Sveltos in previous posts:

  1. Kubernetes add-ons deployment;
  2. Upgrading add-ons as cluster runtime state changes;
  3. Multi-tenancy with Sveltos.

In this post, we will cover how to instruct Sveltos to continuously evaluate the health of managed clusters and send notifications whenever it changes.

Add-on deployment state

When a ClusterProfile instance is created, Sveltos starts watching for clusters matching the ClusterProfile's clusterSelector field. In each matching workload cluster, Sveltos deploys all referenced add-ons (Helm charts and/or Kubernetes resource YAMLs).
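To make the matching step concrete, here is a minimal Python sketch of how a selector picks clusters. It assumes clusterSelector uses Kubernetes label-selector syntax such as env=fv and supports only comma-separated `key=value` terms; real Kubernetes selectors also support `!=`, `in`, `notin` and existence checks. The function names are hypothetical and this is not Sveltos code.

```python
from typing import Dict, List


def parse_selector(selector: str) -> Dict[str, str]:
    """Parse an equality-only selector such as "env=fv,region=eu" (simplified)."""
    requirements: Dict[str, str] = {}
    for term in selector.split(","):
        term = term.strip()
        if not term:
            continue
        key, sep, value = term.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"unsupported selector term: {term!r}")
        requirements[key.strip()] = value.strip()
    return requirements


def matching_clusters(selector: str, clusters: Dict[str, Dict[str, str]]) -> List[str]:
    """Return, sorted, the names of clusters whose labels satisfy every term."""
    requirements = parse_selector(selector)
    return sorted(
        name
        for name, labels in clusters.items()
        if all(labels.get(key) == value for key, value in requirements.items())
    )


# Hypothetical managed clusters and their labels.
clusters = {
    "cluster-a": {"env": "fv", "region": "eu"},
    "cluster-b": {"env": "prod"},
    "cluster-c": {"env": "fv"},
}
print(matching_clusters("env=fv", clusters))  # ['cluster-a', 'cluster-c']
```

Every term must match (logical AND), which is how equality-based label selectors behave in Kubernetes.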

So, as a first use case, it is natural to have Sveltos send notifications when all add-ons have been deployed.

Notifications are configured through a ClusterHealthCheck instance: its notifications field is a list of all notifications to be sent whenever the state of a liveness check changes.

Sveltos ClusterHealthCheck CRD

The slack Secret used for Slack notifications contains the Slack channel ID and token:

kubectl create secret generic slack --from-literal=SLACK_TOKEN=<your token> --from-literal=SLACK_CHANNEL_ID=<your channel id> --type=addons.projectsveltos.io/cluster-profile 

The webex Secret used for Webex notifications contains the Webex room ID and token:

kubectl create secret generic webex --from-literal=WEBEX_TOKEN=<your token> --from-literal=WEBEX_ROOM_ID=<your room id> --type=addons.projectsveltos.io/cluster-profile

Once this ClusterHealthCheck instance is posted, every time the add-ons are deployed in a cluster matching the clusterSelector, Sveltos:

  1. generates a Kubernetes event;
  2. sends a Slack message;
  3. sends a Webex message.
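The fan-out above can be pictured with a small Python sketch: when a cluster's add-on deployment state changes, one message is produced per entry in the notifications list, and nothing is sent if the state did not change. The Notification class, its type values and the message format are illustrative assumptions, not the Sveltos API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Notification:
    name: str
    type: str  # illustrative values: "KubernetesEvent", "Slack", "Webex"


def notify_on_change(cluster: str, previous: Optional[bool], current: bool,
                     notifications: List[Notification]) -> List[str]:
    """Build one message per configured notification, but only when the
    cluster's add-on deployment state has actually changed."""
    if previous == current:
        return []  # no transition: stay silent
    state = "deployed" if current else "not deployed"
    return [f"[{n.type}] {cluster}: add-ons {state}" for n in notifications]


# Hypothetical notification list mirroring the event/Slack/Webex example.
notifications = [
    Notification("event", "KubernetesEvent"),
    Notification("slack", "Slack"),
    Notification("webex", "Webex"),
]
for msg in notify_on_change("cluster-a", False, True, notifications):
    print(msg)
```

Sending only on transitions, rather than on every evaluation, is what keeps channels like Slack from being flooded with repeated messages.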

Custom Health Checks

Is that enough, though? Clearly not. Once all add-ons are deployed, we want a generic mechanism to assess the health of managed clusters based on the state of the Kubernetes resources deployed in them.

Sveltos provides a CRD, HealthCheck, for defining custom health checks, which are written in Lua.

The HealthCheck Spec section contains the following fields:

  1. Spec.Group/Spec.Version/Spec.Kind fields indicate which Kubernetes resources the HealthCheck applies to. Sveltos watches those resources and evaluates them any time a change happens;
  2. Spec.Namespace field can be used to filter resources by namespace;
  3. Spec.LabelFilters field can be used to filter resources by labels;
  4. Spec.Script can contain a Lua script, which defines a custom health check.
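The first three fields act as a filter that decides which resources the script is run against. The Python sketch below mirrors that filtering under stated assumptions: HealthCheckSpec and is_selected are hypothetical names, and label filters are modeled as exact key/value matches only, which may be narrower than what Sveltos actually supports.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class HealthCheckSpec:
    """Hypothetical Python mirror of the HealthCheck Spec fields."""
    group: str
    version: str
    kind: str
    namespace: Optional[str] = None  # None means every namespace
    # Assumption: label filters modeled as exact key/value matches.
    label_filters: Dict[str, str] = field(default_factory=dict)


def api_version(group: str, version: str) -> str:
    # Core resources (empty group) are addressed as just "v1".
    return f"{group}/{version}" if group else version


def is_selected(spec: HealthCheckSpec, resource: Dict[str, Any]) -> bool:
    """True if the resource is one this HealthCheck should evaluate."""
    meta = resource.get("metadata") or {}
    if resource.get("apiVersion") != api_version(spec.group, spec.version):
        return False
    if resource.get("kind") != spec.kind:
        return False
    if spec.namespace is not None and meta.get("namespace") != spec.namespace:
        return False
    labels = meta.get("labels") or {}
    return all(labels.get(k) == v for k, v in spec.label_filters.items())


spec = HealthCheckSpec(group="", version="v1", kind="ConfigMap", namespace="opa")
policy = {"apiVersion": "v1", "kind": "ConfigMap",
          "metadata": {"name": "policy", "namespace": "opa"}}
print(is_selected(spec, policy))  # True
```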

Sveltos expects the Lua script to follow this format:

  1. must define a function evaluate(). This function is invoked directly and passed a Kubernetes resource (inside the function, obj represents the passed-in resource);
  2. must return a Lua table whose status field is set to one of Healthy/Progressing/Degraded/Suspended;
  3. ignore: a bool field of the returned table indicating whether Sveltos should ignore this resource. If ignore is set to true, Sveltos ignores the resource that produced that result;
  4. message: a string field of the returned table that Sveltos prints if set.
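The contract for the returned table can be expressed as a small validator. This Python sketch is only an illustration of the rules listed above, not how Sveltos validates Lua results; check_result and relevant_results are hypothetical names, and treating a missing ignore field as false is an assumption.

```python
from typing import Any, Dict, List

VALID_STATUSES = {"Healthy", "Progressing", "Degraded", "Suspended"}


def check_result(hs: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a table returned by evaluate() and normalize optional fields."""
    status = hs.get("status")
    if status not in VALID_STATUSES:
        raise ValueError(f"invalid status: {status!r}")
    ignore = hs.get("ignore", False)  # assumption: missing ignore means False
    if not isinstance(ignore, bool):
        raise TypeError("ignore must be a bool")
    message = hs.get("message", "")
    if not isinstance(message, str):
        raise TypeError("message must be a string")
    return {"status": status, "ignore": ignore, "message": message}


def relevant_results(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop results whose ignore flag is set, mirroring the ignore semantics."""
    checked = [check_result(r) for r in results]
    return [r for r in checked if not r["ignore"]]


print(relevant_results([
    {"status": "Healthy", "ignore": True},
    {"status": "Degraded", "message": "syntax error"},
]))
```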

Sveltos HealthCheck instance

In the example above, we create a HealthCheck that watches all ConfigMaps. hs is the health status object returned to Sveltos. It must contain a status attribute indicating whether the resource is Healthy, Progressing, Degraded or Suspended.

By default, we set the status to Healthy and hs.ignore=true, since we don’t want to affect the status of other, non-OPA ConfigMaps. Optionally, the health status object may also contain a message.

The script then checks whether the ConfigMap is indeed an OPA policy or another kind of ConfigMap. If it is an OPA policy, the script retrieves the value of the openpolicyagent.org/policy-status annotation. The annotation is set to {"status":"ok"} if the policy was loaded successfully; if errors occurred during loading (e.g., because the policy contained a syntax error), the cause is reported in the annotation. Depending on the value of the annotation, the script sets the status and message attributes accordingly.
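Since the Lua script itself is only shown as an image, here is the same decision logic ported to Python so it can be read and tested in isolation. It is a sketch, not the original script: it assumes a ConfigMap counts as an OPA policy when it carries the openpolicyagent.org/policy-status annotation, and the "Policy loaded successfully" message text is a hypothetical placeholder.

```python
import json
from typing import Any, Dict

POLICY_STATUS = "openpolicyagent.org/policy-status"


def evaluate(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Python port of the OPA ConfigMap health check logic (sketch)."""
    # Default: healthy and ignored, so non-OPA ConfigMaps are left alone.
    hs: Dict[str, Any] = {"status": "Healthy", "ignore": True}
    annotations = (obj.get("metadata") or {}).get("annotations") or {}
    raw = annotations.get(POLICY_STATUS)
    if raw is None:
        return hs  # assumption: no annotation means not an OPA policy
    hs["ignore"] = False
    try:
        loaded_ok = json.loads(raw).get("status") == "ok"
    except (ValueError, AttributeError):
        loaded_ok = False  # unparsable or non-object annotation value
    if loaded_ok:
        hs["message"] = "Policy loaded successfully"  # hypothetical text
    else:
        hs["status"] = "Degraded"
        hs["message"] = raw  # surface the loading error reported by OPA
    return hs


ok = {"metadata": {"annotations": {POLICY_STATUS: '{"status":"ok"}'}}}
print(evaluate(ok))
```

Parsing the annotation as JSON, instead of comparing it to the literal string {"status":"ok"}, keeps the check robust to whitespace differences; a plain string comparison would be equally valid if the annotation format is fixed.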

A ClusterHealthCheck instance can reference one or more HealthChecks. A Sveltos agent running in each managed cluster continuously evaluates each HealthCheck instance and reports the status back to the management cluster whenever a change happens.

Its Sveltos counterpart running in the management cluster evaluates those reports and sends notifications when the cluster health state changes.
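The management-cluster side can be approximated as "collapse per-resource reports into one cluster status, then notify only on transitions". The Python sketch below assumes a worst-status-wins rule with an arbitrary severity order; Sveltos' actual aggregation rules may differ, and HealthTracker and cluster_status are hypothetical names.

```python
from typing import Dict, List, Optional

# Assumed severity order for collapsing per-resource statuses (illustrative).
SEVERITY = {"Healthy": 0, "Suspended": 1, "Progressing": 2, "Degraded": 3}


def cluster_status(reports: List[dict]) -> str:
    """Worst non-ignored status wins; no relevant reports means Healthy."""
    statuses = [r["status"] for r in reports if not r.get("ignore", False)]
    if not statuses:
        return "Healthy"
    return max(statuses, key=SEVERITY.__getitem__)


class HealthTracker:
    """Remembers the last known status per cluster and reports transitions."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def update(self, cluster: str, reports: List[dict]) -> Optional[str]:
        status = cluster_status(reports)
        previous = self._last.get(cluster)
        self._last[cluster] = status
        if previous == status:
            return None  # unchanged: no notification
        return f"{cluster}: {previous or 'Unknown'} -> {status}"


tracker = HealthTracker()
print(tracker.update("cluster-a", [{"status": "Healthy", "ignore": False}]))
print(tracker.update("cluster-a", [{"status": "Degraded", "ignore": False}]))
```

Here the first report for a cluster also counts as a change (from Unknown); whether an initial state should trigger a notification is a design choice.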

ClusterHealthCheck instance referencing HealthChecks
