What is Kured (KUbernetes REboot Daemon) in k8s?

The definition on the official page states that Kured is a Kubernetes DaemonSet that performs safe automatic node reboots when the need to do so is indicated by the package management system of the underlying OS.

By rebooting nodes when the OS signals that a restart is needed, Kured ensures that pending updates, such as a new kernel, actually take effect, which keeps the cluster patched and reliable.

Here are the key points to note:

  • Kured does not install updates itself. It periodically checks each node for a reboot sentinel, by default the file /var/run/reboot-required, which the OS package manager creates after installing updates (security patches, kernel updates, and other system-level changes) that need a restart to take effect.

  • When a reboot is required, Kured first acquires a cluster-wide lock so that only one node reboots at a time. It then cordons the node, marking it as unschedulable for new pods, and drains it, evicting existing pods in a controlled manner. After the node has rebooted, Kured uncordons it and releases the lock.

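  • Kured includes built-in safety mechanisms to prevent untimely reboots: it can hold off while Prometheus alerts are active or while selected pods are running on the node, and it lets users define maintenance windows so that reboots happen only at acceptable times.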
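
As a rough illustration of the points above, here is a minimal sketch of container arguments that touch each mechanism. The flags are Kured command-line options, but the values are assumptions chosen for the example; in particular, the label runtime=long is hypothetical, and /var/run/reboot-required is simply Kured's default sentinel path written out explicitly.

```yaml
# Fragment of the kured container spec; merge into the DaemonSet's container definition.
command:
- /usr/bin/kured
# A reboot is considered necessary when this file exists on the host
# (this is kured's default path, spelled out here for clarity).
- --reboot-sentinel=/var/run/reboot-required
# Hold off rebooting a node while any pod matching this selector runs on it.
# The label runtime=long is a hypothetical example.
- --blocking-pod-selector=runtime=long
# Expire the reboot lock after 30 minutes, in case a node never comes back.
- --lock-ttl=30m
```

On a test node, creating the sentinel file by hand should cause Kured to schedule a reboot at its next check, which is a simple way to verify the setup.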

Because Kured reboots nodes soon after updates that require a restart have been installed, nodes do not keep running an outdated kernel or outdated libraries for long, which improves security and stability.
Organizations can automate this routine maintenance while minimizing the risks associated with outdated software and configurations.

Setting up Kured is a straightforward process that involves deploying it as a DaemonSet in the Kubernetes cluster. This deployment strategy ensures that Kured runs on every node within the cluster, effectively monitoring and managing the rebooting process for each individual node.

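Here is a manifest that does that. The {{ kured_version }} and {{ prometheus_url }} values are template placeholders; replace them with a Kured release tag and the URL of your Prometheus instance before applying: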

# ClusterRole for kured
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kured
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "delete", "get"]
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]

---
# ClusterRoleBinding for kured
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kured
subjects:
- kind: ServiceAccount
  name: kured
  namespace: kube-system

---
# Role for kured in kube-system namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kured
rules:
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  resourceNames: ["kured"]
  verbs: ["update"]

---
# RoleBinding for kured in kube-system namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: kured
subjects:
- kind: ServiceAccount
  namespace: kube-system
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kured

---
# ServiceAccount for kured
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kured
  namespace: kube-system

---
# DaemonSet for kured
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: kured
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: kured
    spec:
      serviceAccountName: kured
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      - key: "node-role.kubernetes.io/mysql"
        operator: "Equal"
        effect: "NoSchedule"
      hostPID: true
      restartPolicy: Always
      containers:
      - name: kured
        image: ghcr.io/kubereboot/kured:{{ kured_version }}
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        env:
        - name: KURED_NODE_ID
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        command:
        - /usr/bin/kured
        - --reboot-days=mon,tue,wed,thu
        - --reboot-delay=90s
        - --start-time=3am
        - --end-time=5am
        - --time-zone=UTC
        - --prometheus-url={{ prometheus_url }}
        - --alert-filter-regexp=^Watchdog$
        - --period=15m


Explanation:
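The command arguments below define the maintenance window and the other conditions under which a reboot may happen. Reboots are allowed only Monday through Thursday (--reboot-days), between 3am and 5am (--start-time, --end-time), in UTC (--time-zone). Kured checks for a pending reboot every 15 minutes (--period) and delays the reboot by 90 seconds (--reboot-delay). Using the Prometheus instance at --prometheus-url, it also holds off while any alert is active, ignoring alerts whose names match --alert-filter-regexp (here the always-firing Watchdog alert).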

        command:
        - /usr/bin/kured
        - --reboot-days=mon,tue,wed,thu
        - --reboot-delay=90s
        - --start-time=3am
        - --end-time=5am
        - --time-zone=UTC
        - --prometheus-url={{ prometheus_url }}
        - --alert-filter-regexp=^Watchdog$
        - --period=15m
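
For comparison, here is a sketch of a different maintenance window using the same flags. The schedule and time zone are assumptions picked for illustration, not recommendations: reboots are limited to weekends during the day, in a named time zone instead of UTC.

```yaml
# Fragment of the kured container spec, replacing the window flags shown above.
command:
- /usr/bin/kured
# Allow reboots on weekends only.
- --reboot-days=sat,sun
# Window from 9am to 5pm, interpreted in the time zone below.
- --start-time=9am
- --end-time=5pm
# A tz database name instead of UTC.
- --time-zone=America/Los_Angeles
```

Outside the window Kured still detects that a reboot is pending; it just waits until the next permitted day and time before acting.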

Top comments (1)

Benoit COUETIL 💫

Welcome here, and thank you for sharing!

Does this mean that nodes are constantly updating? Is this the default behavior? And what about public cloud providers?

Side note: when editing an article, you can format code and specify a language (for example yaml) to get syntax highlighting 😊