Leandro Proença

Posted on Apr 5, 2023

Kubernetes 101, part VI, daemonsets

#kubernetes #docker

For most use cases, deploying core business apps in Kubernetes using Deployments for stateless applications and StatefulSets for stateful applications is good enough.

Often, though, we need to deploy components that don't do the core business work themselves but support it instead.

Core business apps need observability: application metrics, latency, CPU load, etc. Furthermore, core business apps need to report how things are going; in other words, they need a logging architecture.

When default logging is not enough

Once we deploy the main core business workload in Kubernetes, we can check the logs by going through each Pod manually, which can be cumbersome.

Kubernetes provides kubectl logs, which helps a lot. By adding a bit of Bash scripting and creativity, we can quickly check the logs of all Pods in the cluster.
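As a rough idea of what that "bit of bash" could look like, here is a minimal sketch. The function name all_pod_logs and the choice of showing the last 20 lines per Pod are assumptions made for illustration, not something prescribed by Kubernetes.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent log lines of every Pod
# in every namespace, one Pod after another.
all_pod_logs() {
  # List "namespace name" pairs, one Pod per line.
  kubectl get pods --all-namespaces --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name |
  while read -r ns name; do
    echo "=== $ns/$name ==="
    # stdin is redirected so the command cannot swallow the Pod list
    # that the loop is still reading.
    kubectl logs --namespace "$ns" "$name" \
      --all-containers=true --tail=20 < /dev/null
  done
}
```

Calling all_pod_logs prints one header per Pod followed by its latest log lines. It works, but it runs one request per Pod and has to be repeated by every developer, which is exactly the friction a centralized collector removes.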

But we want to give our team a better developer experience (DX), so providing only kubectl logs might not be enough in some cases.

A potential logging solution

How about collecting and concentrating all logs in a single place?

What if we had a single Pod in every Node responsible for collecting logs and sending them to a common place where developers could easily fetch the logs of the cluster?

In this scenario, every Node would run a single Pod for collecting logs. Any time a new Node is created, some kind of "daemon controller" would make sure that a new Pod is scheduled to the new node. Thus, all Nodes would collect logs.

The picture below illustrates this potential solution:

DaemonSets to the rescue.

DaemonSet

The Kubernetes DaemonSet object comes with a DaemonSet controller that watches for Node creation and deletion and works to make sure every Node runs a single Pod replica of the DaemonSet.

Log collectors are a perfect fit for this solution.

Let's create a dead simple log collector using just a DaemonSet, Linux and creativity, nothing more.

The YAML file looks like the following:



apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: log-collector
        image: busybox
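        # Every 5 seconds, append the content of every Pod log file on this
        # Node to /logs/app-pods.log. Each pass re-appends whole files, so
        # lines repeat over time; good enough for a demo only.
        command: ["/bin/sh", "-c", "while true; do find /var/log/pods -name '*.log' -print0 | xargs -0 cat >> /logs/app-pods.log; sleep 5; done"]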
        volumeMounts:
        - name: all-logs
          mountPath: /logs
        - name: var-log
          mountPath: /var/log/pods
        - name: var-containers
          mountPath: /var/lib/docker/containers
      volumes:
      - name: all-logs
        hostPath:
          path: /logs
      - name: var-log
        hostPath:
          path: /var/log/pods
      - name: var-containers
        hostPath:
          path: /var/lib/docker/containers

Some highlights:

There are no multiple replicas like in Deployments, only a single Pod running on every Node.
In Kubernetes with Docker, by default, container logs are written under /var/lib/docker/containers and exposed through symlinks in /var/log/pods. Both directories are located on every Node.
We mount volumes for those /var/* locations so the collector can read the log files in these folders and send their content to a single common location.
In this DaemonSet, we send all logs to /logs/app-pods.log. Because /logs is mounted as a hostPath volume, the file also shows up on the host.
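To confirm the "single Pod on every Node" behaviour after applying the manifest, a small check like the following can help. It is only a sketch: the function name daemonset_covers_all_nodes and the file name log-collector.yaml are assumptions, while desiredNumberScheduled and numberReady are fields of the DaemonSet status.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit code 0) when every Node that should
# run a Pod of the given DaemonSet has one in the Ready state.
#
# Usage, assuming the manifest was saved as log-collector.yaml:
#   kubectl apply -f log-collector.yaml
#   daemonset_covers_all_nodes log-collector && echo "all Nodes covered"
daemonset_covers_all_nodes() {
  name=$1
  # Prints two numbers separated by a space:
  # Pods that should exist, then Pods that are Ready.
  counts=$(kubectl get daemonset "$name" \
    -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')
  desired=${counts% *}
  ready=${counts#* }
  echo "desired=$desired ready=$ready"
  [ "$desired" = "$ready" ]
}
```

If the two numbers differ right after deployment, the Pods may simply still be starting; running kubectl get pods -l app=log-collector -o wide shows which Node each Pod landed on.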

After deploying, check the logs on the host:



$ tail -f /logs/app-pods.log

{"log":"2023/04/05 02:29:34 [notice] 1#1: using the \"epoll\" event method\n","stream":"stderr","time":"2023-04-05T02:29:34.687797577Z"}
{"log":"2023/04/05 02:29:34 [notice] 1#1: nginx/1.23.4\n","stream":"stderr","time":"2023-04-05T02:29:34.687806202Z"}
{"log":"2023/04/05 02:29:34 [notice] 1#1: built by gcc 10.2.1 20210110 (Debian 10.2.1-6) \n","stream":"stderr","time":"2023-04-05T02:29:34.687807994Z"}
{"log":"2023/04/05 02:29:34 [notice] 1#1: OS: Linux 5.15.68-0-virt\n","stream":"stderr","time":"2023-04-05T02:29:34.687809452Z"}
{"log":"2023/04/05 02:29:34 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576\n","stream":"stderr","time":"2023-04-05T02:29:34.687810744Z"}
{"log":"2023/04/05 02:29:34 [notice] 1#1: start worker processes\n","stream":"stderr","time":"2023-04-05T02:29:34.687811994Z"}
{"log":"2023/04/05 02:29:34 [notice] 1#1: start worker process 29\n","stream":"stderr","time":"2023-04-05T02:29:34.687842494Z"}
{"log":"2023/04/05 02:29:34 [notice] 1#1: start worker process 30\n","stream":"stderr","time":"2023-04-05T02:29:34.68784791Z"}
{"log":"2023/04/05 02:29:34 [notice] 1#1: start worker process 31\n","stream":"stderr","time":"2023-04-05T02:29:34.687900494Z"}
{"log":"2023/04/05 02:29:34 [notice] 1#1: start worker process 32\n","stream":"stderr","time":"2023-04-05T02:29:34.687971452Z"}

Yay! How cool is that?

Professionalism is all

Of course, this dead simple log collector won't scale in production.

Instead, we can use tools like Fluentd, Logstash and similar for a more robust and scalable solution.

Wrapping Up

Today we learned the importance of structuring and collecting logs of our applications, no matter where they are deployed.

In Kubernetes, life's a bit easier: we employ a special controller called DaemonSet that makes sure we have a log collector Pod running on every Node.

Don't miss the next posts where we'll talk about Jobs and CronJobs.

Cheers!
