Yoshio Terada for Microsoft Azure

Posted on Jan 21, 2020

Use DaemonSet as a cache when referencing large files from each pod

#kubernetes #daemonset

Assuming that the problem that came up when I met a certain customer this time and conducted a hackfest and the solution that I proposed to the customer might be effective if there were similar requirements elsewhere, Share that way here.

Following Use Case will be fit for this approach

It is necessary to read the same huge file (GB class file) from each pod
File is read only, no write (If writing is required, another method is available.)
I want to read a huge file as soon as possible

Note: Please consider the appropriate way by yourself depends on your requirements.

If you encounter the above scenario ,please consider to use the DaemonSet as a cache? Use the DaemonSet to launch one pod on each Node, and each pod in the DaemonSet have a copy of the file. Pods who want to refer to and acquire files is possible for not only reduce the communication across networks by acquiring and referencing copy files from the pod of the DaemonSet, but also can get files in a short time because the file is transfered by local network. (explaining the obvious things and evaluated it).

General Kubernetes volume mounting method

When dealing with persistent volumes in each pods, there is some way to mount the volume on each pods. And many cloud providers provide plugins to handle Persistence Volume on each cloud storage, and there are methods to mount disks in storage from each pod (3 way: ReadWriteOnce, ReadOnlyMany) , ReadWriteMany).

Please refer to the Kubernetes documentation

If you want to mount a disk one-to-one from each pod in Azure (ReadWriteOnce),you can use the Azure Disk. If you want to share the same Disk from multiple pods (ReadOnlyMany, ReadWriteMany), you can use the Azure Files. In this use case, we want to refer to large files from multiple pods, so I selected Azure Files (ReadOnlyMany).

Specifically, you can create Azure Files by following procedure and use Azure Files as a Kubernetes Persistence Volume.

$ export AKS_PERS_STORAGE_ACCOUNT_NAME=myfilestorageaccount
$ export AKS_PERS_RESOURCE_GROUP=Yoshio-Storage
$ export AKS_PERS_LOCATION=japaneast
$ export AKS_PERS_SHARE_NAME=aksshare

$ az storage account create -n $AKS_PERS_STORAGE_ACCOUNT_NAME -g $AKS_PERS_RESOURCE_GROUP -l $AKS_PERS_LOCATION --sku Standard_LRS

$ export AZURE_STORAGE_CONNECTION_STRING=`az storage account show-connection-string -n $AKS_PERS_STORAGE_ACCOUNT_NAME -g $AKS_PERS_RESOURCE_GROUP -o tsv`

$ az storage share create -n $AKS_PERS_SHARE_NAME

$ STORAGE_KEY=$(az storage account keys list --resource-group $AKS_PERS_RESOURCE_GROUP --account-name $AKS_PERS_STORAGE_ACCOUNT_NAME --query "[0].value" -o tsv)

$ kubectl create secret generic azure-secret --from-literal=azurestorageaccountname=$AKS_PERS_STORAGE_ACCOUNT_NAME --from-literal=azurestorageaccountkey=$STORAGE_KEY

Next, please create the Deployment artifact for mounting the volumes in each pods?

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ubuntu
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ubuntu
  template:
    metadata:
      labels:
        app: ubuntu
        version: v1
    spec:
      containers:
      - name: ubuntu
        image: ubuntu
        command:
          - sleep
          - infinity
        volumeMounts:
          - mountPath: "/mnt/azure"
            name: volume
        resources:
          limits:
            memory: 4000Mi
          requests:
            cpu: 1000m
            memory: 4000Mi
        env:
          - name: NODE_IP
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
      volumes:
        - name: volume
          azureFile:
            secretName: azure-secret
            shareName: aksshare
            readOnly: true

In this time, I configured the memory as 4GB because it copy large files.

Please save the above file as deployment.yaml and execute the following command?

$ kubectl apply -f deployment.yaml

After the pod launches successfully, you can execute the following command to confirm the mount status. And you can see the Azure Files is mounted at /mnt/azure on individual pods.

$ kubectl exec -it ubuntu-884df4bfc-7zgkz mount |grep azure
//**********.file.core.windows.net/aksshare on /mnt/azure 
type cifs (rw,relatime,vers=3.0,cache=strict,username=********,domain=,uid=0,noforceuid,gid=0,noforcegid,addr=40.***.***.76,file_mode=0777,dir_mode=0777,soft,persistenthandles,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

According to the above result, the actual file exists on Files Share (***************.file.core.windows.net/aksshare) of Azure Storage.

In order to download or refer to the files from the /aksshare, it will use the the Samba Protocol.
You can confirm it by using following command.

# date
Wed Jan 15 16:29:03 UTC 2020
# cp /mnt/azure/Kuberenetes-Operation-Demo.mov ~/mount.mov
# date
Wed Jan 15 16:29:54 UTC 2020 (51 Seconds)

According to the above, you can see that it took about 51 seconds to copy about 3 Gb video file into the pod. If multiple pods start up and refer to the same file, the file copy will be repeated over the network. It is inefficiency but normal usage.

In this time, we will use DaemonSet to improve the number of network transfers and the speed of this file sharing.

How to create the DaemonSet as a cache

In this time, I used the Nginx Server for container which running on DaemonSet. I copy the file to under the context root (/ app) of Nginx from Azure Blog Storage so that you can retrieve the file by HTTP Protocol.
I configured the below so that the contents in /app can be obtained via http://podIP/.

FROM alpine:3.6

RUN apk update && \
    apk add --no-cache nginx

RUN apk add curl

ADD default.conf /etc/nginx/conf.d/default.conf

EXPOSE 80
RUN mkdir /app
RUN mkdir -p /run/nginx

WORKDIR /app
CMD nginx -g "daemon off;"

Please Save the above as a Dockerfile?

server {
  listen 80 default_server;
  listen [::]:80 default_server;

  root /app;

  location / {
  }
}

Please Save the above as default.conf?

After that, Please Build this image and push it to the Azure Container Registry?

$ docker build -t tyoshio2002/nginxdatamng:1.1 .
$ docker tag tyoshio2002/nginxdatamng:1.1 yoshio.azurecr.io/tyoshio2002/nginxdatamng:1.1
$ docker login -u yoshio yoshio.azurecr.io 
Password: 
$ docker push yoshio.azurecr.io/tyoshio2002/nginxdatamng:1.1

Next, Please create a secret to connect to Azure Container Registry from Kubernetes Cluster?

kubectl create secret docker-registry docker-reg-credential --docker-server=yoshio.azurecr.io --docker-username=yoshio  --docker-password="***********************" --docker-email=foo-bar@microsoft.com

Now that we have pushed the image to the Azure Container Registry, we will write a manifest for the Daemonset.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: mydaemonset
  labels:
    app: mydaemonset
spec:
  selector:
    matchLabels:
      name: mydaemonset
  template:
    metadata:
      labels:
        name: mydaemonset
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      imagePullSecrets:
        - name: docker-reg-credential
      containers:
      - name: nginx
        image: yoshio.azurecr.io/tyoshio2002/nginxdatamng:1.1
        resources:
          limits:
            memory: 4000Mi
          requests:
            cpu: 1000m
            memory: 4000Mi
        lifecycle:
          postStart:
            exec:
              command:
                - sh
                - -c
                - "curl https://myfilestorageaccount.blob.core.windows.net/fileshare/Kuberenetes-Operation-Demo.mov -o /app/aaa.mov"
      terminationGracePeriodSeconds: 30

In this case, a single file (Kuberenetes-Operation-Demo.mov) is downloaded under /app in the postStart phase to simplify verification.

When you need to use the multiple files, you need to implement the mechanism separately to download multiple files in the container image.

It is also assumed that it is possible to customize Nginx or use other web servers and App servers to improve the efficiency such as enabling content caching.

Please Save the above manifest file as daemonset.yaml and execute the following command?

$ kubectl apply -f daemonset.yaml

After creating a DaemonSet, Please create a Service for the DaemonSet? By creating a Service, you can access port 30001 of each node and access Nginx with the Node IP address with port number.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: daemonset-service
  name: daemonset-service
spec:
  ports:
  - port: 80
    name: http
    targetPort: 80
    nodePort: 30001
  selector:
    name: mydaemonset
  sessionAffinity: None
  type: NodePort

Please Save the above file as service.yaml and execute the following command?

$ kubectl apply -f service.yaml

After deploying Nginx DaemonSet and Ubuntu Deployment respectively, execute the following command to check which pod is running on which node.

$ kubectl get no -o wide
NAME                                STATUS   ROLES   AGE    VERSION                         INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-agentpool-41616757-vmss000000   Ready    agent   176m   v1.15.7                         10.240.0.4            Ubuntu 16.04.6 LTS   4.15.0-1064-azure   docker://3.0.8
aks-agentpool-41616757-vmss000001   Ready    agent   176m   v1.15.7                         10.240.0.35           Ubuntu 16.04.6 LTS   4.15.0-1064-azure   docker://3.0.8
aks-agentpool-41616757-vmss000002   Ready    agent   176m   v1.15.7                         10.240.0.66           Ubuntu 16.04.6 LTS   4.15.0-1064-azure   docker://3.0.8
virtual-node-aci-linux              Ready    agent   175m   v1.14.3-vk-azure-aci-v1.1.0.1   10.240.0.48                                  
$ kubectl get po -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
mydaemonset-gqlrw        1/1     Running   0          72m   10.240.0.24   aks-agentpool-41616757-vmss000000              
mydaemonset-nnn5l        1/1     Running   0          72m   10.240.0.50   aks-agentpool-41616757-vmss000001              
mydaemonset-pzvzx        1/1     Running   0          72m   10.240.0.82   aks-agentpool-41616757-vmss000002              
ubuntu-884df4bfc-7zgkz   1/1     Running   0          64m   10.240.0.69   aks-agentpool-41616757-vmss000002              
ubuntu-884df4bfc-gd26h   1/1     Running   0          63m   10.240.0.33   aks-agentpool-41616757-vmss000000              
ubuntu-884df4bfc-vh7rg   1/1     Running   0          63m   10.240.0.49   aks-agentpool-41616757-vmss000001

According to the above, ubuntu-884df4bfc-vh7rg is running on the node aks-agentpool-41616757-vmss000001, and on the node aks-agentpool-41616757-vmss000001 the DaemonSet Pod of mydaemonset-nnn5l is running.
And you can see that the node IP address of aks-agentpool-41616757-vmss000001 is 10.240.0.35.

Inside of the Ubuntu pod, the IP address of the Node where the pod is running is got by the environment variable $ NODE_IP.

$ kubectl exec -it ubuntu-884df4bfc-vh7rg env|grep NODE_IP
NODE_IP=10.240.0.35

In Addition
Lastly, we will skip the detailed explanation of how to create each of Standard and Premium (high speed) Azure Blob Storage.

After you created both of them, please copy the same video file in each storage?

Premium Storage of Azure Blob is connected to AKS VNET and files are obtained via VNET.

Other Reference information

Verification Details

In this time, I evaluated the following four points.

Time required to copy files in the directory where Azure Files is mounted to the pod
Time required to retrieve files existing in Azure Blob (Standard) by pod
Time required to retrieve files existing in Azure Blob (Premium) by pod
Time required to get from files copied in DaemonSet

First Try (The video of actual verification contents)

Video: https://www.youtube.com/watch?v=fe4QSfe9n-A

Confirmed by executing the following command

kubectl exec -it ubuntu-884df4bfc-vh7rg

## File Put on Azure Files (File shares : Samba)
$ date
$ cp /mnt/azure/Kuberenetes-Operation-Demo.mov ~/mount.mov
$ date

## File Put on Azure Blob (Containers)
$ curl https://myfilestorageaccount.blob.core.windows.net/fileshare/Kuberenetes-Operation-Demo.mov -o ~/direct.mov
$ curl https://mypremiumstorageaccount.blob.core.windows.net/fileshare/Kuberenetes-Operation-Demo.mov -o ~/premium2.mov
$ curl http://$NODE_IP:30001/aaa.mov -o ~/fromNodeIP.mov

The result of Third Try:

	Copy from PV(Samba)	Get file from Standard Storage	Get file from Premium Storage	Get file from DaemonSet Cache
１	51 Sec	69 Sec	27 Sec	20 Sec
２	60 Sec	60 Sec	23 Sec	11 Sec
３	53 Sec	57 Sec	20 Sec	8 Sec

By the way, even when using Nginx of DaemonSet, it takes time for the first access. This assumes that it is taking a long time to get, probably because it is not in Nginx memory. In other environments, it is faster after the second access.

Result Summary of verification

From the results of the above three executions, it was found that it is the fastest to get the file into the DaemonSet and *** get the file from the DaemonSet's cache***.
By using Premium storage, you can transfer files faster than Standard storage, but Premium costs more than Standard. Also, in the case of acquisition from Storage, it is necessary to download files via the network each time. Therefore, considering that, it is most efficient to first get the file on the same node (VM) and get it from there. It is natural to be quick and quick.

Advantages of using DaemonSet

Reduce network traffic
Faster file acquisition speed
If the implementation of the container in the DaemonSet is made richer, it can be used for other purposes (for example, writing support).

Concerns

Currently, a specific single file is acquired, but when handling multiple files or when there is an update of the file, a mechanism to periodically check for updates inside the DaemonSet and download the updated part Need to implement
The file size in the DaemonSet pod may be bloated and needs to be cleaned up
Regarding file writing, it is difficult to respond if this verification method is used as it is, but it is assumed that it will be supported if you implement a function like the In Memory Grid product in the container inside the DaemonSet.

Finally:
In addition, I conducted some other internal verifications, but because the explanation contents increased and the viewpoint was likely to be blurred, we focused on only the points that were particularly effective. Also, in order to make it as simple as possible, I used Nginx for DaemonSet, but I think that further performance improvement is needed, you can tune up the Nginx configuration or changing another implementation.

I would be very grateful if you could give further tuning based on this information.