When I ran a hackfest with a customer recently, a problem came up, and the solution I proposed may be useful wherever similar requirements exist, so I am sharing the approach here.
The following use case is a good fit for this approach:
- Each pod needs to read the same huge file (a multi-GB file)
- The file is read-only; nothing writes to it (if writes are required, a different method is needed)
- You want to read the huge file as quickly as possible
Note: Please choose the method that fits your own requirements.
If you encounter the above scenario, consider using a DaemonSet as a cache. The DaemonSet launches one pod on each node, and each of those pods holds a copy of the file. Pods that need the file can fetch it from the DaemonSet pod on the same node, which not only reduces traffic across the network but also shortens transfer time, because the file is transferred node-locally. (This may be stating the obvious, but I evaluated it anyway.)
General Kubernetes volume mounting method
When pods need persistent storage, there are several ways to mount a volume into each pod. Many cloud providers offer plugins that back Kubernetes Persistent Volumes with their cloud storage, and a volume can be mounted from each pod with one of three access modes: ReadWriteOnce, ReadOnlyMany, and ReadWriteMany.
Please refer to the Kubernetes documentation for details.
On Azure, if you want to mount a disk one-to-one from a single pod (ReadWriteOnce), you can use Azure Disk. If you want to share the same volume among multiple pods (ReadOnlyMany, ReadWriteMany), you can use Azure Files. In this use case we want multiple pods to read the same large file, so I selected Azure Files (ReadOnlyMany).
Specifically, you can create an Azure Files share with the following procedure and use it as a Kubernetes Persistent Volume.
$ export AKS_PERS_STORAGE_ACCOUNT_NAME=myfilestorageaccount
$ export AKS_PERS_RESOURCE_GROUP=Yoshio-Storage
$ export AKS_PERS_LOCATION=japaneast
$ export AKS_PERS_SHARE_NAME=aksshare
$ az storage account create -n $AKS_PERS_STORAGE_ACCOUNT_NAME -g $AKS_PERS_RESOURCE_GROUP -l $AKS_PERS_LOCATION --sku Standard_LRS
$ export AZURE_STORAGE_CONNECTION_STRING=`az storage account show-connection-string -n $AKS_PERS_STORAGE_ACCOUNT_NAME -g $AKS_PERS_RESOURCE_GROUP -o tsv`
$ az storage share create -n $AKS_PERS_SHARE_NAME
$ STORAGE_KEY=$(az storage account keys list --resource-group $AKS_PERS_RESOURCE_GROUP --account-name $AKS_PERS_STORAGE_ACCOUNT_NAME --query "[0].value" -o tsv)
$ kubectl create secret generic azure-secret --from-literal=azurestorageaccountname=$AKS_PERS_STORAGE_ACCOUNT_NAME --from-literal=azurestorageaccountkey=$STORAGE_KEY
Next, create a Deployment manifest that mounts the volume in each pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ubuntu
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ubuntu
  template:
    metadata:
      labels:
        app: ubuntu
        version: v1
    spec:
      containers:
      - name: ubuntu
        image: ubuntu
        command:
        - sleep
        - infinity
        volumeMounts:
        - mountPath: "/mnt/azure"
          name: volume
        resources:
          limits:
            memory: 4000Mi
          requests:
            cpu: 1000m
            memory: 4000Mi
        env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
      volumes:
      - name: volume
        azureFile:
          secretName: azure-secret
          shareName: aksshare
          readOnly: true
- Note: I configured roughly 4 GB of memory (4000Mi) because the pod copies large files.
Save the above file as deployment.yaml and execute the following command.
$ kubectl apply -f deployment.yaml
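You can watch the rollout until all replicas are Running before moving on; for example (the -w flag streams updates until you interrupt it):
$ kubectl get po -l app=ubuntu -w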
After the pods launch successfully, you can run the following command to check the mount status, and you can see that the Azure Files share is mounted at /mnt/azure inside each pod.
$ kubectl exec -it ubuntu-884df4bfc-7zgkz -- mount | grep azure
//**********.file.core.windows.net/aksshare on /mnt/azure
type cifs (rw,relatime,vers=3.0,cache=strict,username=********,domain=,uid=0,noforceuid,gid=0,noforcegid,addr=40.***.***.76,file_mode=0777,dir_mode=0777,soft,persistenthandles,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
As the output shows, the actual file lives on an Azure Storage file share (***************.file.core.windows.net/aksshare).
Files on the share are downloaded and read over the SMB (CIFS) protocol, as the "type cifs" in the mount output indicates.
You can measure the transfer time with the following commands.
# date
Wed Jan 15 16:29:03 UTC 2020
# cp /mnt/azure/Kuberenetes-Operation-Demo.mov ~/mount.mov
# date
Wed Jan 15 16:29:54 UTC 2020 (51 Seconds)
As you can see, copying the roughly 3 GB video file into the pod took about 51 seconds. If multiple pods start up and read the same file, that copy is repeated over the network every time. This is inefficient, but it is the normal usage pattern.
Next, we use a DaemonSet to reduce the number of network transfers and speed up this file sharing.
How to create the DaemonSet as a cache
For the container running in the DaemonSet, I used an Nginx server. The file is copied from Azure Blob Storage into Nginx's document root (/app) so that it can be retrieved over HTTP.
I configured Nginx as below so that the contents of /app can be fetched via http://<pod-IP>/.
FROM alpine:3.6
RUN apk update && \
apk add --no-cache nginx
RUN apk add curl
ADD default.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
RUN mkdir /app
RUN mkdir -p /run/nginx
WORKDIR /app
CMD nginx -g "daemon off;"
Save the above as Dockerfile.
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    root /app;
    location / {
    }
}
Save the above as default.conf.
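With both files in place, you can optionally smoke-test the image on your local machine before pushing it. A minimal sketch; the test tag, container name, and port mapping below are arbitrary placeholders:
$ docker build -t nginxdatamng:test .
$ docker run --rm -d -p 8080:80 --name nginx-test nginxdatamng:test
$ echo hello > /tmp/test.txt && docker cp /tmp/test.txt nginx-test:/app/test.txt
$ curl http://localhost:8080/test.txt
hello
$ docker stop nginx-test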
After that, build the image and push it to Azure Container Registry.
$ docker build -t tyoshio2002/nginxdatamng:1.1 .
$ docker tag tyoshio2002/nginxdatamng:1.1 yoshio.azurecr.io/tyoshio2002/nginxdatamng:1.1
$ docker login -u yoshio yoshio.azurecr.io
Password:
$ docker push yoshio.azurecr.io/tyoshio2002/nginxdatamng:1.1
Next, create a secret so that the Kubernetes cluster can pull images from Azure Container Registry.
$ kubectl create secret docker-registry docker-reg-credential --docker-server=yoshio.azurecr.io --docker-username=yoshio --docker-password="***********************" --docker-email=foo-bar@microsoft.com
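As an aside, on AKS you can instead grant the cluster pull access to the registry directly, which makes the imagePullSecrets entry in the manifest below unnecessary. A sketch, assuming a recent Azure CLI that supports --attach-acr; the cluster name and resource group here are placeholders:
$ az aks update -n myAKSCluster -g myResourceGroup --attach-acr yoshio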
Now that the image is in Azure Container Registry, we can write the manifest for the DaemonSet.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: mydaemonset
  labels:
    app: mydaemonset
spec:
  selector:
    matchLabels:
      name: mydaemonset
  template:
    metadata:
      labels:
        name: mydaemonset
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      imagePullSecrets:
      - name: docker-reg-credential
      containers:
      - name: nginx
        image: yoshio.azurecr.io/tyoshio2002/nginxdatamng:1.1
        resources:
          limits:
            memory: 4000Mi
          requests:
            cpu: 1000m
            memory: 4000Mi
        lifecycle:
          postStart:
            exec:
              command:
              - sh
              - -c
              - "curl https://myfilestorageaccount.blob.core.windows.net/fileshare/Kuberenetes-Operation-Demo.mov -o /app/aaa.mov"
      terminationGracePeriodSeconds: 30
- In this case, a single file (Kuberenetes-Operation-Demo.mov) is downloaded into /app during the postStart phase to simplify verification.
If you need to handle multiple files, you must implement a separate mechanism in the container image to download them (see the sketch after these notes).
You could also customize Nginx, or use another web or application server, to improve efficiency further, for example by enabling content caching.
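For example, if all the files live in a single Blob container, the postStart command could use the Azure CLI instead of curl. A rough sketch only, assuming the image has the azure-cli package installed and a connection string is supplied to the pod (for example from a Secret); the container name fileshare matches the one used above:
# Download every blob in the 'fileshare' container into /app
# (assumes AZURE_STORAGE_CONNECTION_STRING is set in the environment)
az storage blob download-batch --source fileshare --destination /app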
Save the above manifest as daemonset.yaml and execute the following command.
$ kubectl apply -f daemonset.yaml
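Once applied, you can confirm that one DaemonSet pod is scheduled per node; for example:
$ kubectl get daemonset mydaemonset
$ kubectl get po -o wide | grep mydaemonset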
After creating the DaemonSet, create a Service for it. With this Service in place, you can reach Nginx by combining any node's IP address with port 30001.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: daemonset-service
  name: daemonset-service
spec:
  ports:
  - port: 80
    name: http
    targetPort: 80
    nodePort: 30001
  selector:
    name: mydaemonset
  sessionAffinity: None
  type: NodePort
Save the above file as service.yaml and execute the following command.
$ kubectl apply -f service.yaml
After deploying the Nginx DaemonSet and the Ubuntu Deployment, execute the following commands to check which pod is running on which node.
$ kubectl get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-agentpool-41616757-vmss000000 Ready agent 176m v1.15.7 10.240.0.4 Ubuntu 16.04.6 LTS 4.15.0-1064-azure docker://3.0.8
aks-agentpool-41616757-vmss000001 Ready agent 176m v1.15.7 10.240.0.35 Ubuntu 16.04.6 LTS 4.15.0-1064-azure docker://3.0.8
aks-agentpool-41616757-vmss000002 Ready agent 176m v1.15.7 10.240.0.66 Ubuntu 16.04.6 LTS 4.15.0-1064-azure docker://3.0.8
virtual-node-aci-linux Ready agent 175m v1.14.3-vk-azure-aci-v1.1.0.1 10.240.0.48
$ kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mydaemonset-gqlrw 1/1 Running 0 72m 10.240.0.24 aks-agentpool-41616757-vmss000000
mydaemonset-nnn5l 1/1 Running 0 72m 10.240.0.50 aks-agentpool-41616757-vmss000001
mydaemonset-pzvzx 1/1 Running 0 72m 10.240.0.82 aks-agentpool-41616757-vmss000002
ubuntu-884df4bfc-7zgkz 1/1 Running 0 64m 10.240.0.69 aks-agentpool-41616757-vmss000002
ubuntu-884df4bfc-gd26h 1/1 Running 0 63m 10.240.0.33 aks-agentpool-41616757-vmss000000
ubuntu-884df4bfc-vh7rg 1/1 Running 0 63m 10.240.0.49 aks-agentpool-41616757-vmss000001
According to the above, ubuntu-884df4bfc-vh7rg is running on the node aks-agentpool-41616757-vmss000001, and on the node aks-agentpool-41616757-vmss000001 the DaemonSet Pod of mydaemonset-nnn5l is running.
And you can see that the node IP address of aks-agentpool-41616757-vmss000001 is 10.240.0.35.
Inside of the Ubuntu pod, the IP address of the Node where the pod is running is got by the environment variable $ NODE_IP.
$ kubectl exec -it ubuntu-884df4bfc-vh7rg -- env | grep NODE_IP
NODE_IP=10.240.0.35
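With $NODE_IP available, each pod can fetch the cached file from the DaemonSet pod on its own node through the NodePort, which is exactly what the verification below does:
$ curl http://$NODE_IP:30001/aaa.mov -o ~/fromNodeIP.mov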
In Addition
Lastly, I will skip the detailed steps for creating the Standard and Premium (high-performance) Azure Blob Storage accounts.
After creating both, copy the same video file into each storage account.
- The Premium Azure Blob Storage account is connected to the AKS VNET, and files are retrieved over the VNET.
Other Reference information
- Azure premium storage: design for high performance
- Troubleshoot Azure Files problems in Linux
- Troubleshoot Azure Files performance issues
Verification Details
This time, I evaluated the following four points.
- Time required to copy the file from the directory where Azure Files is mounted into the pod
- Time required for a pod to retrieve the file from Azure Blob Storage (Standard)
- Time required for a pod to retrieve the file from Azure Blob Storage (Premium)
- Time required to get the file from the copy held by the DaemonSet
First try (video of the actual verification):
Video: https://www.youtube.com/watch?v=fe4QSfe9n-A
I confirmed the results by executing the following commands inside the pod.
$ kubectl exec -it ubuntu-884df4bfc-vh7rg -- /bin/bash
## File stored on Azure Files (file share, SMB)
$ date
$ cp /mnt/azure/Kuberenetes-Operation-Demo.mov ~/mount.mov
$ date
## Files stored on Azure Blob Storage (containers)
$ curl https://myfilestorageaccount.blob.core.windows.net/fileshare/Kuberenetes-Operation-Demo.mov -o ~/direct.mov
$ curl https://mypremiumstorageaccount.blob.core.windows.net/fileshare/Kuberenetes-Operation-Demo.mov -o ~/premium2.mov
$ curl http://$NODE_IP:30001/aaa.mov -o ~/fromNodeIP.mov
The results of the three tries:

| Try | Copy from PV (SMB) | Get file from Standard Storage | Get file from Premium Storage | Get file from DaemonSet cache |
| --- | --- | --- | --- | --- |
| 1 | 51 sec | 69 sec | 27 sec | 20 sec |
| 2 | 60 sec | 60 sec | 23 sec | 11 sec |
| 3 | 53 sec | 57 sec | 20 sec | 8 sec |
- Note that even with the DaemonSet's Nginx, the first access takes time, presumably because the file is not yet in memory on the Nginx side; from the second access onward it is faster.
Summary of the verification results
From the results of the three runs above, the fastest approach was to pull the file into the DaemonSet once and then get it from the DaemonSet's cache.
Premium storage transfers files faster than Standard storage, but it also costs more. In addition, when fetching from storage, the file must be downloaded over the network every time. Considering this, it is most efficient to first place the file on the same node (VM) and fetch it from there; naturally, that is the fastest path.
Advantages of using a DaemonSet
- Reduced network traffic
- Faster file retrieval
- If the container in the DaemonSet is given a richer implementation, it can serve other purposes as well (for example, write support)
Concerns
- Currently only a single, specific file is fetched. To handle multiple files, or files that get updated, you need to implement a mechanism inside the DaemonSet that periodically checks for updates and downloads the changed parts (see the sketch after this list).
- The files held in the DaemonSet pod can bloat its disk usage and need to be cleaned up periodically.
- File writes are hard to support with this verification setup as-is, but they could be handled by implementing something like an in-memory data grid product inside the DaemonSet's container.
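As one way to approach the first concern, the DaemonSet container could run a small sync loop alongside Nginx. This is a rough sketch only, assuming the azure-cli package is available in the image and a connection string is provided; the interval and container name are placeholders:
#!/bin/sh
# Periodically re-sync blobs from the 'fileshare' container into /app.
# This naive loop re-downloads everything each cycle; a real implementation
# should compare ETags or Last-Modified timestamps and fetch only changes.
while true; do
  az storage blob download-batch --source fileshare --destination /app
  sleep 300   # placeholder interval: check every 5 minutes
done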
Finally:
In addition, I conducted some other internal verifications, but covering them all would have blurred the main point, so I focused only on the measures that were particularly effective. Also, to keep things as simple as possible I used Nginx for the DaemonSet; if you need further performance improvements, you can tune the Nginx configuration or switch to another implementation.
I would be very grateful if you could build on this information with further tuning of your own.