This post describes how to mount an S3 bucket to all the nodes in an EKS cluster and make it available to pods as a hostPath volume. Yes, we're aware of the security implications of hostPath volumes, but in this case it's less of an issue, because the access being granted is to the S3 bucket (not the host filesystem), and permissions are scoped per serviceAccount.
Goofys
We're using goofys as the mounting utility. It's a "high-performance, POSIX-ish Amazon S3 file system written in Go" based on FUSE (file system in user space) technology.
Daemonset
In order to provide the mount transparently we need to run a daemonset - so the mount is created on all nodes in the cluster.
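Under the hood, each DaemonSet pod runs goofys and shares the resulting mount with the host. Here's a simplified sketch of what such a DaemonSet can look like; the image, flags, region, and paths are illustrative, and the real template lives in the chart repo linked below:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: s3-mounter
spec:
  selector:
    matchLabels:
      app: s3-mounter
  template:
    metadata:
      labels:
        app: s3-mounter
    spec:
      serviceAccountName: s3-mounter     # carries the IAM role annotation (see below)
      containers:
      - name: goofys
        image: otomato/goofys
        # -f keeps goofys in the foreground so the container stays alive
        command: ["goofys", "-f", "--region", "eu-west-1", "my-bucket", "/var/s3"]
        securityContext:
          privileged: true               # FUSE needs access to /dev/fuse on the host
        volumeMounts:
        - name: mntdata
          mountPath: /var/s3:shared      # propagate the FUSE mount back to the host
      volumes:
      - name: mntdata
        hostPath:
          path: /mnt/s3data
```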
The Dockerfile and the Helm Chart
We've built our own goofys Docker image based on Alpine Linux and a Helm chart that installs the DaemonSet.
The image can be found in our Docker Hub repo: https://hub.docker.com/r/otomato/goofys
The Dockerfile and the Helm chart can be found here: https://github.com/otomato-gh/s3-mounter
S3 Access per ServiceAccount
The Helm chart currently assumes that S3 access is provided by an IAM Role attached to a Kubernetes serviceAccount. We may add support for API access keys in the future if needed.
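Concretely, IAM roles for service accounts (IRSA) work by annotating the serviceAccount with the role's ARN via the eks.amazonaws.com/role-arn annotation. A sketch of what the chart produces (the account ID is a placeholder, and the names are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-mounter
  annotations:
    # EKS injects web-identity credentials for this role into the pod
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/eks-otomounter-role
```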
HowTo:
Here's how to set it all up:
1. OIDC Provider for EKS
Make sure you have an IAM OIDC identity provider for your cluster. If not, you can set one up with the following commands (you'll need eksctl installed). First, retrieve your cluster's OIDC issuer URL:
aws eks describe-cluster --name cluster_name --query "cluster.identity.oidc.issuer" --output text
Example output:
https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E
List the IAM OIDC providers in your account. Replace EXAMPLED539D4633E53DE1B716D3041E
with the value returned from the previous command.
aws iam list-open-id-connect-providers | grep EXAMPLED539D4633E53DE1B716D3041E
Example output:
"Arn": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
If output is returned from the previous command, then you already have a provider for your cluster. If no output is returned, then you must create an IAM OIDC provider with the following command. Replace cluster_name
with your own value.
eksctl utils associate-iam-oidc-provider --cluster cluster_name --approve
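To avoid copying the provider ID into the grep by hand, shell parameter expansion can extract it from the issuer URL. A small sketch, using the example value from above:

```shell
# The issuer URL ends with /id/<PROVIDER_ID>; strip everything up to the last '/'
ISSUER="https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
OIDC_ID="${ISSUER##*/}"
echo "$OIDC_ID"
# → EXAMPLED539D4633E53DE1B716D3041E
```

In practice you'd set ISSUER from the describe-cluster command and feed $OIDC_ID to the list-open-id-connect-providers grep.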
2. Create a Managed Policy for Bucket Access
Create a JSON file named policy.json with the appropriate policy definition. For example, the following snippet creates a policy document that allows full access to a bucket named my-kubernetes-bucket:
read -r -d '' MY_POLICY <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::my-kubernetes-bucket",
        "arn:aws:s3:::my-kubernetes-bucket/*"
      ]
    }
  ]
}
EOF
echo "${MY_POLICY}" > policy.json
Note that both the bucket ARN and the /* object ARN are listed - object-level actions like s3:GetObject only match the latter.
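A stray trailing comma inside the heredoc is the most common slip here, and IAM will reject the malformed document. A quick sanity check that the file is valid JSON (assumes python3 is available):

```shell
# Exits non-zero and prints the error position if policy.json is malformed
python3 -m json.tool policy.json > /dev/null && echo "policy.json is valid JSON"
```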
Create the managed policy by running:
aws iam create-policy --policy-name kubernetes-s3-access --policy-document file://policy.json
Example output:
{
  "Policy": {
    "PolicyName": "kubernetes-s3-access",
    "PolicyId": "ANPAS3DOMWSIX73USJOHK",
    "Arn": "arn:aws:iam::04968064045764:policy/kubernetes-s3-access",
    ...
  }
}
Note the policy ARN for the next step.
3. Create a Role for S3 Access
Set your AWS account ID to an environment variable with the following command:
ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
Set your OIDC identity provider to an environment variable with the following command. Replace the example values with your own values:
OIDC_PROVIDER=$(aws eks describe-cluster --name cluster_name --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")
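To see what the sed pipeline produces, here it is applied to the example issuer URL from step 1 (the expected output is in the comment):

```shell
# sed strips the leading https:// scheme, leaving the bare provider host/path
echo "https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E" \
  | sed -e "s/^https:\/\///"
# → oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E
```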
Copy the following code block to your computer and replace the example values with your own values.
read -r -d '' TRUST_RELATIONSHIP <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:my-namespace:my-service-account"
        }
      }
    }
  ]
}
EOF
echo "${TRUST_RELATIONSHIP}" > trust.json
Run the modified code block from the previous step to create a file named trust.json.
Run the following AWS CLI command to create the role:
aws iam create-role --role-name eks-otomounter-role --assume-role-policy-document file://trust.json --description "Mount s3 bucket to EKS"
Run the following command to attach the IAM policy using the ARN created in the previous section to your role:
aws iam attach-role-policy --role-name eks-otomounter-role --policy-arn=IAM_POLICY_ARN
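Managed policy ARNs have a predictable shape (arn:aws:iam::&lt;account-id&gt;:policy/&lt;name&gt;), so rather than pasting IAM_POLICY_ARN by hand it can be derived from the variables already set. A sketch, assuming ACCOUNT_ID from this step and the policy name from step 2:

```shell
# Derive the managed policy ARN from the account ID and the policy name
POLICY_NAME="kubernetes-s3-access"
IAM_POLICY_ARN="arn:aws:iam::${ACCOUNT_ID}:policy/${POLICY_NAME}"
echo "$IAM_POLICY_ARN"
```

Then pass it as --policy-arn "$IAM_POLICY_ARN" in the attach-role-policy command above.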
4. Finally - Install the S3 Mounter!
- Add the helm repo to your repo list:
helm repo add otomount https://otomato-gh.github.io/s3-mounter
- Inspect the chart's default values:
helm show values otomount/s3-otomount
The values you'll want to override are at the end:
bucketName: my-bucket
iamRoleARN: my-role
mountPath: /var/s3
hostPath: /mnt/s3data
- Install the chart by providing your own values:
helm upgrade --install s3-mounter otomount/s3-otomount \
--namespace otomount --set bucketName=<your-bucket-name> \
--set iamRoleARN=<your-role-arn> --create-namespace
This will use the default hostPath for the mount, i.e. /mnt/s3data.
5. Use the mounted S3 bucket in your Deployments.
Here's an example pod definition that provides its container the access to the mounted bucket:
apiVersion: v1
kind: Pod
metadata:
  name: sleeper
spec:
  containers:
  - command:
    - sleep
    - infinity
    image: ubuntu
    name: ubuntu
    volumeMounts:
    - mountPath: /mydata:shared
      name: s3data
  volumes:
  - hostPath:
      path: /mnt/s3data
    name: s3data
Note the :shared suffix - it's a mount propagation modifier in the mountPath field that allows this volume to be shared by multiple pods and containers on the same node.
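Kubernetes also exposes mount propagation as an explicit field on volumeMounts, which may read more clearly than the suffix form. A sketch of the equivalent mount entry (assuming the same volume names as above):

```yaml
    volumeMounts:
    - mountPath: /mydata
      name: s3data
      # HostToContainer: the container sees mounts created on the host
      # after the pod starts, e.g. the goofys FUSE mount
      mountPropagation: HostToContainer
```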
And that's it! You can now access your bucket. If you've created the pod from our example - you can exec to verify:
kubectl exec sleeper -- ls /mydata
Note: running this on your cluster will cost you a few additional $ for S3 API calls that goofys performs to maintain the mount. So remember to monitor your cloud costs. But you should do that anyway, right?
Happy delivering!
Top comments (16)
Cool setup. Have you tested its speed?
No, speed wasn't a consideration here. The main motivation was providing an easy and transparent way to upload files and make them accessible to pods. S3 gives users an easy and secure UI for that. Goofys is supposedly quite performant compared to other FUSE implementations (e.g. s3fs), but we haven't benchmarked this ourselves.
Thx! Would love to know numbers if you ever do try it :-)
Hi, can I set multiple bucketNames?
I need to interact with a few S3 buckets for different tasks.
Hi @oleksiihead - no support for this right now.
To add it one would need to do something like:
If you get to do this - please submit a PR.
Amazing approach!
Thx for sharing in details.
I tried to follow the entire demo, but unfortunately it didn't work, because the goofys mount directory (/var/s3fs) in the daemonset is not the same as the directory I want to share with the host (/var/s3fs:shared).
Is there any configuration I missed?
Daemonset.yaml
What's your node OS? Is mount propagation enabled in the container runtime? See this note here: kubernetes.io/docs/concepts/storag...
I have tried this and the other similar option mentioned in this blog: blog.meain.io/2020/mounting-s3-buc.... In neither case was the mounting to hostPath successful for a cluster managed by AWS EKS.
Hi @dirai09 , this was originally tested on AWS EKS. I haven't tested it since but it should in theory still work. What is the error you're getting when trying to mount the hostPath?
Also - can you share your config in a gist?
Nice approach. However, you might want to have a look at JuiceFS:
github.com/juicedata/juicefs
It has quite good performance due to its combination with Redis, and it is made with Kubernetes in mind.
Thanks for the wonderful suggestion @randy
With a high number of files this will fail you, due to the nature of the S3 API: for small files the HTTP response overhead will be bigger than the files themselves.
I think mounting S3 is a bad idea. If you have enough development resources, it's better to write a client in your code that connects directly to S3 and caches the list of S3 files... for better performance.
But it's a fun thing to do. Also, CephFS with a RADOS gateway will give you better performance in Kubernetes.
good to know. not an issue in our case - we have a small number of large files there. And I agree it's not such a great idea in general - both performance wise and because of the hidden complexity. But it solved our specific itch and may help others solve it.
Hi,
I don't think I am able to mount the volumes on the hostPath. Am I missing something here?
I ran into an issue where goofys doesn't reload the content of a small txt file. It updates the timestamp though. Do you know what could be wrong?
I have goofys running inside a container.