In this blog, we will learn how to use S3 as a filesystem on an EC2 Linux machine.
Let's start!
1) Create an EC2 Linux instance (I have used Ubuntu in this demo)
Keep everything as default and add the below user data script, in the Advanced details section of the wizard, to install the awscli and s3fs utilities:
#!/bin/bash
# Runs as root at first boot via cloud-init
sudo apt-get update -y
sudo apt-get install awscli -y
sudo apt-get install s3fs -y
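Once the instance is up, you can quickly confirm that both tools were installed by the user data script (plain version checks, nothing specific to this setup):
aws --version
s3fs --version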
2) Create an IAM user for s3fs
3) Give the user a unique name and enable programmatic access
Set permissions --> Create a new policy
Select the service as S3 and include the below access levels
Give the policy a unique name and click Create policy
Once the policy is created, go back to the IAM tab and hit refresh so that the newly created policy shows up in the list, filter by the policy name, and tick the checkbox to attach the policy to our IAM user.
Hit create user
Once the user is created, download the credentials. We are going to use them later.
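If you prefer the CLI over the console for steps 2 and 3, a rough equivalent could look like the sketch below. The user name (s3fs-user), policy name (s3fs-policy), account id and the exact S3 actions are my assumptions, not values from the wizard; adjust them to the access levels you actually selected:
# Create the IAM user and an access key for programmatic access
aws iam create-user --user-name s3fs-user
aws iam create-access-key --user-name s3fs-user

# Create a policy that allows listing the bucket and reading/writing its objects
aws iam create-policy --policy-name s3fs-policy --policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "s3:ListBucket", "Resource": "arn:aws:s3:::s3fs-test-101" },
    { "Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"], "Resource": "arn:aws:s3:::s3fs-test-101/*" }
  ]
}'

# Attach the policy to the user (replace 123456789012 with your account id)
aws iam attach-user-policy --user-name s3fs-user --policy-arn arn:aws:iam::123456789012:policy/s3fs-policy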
4) Log in to your EC2 instance
Go to your home directory and run the below commands to create a new directory and generate some sample files:
mkdir /home/ubuntu/bucket; cd $HOME/bucket; touch test1.txt test2.txt test3.txt
Next step is to create an S3 bucket.
5) Go to S3 service and create a new bucket
Give it a unique name and leave the rest of the settings as default.
Block public access to this bucket should be enabled by default.
Hit create bucket.
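If you would rather create the bucket from the CLI (run from any machine that already has admin credentials configured), an equivalent call would be something like the below; the bucket name and region are simply the ones used later in this demo:
aws s3 mb s3://s3fs-test-101 --region ca-central-1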
6) Once the bucket is created, go to the SSH session and configure our AWS credentials for authentication, using the IAM account that we created earlier.
Use the command
aws configure
and provide the credential details that we downloaded earlier.
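The interactive prompts look roughly like this; the key values below are placeholders, use the ones from the downloaded credentials file, and pick whichever default region and output format you like:
aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Default region name [None]: ca-central-1
Default output format [None]: json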
7) Now run the below command to sync the local directory with the S3 bucket:
aws s3 sync path_on_filesystem s3://bucketname
For example,
aws s3 sync /home/ubuntu/bucket s3://s3fs-test-101
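You can confirm that the files actually landed in the bucket straight from the CLI:
aws s3 ls s3://s3fs-test-101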
8) Create the credential file for s3fs
s3fs supports the standard AWS credentials file stored in ${HOME}/.aws/credentials. Alternatively, s3fs supports a custom passwd file.
The default location for this custom passwd file is a .passwd-s3fs file in the user's home directory (i.e. ${HOME}/.passwd-s3fs).
The file should have the below content:
ACCESS_KEY_ID:SECRET_ACCESS_KEY
You can run the below command as well:
echo "ACCESS_KEY_ID:SECRET_ACCESS_KEY" > ${HOME}/.passwd-s3fs
chmod 600 ${HOME}/.passwd-s3fs
9) Now you can run the below command to mount the S3 bucket as a filesystem:
sudo s3fs bucketname path -o passwd_file=$HOME/.passwd-s3fs,nonempty,rw,allow_other,mp_umask=002,uid=$(id -u),gid=$(id -g) -o url=http://s3.aws-region.amazonaws.com,endpoint=aws-region,use_path_request_style
For example:
sudo s3fs s3fs-test-101 /home/ubuntu/bucket -o passwd_file=$HOME/.passwd-s3fs,nonempty,rw,allow_other,mp_umask=002,uid=1000,gid=1000 -o url=http://s3.ca-central-1.amazonaws.com,endpoint=ca-central-1,use_path_request_style
10) Once it is mounted successfully, you can verify by running the command
mount|grep s3fs
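You can also check the filesystem type and size with df; the mount should show up as fuse.s3fs:
df -Th /home/ubuntu/bucket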
11) Add the below entry to /etc/fstab so that the mount persists across server reboots as well:
bucketname directoryonfs fuse.s3fs _netdev,allow_other 0 0
For example:
s3fs-test-101 /home/ubuntu/bucket fuse.s3fs _netdev,allow_other 0 0
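A quick way to append the entry and test it without rebooting (assuming the same bucket and mount point as in the example above):
echo 's3fs-test-101 /home/ubuntu/bucket fuse.s3fs _netdev,allow_other 0 0' | sudo tee -a /etc/fstab
sudo umount /home/ubuntu/bucket
sudo mount -a
mount | grep s3fs
Note that a mount triggered from fstab runs as root, so s3fs will look for credentials in /etc/passwd-s3fs (or root's ~/.passwd-s3fs) unless you add a passwd_file=... option to the fstab entry.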
12) Now the moment of truth: go to your S3 bucket and hit refresh. You should see the files that were present on your filesystem.
13) Let's now verify whether it gets synced properly after an object addition/deletion
Go to your S3 bucket and upload a new file.
Then go to your SSH session and run ls in the same directory.
Eureka! The file that you just uploaded to your S3 bucket appears in your filesystem.
In the same way, you can test the file delete operation. And it works both ways, i.e. if you perform any file operation on your filesystem, it will sync to your S3 bucket as well.
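If you want to script that two-way check instead of clicking around in the console, something along these lines works (the file names here are arbitrary):
# Filesystem -> S3: create a file on the mount, then list the bucket
touch /home/ubuntu/bucket/from-fs.txt
aws s3 ls s3://s3fs-test-101

# S3 -> filesystem: upload an object directly, then list the mount point
echo "hello from s3" > /tmp/from-s3.txt
aws s3 cp /tmp/from-s3.txt s3://s3fs-test-101/from-s3.txt
ls -l /home/ubuntu/bucket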
References:
https://github.com/s3fs-fuse/s3fs-fuse
https://aws.amazon.com/
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
Limitations
Generally S3 cannot offer the same performance or semantics as a local file system. More specifically:
- random writes or appends to files require rewriting the entire object, optimized with multi-part upload copy
- metadata operations such as listing directories have poor performance due to network latency
- non-AWS providers may have eventual consistency so reads can temporarily yield stale data (AWS offers read-after-write consistency since Dec 2020)
- no atomic renames of files or directories
- no coordination between multiple clients mounting the same bucket
- no hard links
- inotify detects only local modifications, not external ones by other clients or tools