Ever needed to access publicly available S3 data?
Since the data is already public, credentials should not be necessary, so why not access it without them?
What
AWS S3
AWS's object storage service, which can store data of any kind: structured, unstructured, videos, files, images. Data is stored as objects inside containers called buckets, which play a role similar to top-level folders, and access permissions can be controlled at the bucket level.
AWS CLI
The command line interface for working with AWS services from a terminal. Normally, using the CLI requires credentials, either for a user or through SSO.
Why
People and teams in data science, big data, and machine learning work with huge volumes of data and need such data sets for their research and development. With the growth of the cloud, many of these data sets are now hosted in AWS S3, and this post looks at how to access them.
Data that has been made public is available to everyone, so we can access or download it without supplying AWS credentials.
How
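- Let us create a sample S3 bucket and make it publicly available
- First, create a sample bucket (pick a globally unique name)
- Now add a couple of objects (files, images, etc.)
- Make it public by unchecking "Block all public access" in the bucket's "Block public access" settings
- One last step is to grant the public read access through a bucket policy. The sample policy here allows listing the bucket; downloading objects would additionally need the "s3:GetObject" action on the objects inside the bucket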
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicListBucket",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<<your-public-bucket>>"
    }
  ]
}
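The same policy can also be generated for any bucket name instead of being edited by hand. This is only a minimal sketch: the helper `build_public_list_policy`, the `Sid` label, and the bucket name `example-public-bucket` are hypothetical names chosen for illustration, and the `Version` is assumed to be `2012-10-17`, the current policy language version.

```python
import json


def build_public_list_policy(bucket_name: str) -> dict:
    """Return a bucket policy that lets anyone list the bucket's contents.

    Hypothetical helper for illustration; it follows the shape of the
    sample policy in this post.
    """
    return {
        "Version": "2012-10-17",  # assumed: current policy language version
        "Statement": [
            {
                "Sid": "PublicListBucket",  # illustrative label only
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "s3:ListBucket",
                # s3:ListBucket applies to the bucket ARN itself, not to objects
                "Resource": f"arn:aws:s3:::{bucket_name}",
            }
        ],
    }


if __name__ == "__main__":
    # "example-public-bucket" is a placeholder, not a real bucket
    print(json.dumps(build_public_list_policy("example-public-bucket"), indent=2))
```

The printed JSON can be pasted into the bucket policy editor in the console.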
- Now let us access the bucket without providing credentials
- Open a command prompt on a machine with the AWS CLI installed
- Or, if possible, create or use an existing EC2 instance with the AWS CLI to run the S3 access command
aws s3 ls s3://<<your-public-bucket-name>> --no-sign-request
- Run without the option, the plain bucket access raises "Access Denied", but the same CLI command works once the '--no-sign-request' option is added
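The '--no-sign-request' option tells the CLI to send the request without a signature. The same anonymous listing can be sketched with only the Python standard library, as a plain HTTPS GET against the S3 REST endpoint. This is a rough sketch, not the CLI's own implementation: the function names are hypothetical, and it assumes the bucket policy allows public `s3:ListBucket` and that the bucket is reachable at `https://<bucket>.s3.amazonaws.com`.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Union

# XML namespace used by S3 list responses
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(bucket: str, prefix: str = "") -> str:
    """Build an unsigned ListObjectsV2 URL for a bucket (hypothetical helper)."""
    query = {"list-type": "2"}
    if prefix:
        query["prefix"] = prefix
    return f"https://{bucket}.s3.amazonaws.com/?{urllib.parse.urlencode(query)}"


def parse_keys(xml_body: Union[str, bytes]) -> List[str]:
    """Extract the object keys from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_body)
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


def list_public_bucket(bucket: str, prefix: str = "") -> List[str]:
    """Fetch one page of keys anonymously; no credentials or signature are sent."""
    with urllib.request.urlopen(list_url(bucket, prefix)) as resp:
        return parse_keys(resp.read())
```

Calling `list_public_bucket("<<your-public-bucket-name>>")` should return the keys of the objects added earlier. A single response holds at most 1000 keys, and this sketch does not follow continuation tokens for larger buckets.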
Happy Public Data !!
Top comments (1)
Take note: Making the bucket public WILL get it crawled by bad actors within days.