What is Amazon S3?
Amazon Simple Storage Service (S3) is a cloud object storage and retrieval service. Objects stored in S3 can be files of any type, such as CSV files, MP3 audio and images, and they are managed through an Application Programming Interface (API) over the internet using Hypertext Transfer Protocol Secure (HTTPS).
For example, if you create an S3 bucket and load its data into an Amazon Redshift cluster using the COPY command, the source location is written as an S3 URI rather than an HTTPS URL, in the form 's3:///'.
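As a rough sketch of where the S3 URI sits in a COPY statement, the Python helper below only assembles the SQL text and does not connect to Redshift. The table name, bucket name, object prefix and IAM role ARN are hypothetical placeholders, and CSV is just one of the formats COPY accepts.

```python
def build_copy_statement(table, bucket, prefix, iam_role_arn):
    """Return the text of a Redshift COPY statement that loads CSV data from S3.

    The values are not checked against a real cluster or bucket.
    """
    s3_uri = f"s3://{bucket}/{prefix}"
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV;"
    )


# Hypothetical names, for illustration only.
statement = build_copy_statement(
    table="sales",
    bucket="example-bucket",
    prefix="data/sales.csv",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The resulting text could then be run from any SQL client connected to the cluster.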
Amazon Pi Day - Amazon S3 - Sweet 16
14 March is Amazon Pi Day, when Amazon celebrates 16 years of Amazon S3. You can view the post on Twitter and register to join the learning session.
What is S3 used for?
a) Storage management and monitoring
All objects are stored in S3 buckets and can be organized and shared across teams using prefixes.
With S3 Versioning you can store multiple versions of the same object within a single bucket.
With cross-region replication, objects can be replicated into one or more destination S3 buckets hosted in other AWS Regions.
Storage can be monitored in three ways: by applying tags to S3 buckets to track costs and inventory via the AWS Cost Allocation Report, by using AWS CloudTrail to record changes and other object activity, and by tracking S3 operational metrics in Amazon CloudWatch.
b) Storage analytics and insights
The account snapshot provides an overview of your object storage, and the S3 Storage Lens dashboard provides granular detail on how metrics change over time. These metrics can be exported daily as a CSV report.
S3 Storage Class Analysis analyzes the objects in an S3 bucket to help you decide when to transition less frequently used objects to a lower-cost storage class.
c) S3 Storage classes
S3 offers different storage classes, each designed for a different use case, for the objects stored in an S3 bucket. They include the following:
- S3 Standard
This storage class is designed for frequently accessed objects and offers high durability, performance and availability with low latency for general-purpose use cases such as big data analytics and static websites.
- S3 Intelligent-Tiering
S3 Intelligent-Tiering optimizes your storage cost by monitoring access patterns at the object level. After 30 consecutive days without access, an object is moved from the frequent access tier to the infrequent access tier, on an object-by-object basis, to save on storage costs. If the object is accessed again, it is moved back to the frequent access tier.
- S3 Standard-IA
S3 Standard-IA (Infrequent Access) is for less frequently accessed data that must still be available quickly when it is required. Data is stored across at least three Availability Zones. The typical use case is data that is kept for more than 30 days.
- S3 One Zone-IA
S3 One Zone-IA is designed for infrequently accessed data that can be accessed quickly but is less resilient, because it is stored in only a single Availability Zone.
- S3 Glacier
The S3 Glacier storage classes are for long-term archival storage. There are three classes:
a) Amazon S3 Glacier Instant Retrieval
This is archival storage for data that is rarely accessed but must be retrievable within milliseconds. It provides cost savings of up to 68% compared to S3 Standard-IA.
The Amazon S3 Glacier Instant Retrieval storage class was launched in 2021; you may watch the announcement from AWS re:Invent below.
AWS re:Invent 2021 - {New Launch} Introducing the Amazon S3 Glacier Instant Retrieval storage class
b) Amazon S3 Glacier Flexible Retrieval (Formerly S3 Glacier)
This archival storage class is for large datasets that are accessed only once or twice a year and do not require immediate retrieval. Data can be retrieved within minutes to hours, and bulk retrievals are free of charge.
c) Amazon S3 Glacier Deep Archive
This class is for objects that must be stored long term, such as tax records and financial transactions kept for 7 to 10 years to meet compliance requirements. It is the lowest-cost storage at AWS, and the retrieval time is within 12 hours. Amazon S3 Glacier Deep Archive storage is replicated across three geographically dispersed Availability Zones for high durability.
- S3 on Outposts
Using the S3 APIs, S3 on Outposts lets you store and retrieve objects locally on-premises with AWS Outposts to meet data residency obligations. The S3 Outposts storage class stores data across multiple devices on your Outposts for high performance and keeps the data close to on-premises applications. AWS Outposts was introduced at AWS re:Invent 2019; you may learn more here.
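To make the idea of moving objects between storage classes more concrete, here is a minimal sketch of a lifecycle configuration written as a Python dictionary, in the shape that boto3's `put_bucket_lifecycle_configuration` call accepts. The rule name, the `logs/` prefix and the day thresholds are assumptions chosen for illustration, not recommendations. The helper `storage_class_at_age` is a hypothetical function that models which class an object would be in at a given age.

```python
import json

# Rule ID, prefix and day counts are illustrative assumptions.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_at_age(rule, age_days):
    """Model the storage class of an object that is `age_days` old."""
    current = "STANDARD"  # assumes the object was first written to S3 Standard
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 120, 400):
    print(age, storage_class_at_age(rule, age))

print(json.dumps(lifecycle_configuration, indent=2))
```

With a real bucket, the dictionary could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration` on a boto3 S3 client.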
d) Access management and security
There are three methods to control access to an S3 bucket:
- IAM policies, which provide granular access for users and groups of users
- Access Control Lists (ACLs), which grant object access to authorized users
- Bucket policies, which can control anonymous HTTP/HTTPS access, encryption requirements and the IP address ranges allowed to access a single S3 bucket
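As an illustration of the bucket policy option, the sketch below builds a policy document that denies requests made over plain HTTP and requests from outside one IP address range. The bucket name and CIDR range are hypothetical, and the helper `build_bucket_policy` is introduced here only for the example. The condition keys `aws:SecureTransport` and `aws:SourceIp` are standard IAM condition keys.

```python
import json


def build_bucket_policy(bucket, allowed_cidr):
    """Return a bucket policy dict that enforces HTTPS and an IP allow range."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    resources = [bucket_arn, f"{bucket_arn}/*"]  # the bucket and its objects
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                # Matches requests that were not sent over HTTPS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                "Sid": "DenyOutsideAllowedRange",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                # Matches requests whose source IP is outside the range.
                "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidr}},
            },
        ],
    }


# Hypothetical bucket name and documentation-only IP range.
policy = build_bucket_policy("example-bucket", "203.0.113.0/24")
print(json.dumps(policy, indent=2))
```

A broad deny on the source IP also applies to administrators, so a policy like this should be tested carefully before it is attached to a real bucket.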
AWS provides several security mechanisms to protect S3 buckets, which are discussed below:
S3 Block Public Access ensures that buckets are private by default. The configuration of an S3 bucket can be changed to allow one of the following levels of access:
- Public access
- Objects can be public
- Buckets and objects are not public
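For reference, S3 Block Public Access is made up of four individual settings. The sketch below writes them as the dictionary that boto3's `put_public_access_block` call takes as its `PublicAccessBlockConfiguration` argument. The helper `is_fully_private` is a hypothetical function added only to show how the settings combine.

```python
# The four Block Public Access settings, all switched on.
public_access_block = {
    "BlockPublicAcls": True,        # reject new public ACLs
    "IgnorePublicAcls": True,       # ignore public ACLs that already exist
    "BlockPublicPolicy": True,      # reject bucket policies that grant public access
    "RestrictPublicBuckets": True,  # restrict access to buckets with public policies
}


def is_fully_private(config):
    """Return True only when every Block Public Access setting is enabled."""
    required = (
        "BlockPublicAcls",
        "IgnorePublicAcls",
        "BlockPublicPolicy",
        "RestrictPublicBuckets",
    )
    return all(config.get(name, False) for name in required)


print(is_fully_private(public_access_block))
```

Turning any one of the four settings off is what allows a bucket or its objects to become public.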
Amazon Macie is an AWS service that uses machine learning to analyze S3 buckets and identify any personally identifiable information (PII) in objects.
IAM Access Analyzer for S3 allows you to review the access permissions on your S3 resources and flags buckets that grant access to the public or to other AWS accounts.
e) Data processing
S3 Object Lambda allows you to run your own code in an AWS Lambda function to modify the data returned by a GET request before it reaches the application, e.g. to redact information.
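The snippet below sketches only the transformation step that such a Lambda function might apply, in this case redacting email addresses from text. The function name, the pattern and the placeholder are assumptions made for illustration. The handler wiring that fetches the original object and returns the transformed one to S3 is deliberately left out.

```python
import re

# A simple pattern for email addresses; real PII detection needs more care.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text, placeholder="[REDACTED]"):
    """Replace every email address found in `text` with `placeholder`."""
    return EMAIL_PATTERN.sub(placeholder, text)


original = "Contact jane.doe@example.com for the invoice."
print(redact_emails(original))  # Contact [REDACTED] for the invoice.
```

Keeping the transformation in a pure function like this makes it easy to test locally before it is deployed behind an Object Lambda access point.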
f) Querying in one place
Data stored in Amazon S3 can be queried in place with SQL using AWS analytics services such as Amazon Athena and Amazon Redshift Spectrum. This removes the need to move the data onto a separate analytics platform.
g) Data transfer
Amazon offers a range of data transfer and data migration options for the following use cases:
- Hybrid cloud storage via AWS Storage Gateway
- Offline data transfer via AWS Snow Family
- Online data transfer via AWS DataSync
h) Performance
Amazon S3 delivers high performance for cloud storage: an application can achieve at least 5,500 GET/HEAD requests per second per prefix when retrieving data.
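AWS documents the 5,500 figure as GET/HEAD requests per second for each prefix in a bucket, so read throughput can be scaled by spreading objects across several prefixes. The small calculation below is a sketch of that arithmetic; the function name is hypothetical.

```python
GET_REQUESTS_PER_SECOND_PER_PREFIX = 5_500


def max_read_rate(prefix_count):
    """Return the documented minimum read rate for a number of prefixes."""
    if prefix_count < 1:
        raise ValueError("prefix_count must be at least 1")
    return prefix_count * GET_REQUESTS_PER_SECOND_PER_PREFIX


print(max_read_rate(1))   # 5500
print(max_read_rate(10))  # 55000
```

For example, reading in parallel from 10 prefixes gives a rate of 55,000 read requests per second.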
Free Tier
Amazon S3 provides a 12-month free tier for new AWS accounts, which includes 5 GB of storage.
New Feature Release - S3 Batch Replication
S3 Batch Replication, a new product feature, was launched on 8 February and announced in an AWS News Blog post written by Marcia Villalba. It allows customers to replicate existing objects from the source bucket into one or more destination buckets with same-region replication or cross-region replication.
Tutorial: Creating an Amazon S3 bucket
Step 1: Sign in to the AWS Management Console using the IAM user credentials that you have already created. If you don't have IAM user credentials you can learn how to create them here.
Log in to the AWS Management Console with your IAM user credentials.
Step 2: Search for S3 in the AWS services search bar and select it.
On the S3 main screen click 'Create Bucket'.
Step 3: Provide a descriptive, globally unique name for your S3 bucket, then scroll to the bottom of your screen and click 'Create Bucket'.
Step 4: Click the name of the newly created S3 bucket and create a new folder, e.g. called audio files.
Step 5: After the folder is created, click Upload
Step 6: Click 'Add files' or 'Add Folder' if you are uploading a folder of objects and finally click 'Upload'.
Step 7: Receive confirmation of file upload success
After the objects are uploaded into the S3 bucket, a confirmation message indicates the status of success.
Step 8: Click on the S3 bucket folder and check that all objects have been uploaded successfully.
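The same console steps can also be scripted. The sketch below assumes a client object that behaves like `boto3.client("s3")` and uses its `create_bucket`, `put_object` and `upload_file` methods. The bucket name, folder name and file path are hypothetical, and `DryRunClient` is a stand-in written for this example so that the code runs without AWS credentials.

```python
import os


def create_bucket_and_upload(s3_client, bucket, folder, local_paths):
    """Create a bucket, create a folder in it and upload files into the folder.

    Returns the list of object keys that were uploaded.
    """
    # Step 3: create the bucket. Outside us-east-1, boto3 also needs a
    # CreateBucketConfiguration argument with a LocationConstraint.
    s3_client.create_bucket(Bucket=bucket)

    # Step 4: a console "folder" is a zero-byte object whose key ends in "/".
    folder_key = folder.strip("/") + "/"
    s3_client.put_object(Bucket=bucket, Key=folder_key)

    # Steps 5 and 6: upload each file under the folder prefix.
    uploaded = []
    for path in local_paths:
        key = folder_key + os.path.basename(path)
        s3_client.upload_file(path, bucket, key)
        uploaded.append(key)
    return uploaded


class DryRunClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def create_bucket(self, **kwargs):
        self.calls.append(("create_bucket", kwargs))

    def put_object(self, **kwargs):
        self.calls.append(("put_object", kwargs))

    def upload_file(self, filename, bucket, key):
        self.calls.append(("upload_file", (filename, bucket, key)))


client = DryRunClient()  # swap for boto3.client("s3") to run against AWS
keys = create_bucket_and_upload(
    client, "example-audio-bucket", "audio-files", ["recordings/intro.mp3"]
)
print(keys)  # ['audio-files/intro.mp3']
```

Against a real account, the uploaded keys could then be checked in the console as described in Step 8.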
Dedicated Amazon S3 - New AWS Training and Certification Badge
New digital badges can be earned by achieving an assessment score of at least 80%. Free training is available on the AWS Skill Builder platform, with a block storage learning path focused on Amazon Elastic Block Store (EBS) and an object storage learning path dedicated to Amazon S3. The badges were introduced in a February 2022 AWS News Blog post written by Steve Roberts.
Amazon S3 Architecture
Amazon provides a reference architecture illustrating use cases for the different data sources that can be ingested into Amazon S3, such as audio transcripts, CSV and JSON files for analytics, log files, and backup files kept in archival storage for compliance audits.
Data sources for S3 buckets can be both on-premises and in the cloud. Data stored in S3 can be replicated and protected with permissions, and costs can be optimized by choosing appropriate storage classes.
Tutorials using Amazon S3 and data lakehouse approach with Amazon Redshift
You can also read my previous tutorials on Amazon S3 and building a data warehouse with Amazon Redshift:
AWS re:Invent 2021 - Amazon S3 key sessions
Deep dive on Amazon S3
Best practices for configuring and managing S3 replication
Deep dive on Amazon S3 security and access management
AWS Backup for S3
Building a data lake on Amazon S3
References
Using the COPY command to load data from Amazon S3
Happy Learning!