Amazon S3 - Basics

#aws

Amazon S3 (Simple Storage Service) is a highly available object storage service. Typical uses include backups, application hosting, data lakes, and hosting static websites. In S3 you store objects in buckets: objects are similar to files, and buckets can be thought of as top-level folders that hold them. A bucket name must be globally unique, and each bucket is created in a specific region.

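Each object is identified by a key that is unique within its bucket. S3 has a flat structure, so a "folder" is really just a prefix of the key. The key is made up of the folder name (prefix) and the object name: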

https://EXAMPLE-BUCKET.s3.us-west-2.amazonaws.com/picture/cat.png

Here "picture/cat.png" is the key: "picture/" is the prefix (the folder) and "cat.png" is the object name.
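To make the bucket/key split concrete, here is a minimal Python sketch (standard library only) that pulls the bucket, region, and key out of a virtual-hosted-style URL like the one above. The function name parse_s3_url is hypothetical, and it deliberately handles only the https://<bucket>.s3.<region>.amazonaws.com/<key> form, not path-style or legacy endpoints.

```python
from urllib.parse import unquote, urlparse


def parse_s3_url(url: str) -> tuple[str, str, str]:
    """Split a virtual-hosted-style S3 URL into (bucket, region, key).

    Only handles https://<bucket>.s3.<region>.amazonaws.com/<key>.
    """
    parsed = urlparse(url)
    # netloc keeps the original case (parsed.hostname would lowercase it)
    host = parsed.netloc
    suffix = ".amazonaws.com"
    if parsed.scheme != "https" or not host.endswith(suffix):
        raise ValueError(f"not a virtual-hosted-style S3 URL: {url}")
    # Split from the right, because bucket names may themselves contain dots
    parts = host[: -len(suffix)].rsplit(".", 2)
    if len(parts) != 3 or parts[1] != "s3":
        raise ValueError(f"unexpected S3 host: {host}")
    bucket, _, region = parts
    # Everything after the leading "/" is the key; decode escapes like %20
    key = unquote(parsed.path[1:])
    return bucket, region, key


bucket, region, key = parse_s3_url(
    "https://EXAMPLE-BUCKET.s3.us-west-2.amazonaws.com/picture/cat.png"
)
prefix, _, name = key.rpartition("/")
print(bucket, region, key, prefix, name)
# EXAMPLE-BUCKET us-west-2 picture/cat.png picture cat.png
```

rpartition splits the key at its last "/", so "picture" comes out as the folder-like prefix and "cat.png" as the object name.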

🔒 Security

You can control access to your S3 buckets with bucket policies (attached to the bucket) and IAM policies (attached to IAM users, groups, or roles). If the data in a bucket isn't supposed to be public, make sure S3 Block Public Access is enabled for that bucket. To encrypt your data at rest, you can use server-side encryption, where S3 encrypts your objects when it stores them; since January 2023, S3 applies server-side encryption with S3 managed keys (SSE-S3) to all new objects by default. Alternatively, you can use client-side encryption, meaning that you encrypt your data before you upload it to the bucket.
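A bucket policy is a JSON document attached to the bucket. The sketch below, assuming only Python's standard json module, builds a policy that lets a single IAM role read all objects in a bucket, plus the four Block Public Access settings switched on. The bucket name, the account ID 111122223333 and the role ExampleReader are placeholders. The policy string and the settings dict have the shape that boto3's put_bucket_policy (Policy=...) and put_public_access_block (PublicAccessBlockConfiguration=...) expect, but the code itself makes no AWS calls.

```python
import json


def read_only_bucket_policy(bucket: str, role_arn: str) -> str:
    """Build a bucket policy that lets one IAM role read every object."""
    policy = {
        "Version": "2012-10-17",  # current IAM policy language version
        "Statement": [
            {
                "Sid": "AllowRoleRead",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:GetObject",
                # "/*" targets the objects in the bucket, not the bucket itself
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Block Public Access with all four settings enabled
BLOCK_ALL_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

# Placeholder bucket and role (hypothetical)
print(read_only_bucket_policy(
    "example-bucket", "arn:aws:iam::111122223333:role/ExampleReader"
))
```

Because the policy names one specific role as principal, it is not a public policy, so it still works with all Block Public Access settings turned on.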

Versioning

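When you enable versioning on a bucket, S3 keeps every version of an object. Uploading an object to a key that already exists creates a new version instead of overwriting the old one, and each version gets its own version ID, so earlier versions can still be retrieved. Enabling versioning is recommended so you don't accidentally overwrite or delete files you still need.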
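To illustrate what versioning changes, here is a toy in-memory model in Python. VersionedBucket is a hypothetical class, not part of any AWS SDK: a put to an existing key appends a new version, and a delete without a version ID only adds a delete marker, which mirrors how a versioning-enabled bucket behaves. Real S3 version IDs are opaque strings; the counter-based IDs here are only for readability.

```python
from __future__ import annotations

import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self) -> None:
        # key -> list of (version_id, data); data None marks a delete marker
        self._versions: dict[str, list[tuple[str, bytes | None]]] = {}
        self._ids = itertools.count(1)

    def _next_id(self) -> str:
        return f"v{next(self._ids)}"

    def put(self, key: str, data: bytes) -> str:
        """Upload to a key: a new version is appended, nothing is overwritten."""
        version_id = self._next_id()
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key: str) -> str:
        """Delete without a version ID: only a delete marker is added."""
        marker_id = self._next_id()
        self._versions.setdefault(key, []).append((marker_id, None))
        return marker_id

    def get(self, key: str, version_id: str | None = None) -> bytes:
        versions = self._versions.get(key, [])
        if version_id is None:
            # The latest version wins; a delete marker hides the object
            if not versions or versions[-1][1] is None:
                raise KeyError(key)
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id and data is not None:
                return data
        raise KeyError(f"{key} (version {version_id})")


bucket = VersionedBucket()
v1 = bucket.put("picture/cat.png", b"first cat")
bucket.put("picture/cat.png", b"second cat")
print(bucket.get("picture/cat.png"))      # b'second cat'
print(bucket.get("picture/cat.png", v1))  # b'first cat'
```

After bucket.delete("picture/cat.png"), a plain get raises KeyError, but the older versions can still be fetched by their version ID.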

Replication

With replication you can asynchronously copy objects from one S3 bucket to another. The destination bucket can be in the same region (Same-Region Replication, SRR) or in a different region (Cross-Region Replication, CRR). Versioning must be enabled on both the source and the destination bucket.
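A replication rule names an IAM role that S3 uses to copy objects and a destination bucket. The sketch below builds such a configuration as a plain Python dict and labels it SRR or CRR by comparing the two regions. The field names are modelled on the ReplicationConfiguration parameter of boto3's put_bucket_replication; double-check them against the current API reference before use. The role ARN, bucket names and rule ID are hypothetical, and no AWS call is made.

```python
def replication_rule_config(
    role_arn: str,
    destination_bucket: str,
    source_region: str,
    destination_region: str,
) -> tuple[str, dict]:
    """Return ("SRR" or "CRR", a one-rule replication configuration)."""
    kind = "SRR" if source_region == destination_region else "CRR"
    config = {
        # IAM role that S3 assumes to copy objects on your behalf
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-everything",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }
    return kind, config


kind, config = replication_rule_config(
    "arn:aws:iam::111122223333:role/ExampleReplicationRole",
    "example-destination-bucket",
    source_region="us-west-2",
    destination_region="eu-central-1",
)
print(kind)  # CRR
```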

💾 Storage classes

S3 offers several storage classes that differ in availability, access speed, and price. The storage class is set per object.

1. General purpose (S3 Standard)

The general purpose class offers 99.99% availability and is suited to frequently accessed data. It provides low latency and high throughput. Typical use cases are cloud applications, dynamic websites, and mobile apps.

2. Infrequent access (S3 Standard-IA)

The infrequent access class has an availability of 99.9%. It is meant for data that is accessed less frequently but still needs rapid access when it is requested. It's cheaper than "general purpose" storage, but you pay a per-GB fee for retrieving data. Typical use cases are backups and disaster recovery.

3. One Zone-Infrequent Access (S3 One Zone-IA)

This class fits data that you access infrequently and are willing to keep in only one Availability Zone (AZ). Its availability is 99.5%, and if that AZ is lost, the data stored in it is lost too. It can be used for secondary backup copies of on-premises data or for data you can re-create.
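To get a feel for what these availability percentages mean, the arithmetic below converts each one into the maximum yearly downtime it allows, treating availability simply as the fraction of a 365-day year in which the class is reachable. These are the designed availability figures quoted above, not AWS's contractual SLA values. The helper name max_downtime_hours_per_year is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def max_downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be unavailable at this availability."""
    unavailable_fraction = (100 - availability_percent) / 100
    return MINUTES_PER_YEAR * unavailable_fraction / 60


for name, availability in [
    ("S3 Standard", 99.99),
    ("S3 Standard-IA", 99.9),
    ("S3 One Zone-IA", 99.5),
]:
    hours = max_downtime_hours_per_year(availability)
    print(f"{name}: {availability}% -> about {hours:.2f} h of downtime per year")
```

That works out to roughly 53 minutes, 8.8 hours, and 43.8 hours per year.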

4. Glacier storage classes

The Glacier storage classes are designed for archiving and backup. You pay both for storage and for retrieving objects.

S3 Glacier Instant Retrieval is for data that is rarely accessed but must still be retrievable in milliseconds.

S3 Glacier Flexible Retrieval offers retrieval times ranging from minutes to hours, depending on the retrieval option you choose.

S3 Glacier Deep Archive is the lowest-cost option for long-term storage; standard retrievals complete within 12 hours (bulk retrievals within 48 hours).
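The guidance for classes 1 to 4 can be condensed into a small decision function. This is a hypothetical heuristic sketched in Python, not an AWS tool, and it ignores object size, minimum storage durations and current prices. The returned strings are the StorageClass values the S3 API uses (for example when uploading an object), where "GLACIER" stands for Glacier Flexible Retrieval.

```python
def choose_storage_class(
    frequent_access: bool,
    archival: bool = False,
    max_wait_hours: float = 0.0,
    recreatable: bool = False,
) -> str:
    """Map the rough guidance above to an S3 StorageClass value (heuristic)."""
    if frequent_access:
        return "STANDARD"
    if not archival:
        # Infrequent but still needs rapid access; One Zone only if re-creatable
        return "ONEZONE_IA" if recreatable else "STANDARD_IA"
    if max_wait_hours <= 0:
        # Archive data that must still come back in milliseconds
        return "GLACIER_IR"
    if max_wait_hours < 12:
        # Minutes to hours: S3 Glacier Flexible Retrieval
        return "GLACIER"
    # Can wait 12 hours or more: the cheapest archive class
    return "DEEP_ARCHIVE"


print(choose_storage_class(False, archival=True, max_wait_hours=24))  # DEEP_ARCHIVE
```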

5. Intelligent-Tiering

S3 Intelligent-Tiering automatically moves objects to the most cost-effective access tier based on how often they are accessed. You pay a monthly object monitoring and automation charge, but no retrieval charges. Objects move between these tiers automatically:

Frequent Access tier: the default for new objects
Infrequent Access tier: objects not accessed for 30 consecutive days
Archive Instant Access tier: objects not accessed for 90 consecutive days

If an object in one of the lower tiers is accessed again, it moves back to the Frequent Access tier.
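The automatic tier movements listed above boil down to thresholds on the number of days since an object was last accessed. The Python sketch below (intelligent_tiering_tier is a hypothetical helper) covers only these three automatic tiers; S3 does this tracking itself, so the function is just a way to reason about where an object would end up.

```python
def intelligent_tiering_tier(days_since_last_access: int) -> str:
    """Return the automatic Intelligent-Tiering tier for an object.

    Only the three default tiers are modelled; optional tiers are not.
    """
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be >= 0")
    if days_since_last_access >= 90:
        return "Archive Instant Access"
    if days_since_last_access >= 30:
        return "Infrequent Access"
    # Recently accessed objects (including new ones) stay in the default tier
    return "Frequent Access"


for days in (0, 45, 120):
    print(days, intelligent_tiering_tier(days))
```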

You can also activate additional, optional tiers for S3 Intelligent-Tiering: