Amazon S3, key things to know

#aws #s3

Getting started with Amazon S3 is quite straightforward, but to get the most out of it, there are some key things to know.

What is S3?
Amazon Simple Storage Service (Amazon S3) is an object storage service. Object storage is a data storage architecture for large amounts of unstructured data. Data is stored as objects, each with associated metadata and a unique identifier for easy access and retrieval. Object storage scales seamlessly, allowing you to store virtually unlimited amounts of data. It is commonly used to store unstructured data (music, video, images), for backup and archiving, as a data store for cloud-native applications, and for building data lakes.

Buckets and objects
Data within Amazon S3 is stored as objects within buckets. An object is a file and any metadata that describes the file. A bucket is a container for objects. To store your data in Amazon S3, you first create a bucket by specifying a bucket name and AWS Region. Then, you upload your data to that bucket as objects in Amazon S3. Each object has a key (or key name), which is the unique identifier for the object within the bucket.

Amazon S3 has a flat structure instead of a hierarchy like in a file system. However, for organizational simplicity, S3 supports the folder concept as a means of grouping objects. It does this by using a shared name prefix for objects (that is, objects have names that begin with a common string). For example, you can create a folder named “documents” and store an object named “mydoc.txt” in it. The object is then stored with the key name “documents/mydoc.txt”, where “documents/” is the prefix.
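To see how folders emerge from a flat key space, here is a small in-memory sketch. It mimics the Prefix and Delimiter parameters of the ListObjectsV2 API (list_objects_v2 in boto3), which roll keys up into CommonPrefixes. The bucket contents and the list_keys helper are hypothetical, and no AWS calls are made.

```python
# Hypothetical in-memory model of a bucket: S3 stores a flat mapping
# from key to object, with no real directories.
bucket = {
    "documents/mydoc.txt": b"hello",
    "documents/reports/q1.pdf": b"...",
    "images/logo.png": b"...",
    "readme.txt": b"...",
}


def list_keys(keys, prefix="", delimiter="/"):
    """Return (keys, common_prefixes) for keys that start with `prefix`.

    Keys containing `delimiter` after the prefix are rolled up into a
    "common prefix", which is what consoles display as a folder.
    """
    contents, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter is the "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common_prefixes)


print(list_keys(bucket))
print(list_keys(bucket, prefix="documents/"))
```

Listing without a prefix returns readme.txt as an object plus documents/ and images/ as folders; listing with the prefix documents/ returns documents/mydoc.txt and the nested documents/reports/ prefix.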

Storage classes
Each object in Amazon S3 has a storage class associated with it. The storage class affects your data’s availability and cost, so you should choose one based on your data access patterns. If you are running data lakes on AWS, understanding S3 storage classes is essential to managing your costs and access to your S3 objects optimally. Failing to do so can lead to a very large bill as more and more data is loaded into your data lake.

Storage class for frequently accessed objects

Use the S3 Standard class when an object is accessed more than once a month and millisecond access is needed.

Storage class for data with changing or unknown access patterns

Use the S3 Intelligent-Tiering class when the access pattern is changing or unknown. This class will optimize storage costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead.
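As a rough illustration of the automatic tiering, the sketch below maps the number of days since an object was last accessed to the tier S3 Intelligent-Tiering would place it in: objects move to the Infrequent Access tier after 30 consecutive days without access and to the Archive Instant Access tier after 90. The helper name is hypothetical, and the optional Archive Access and Deep Archive Access tiers, which you must opt into, are not modelled.

```python
def intelligent_tiering_tier(days_since_last_access: int) -> str:
    """Hypothetical approximation of S3 Intelligent-Tiering's automatic tiers.

    Only the tiers S3 manages without opt-in are modelled here.
    """
    if days_since_last_access < 30:
        return "Frequent Access"
    if days_since_last_access < 90:
        return "Infrequent Access"
    return "Archive Instant Access"


# Accessing an object moves it back to Frequent Access,
# so in this model the day counter restarts at 0.
for days in (0, 45, 120):
    print(days, intelligent_tiering_tier(days))
```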

Storage classes for infrequently accessed objects

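Use the S3 Standard-IA and S3 One Zone-IA storage classes for long-lived, infrequently accessed data that still requires millisecond access. Amazon S3 charges a retrieval fee for these objects, which is why they are most suitable for data that is read infrequently. S3 One Zone-IA stores data in a single Availability Zone, so it costs less than S3 Standard-IA but does not survive the loss of that zone.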

Storage classes for archiving objects

For archiving data that is rarely accessed but requires millisecond retrieval, use S3 Glacier Instant Retrieval. Compared to S3 Standard-IA, this class has lower storage costs but higher data access costs.

For archiving data that does not require immediate access, use S3 Glacier Flexible Retrieval, where retrievals take from minutes to hours depending on the retrieval option. To save even more, use S3 Glacier Deep Archive, the lowest-cost storage class, with standard retrieval within 12 hours.
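The guidance above can be condensed into a small decision helper. This is a hypothetical sketch: the function, its parameters, and the numeric thresholds (for example treating fewer than one access per quarter as "rarely accessed") are illustrative assumptions, not AWS rules. The returned strings are the StorageClass values the S3 API accepts.

```python
from typing import Optional


def choose_storage_class(
    accesses_per_month: Optional[float],
    needs_millisecond_access: bool = True,
    acceptable_retrieval_hours: float = 0.0,
    single_az_ok: bool = False,
) -> str:
    """Hypothetical helper: pick an S3 StorageClass value from an access pattern.

    The thresholds are illustrative assumptions, not AWS rules.
    """
    if accesses_per_month is None:
        # Unknown or changing access pattern.
        return "INTELLIGENT_TIERING"
    if accesses_per_month >= 1:
        # Accessed about once a month or more: frequently accessed data.
        return "STANDARD"
    if needs_millisecond_access:
        if accesses_per_month < 1 / 3:
            # Rarely accessed (less than about once a quarter) archive data.
            return "GLACIER_IR"
        # Infrequently accessed; One Zone-IA only if single-AZ storage is acceptable.
        return "ONEZONE_IA" if single_az_ok else "STANDARD_IA"
    if acceptable_retrieval_hours >= 12:
        return "DEEP_ARCHIVE"
    # Archive data that can wait minutes to hours: S3 Glacier Flexible Retrieval.
    return "GLACIER"


print(choose_storage_class(None))
print(choose_storage_class(0.5, single_az_ok=True))
print(choose_storage_class(0.1, needs_millisecond_access=False, acceptable_retrieval_hours=24))
```

The result can be passed as the StorageClass argument when uploading, for example to put_object in boto3.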

Regions
Your data resides in a single AWS Region, which you specify when you create your Amazon S3 bucket. Within that Region, objects are automatically stored across multiple devices spanning a minimum of three Availability Zones, each separated by miles. The exception is S3 One Zone-IA, which stores objects in a single Availability Zone.
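The Region is chosen at bucket creation time through the CreateBucketConfiguration parameter of the CreateBucket API. The sketch below builds the keyword arguments for boto3’s create_bucket; the helper and bucket name are hypothetical. It reflects the documented quirk that us-east-1 is the default location and must not be given as an explicit LocationConstraint.

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Hypothetical helper: build kwargs for boto3's S3 client create_bucket()."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location; the API rejects an explicit
    # LocationConstraint of "us-east-1", so it is omitted there.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(create_bucket_kwargs("my-example-bucket", "eu-west-1"))

# Creating the bucket for real requires boto3 and AWS credentials:
# import boto3
# region = "eu-west-1"
# boto3.client("s3", region_name=region).create_bucket(
#     **create_bucket_kwargs("my-example-bucket", region)
# )
```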

Managing Lifecycle
Amazon S3 Lifecycle manages objects so that they are stored cost-effectively throughout their lifecycle. A lifecycle rule can define two types of actions:

Transition actions: these define when objects move to another storage class, for example to a cheaper class after a set period of time.
Expiration actions: these define when objects expire; Amazon S3 deletes expired objects on your behalf.
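Both action types live in a single lifecycle configuration. The sketch below builds one in the shape accepted by boto3’s put_bucket_lifecycle_configuration. The rule ID, the documents/ prefix, the bucket name, and the 30/90/365-day schedule are hypothetical values chosen for illustration.

```python
import json

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-documents",  # hypothetical rule name
            "Filter": {"Prefix": "documents/"},     # applies only to this prefix
            "Status": "Enabled",
            # Transition actions: move objects to cheaper classes as they age.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Expiration action: S3 deletes objects 365 days after creation.
            "Expiration": {"Days": 365},
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))

# Applying it requires boto3 and AWS credentials:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket",
#     LifecycleConfiguration=lifecycle_configuration,
# )
```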

Pricing
You only pay for what you use. In S3 you pay for storage, requests, data transfer, and any additional features you use.

Storage is charged based on the object’s size, how long the object is stored during the month, and its storage class.
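Because storage is prorated by time, a dataset kept for only part of a month costs proportionally less. S3 meters storage in byte-hours and bills it as GB-months; this simplified sketch uses whole days instead. The price is a hypothetical example value, so check the pricing page for your Region and storage class.

```python
def monthly_storage_cost(size_gb, days_stored, days_in_month, price_per_gb_month):
    """Simplified, prorated storage charge for one month (hypothetical helper)."""
    gb_months = size_gb * days_stored / days_in_month
    return gb_months * price_per_gb_month


# Hypothetical example price per GB-month; real prices vary by Region and class.
PRICE_PER_GB_MONTH = 0.023

# 100 GB kept for half of a 30-day month is billed as 50 GB-months.
cost = monthly_storage_cost(100, 15, 30, PRICE_PER_GB_MONTH)
print(f"${cost:.2f}")
```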

Requests made to your S3 buckets and objects are also charged; only DELETE and CANCEL requests are free.
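Request charges are quoted per 1,000 requests and vary by request type. A minimal sketch, assuming hypothetical example prices for two request types, that skips the free DELETE and CANCEL requests and fails loudly on any request type it has no price for:

```python
FREE_REQUEST_TYPES = {"DELETE", "CANCEL"}


def request_cost(counts, price_per_1000):
    """Hypothetical helper: total request charge from monthly request counts."""
    total = 0.0
    for request_type, count in counts.items():
        if request_type in FREE_REQUEST_TYPES:
            continue  # DELETE and CANCEL requests are free
        # Raises KeyError for a request type without a known price.
        total += count / 1000 * price_per_1000[request_type]
    return total


# Hypothetical example prices per 1,000 requests; check the pricing page.
PRICES = {"PUT": 0.005, "GET": 0.0004}
monthly_requests = {"PUT": 20_000, "GET": 1_000_000, "DELETE": 5_000}
print(f"${request_cost(monthly_requests, PRICES):.2f}")
```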

You pay for all data transfer except: data transferred into S3 from the internet; the first 100 GB per month of data transferred out to the internet (an allowance shared across AWS services); data transferred from an S3 bucket to any AWS service in the same AWS Region; and data transferred out to Amazon CloudFront.
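The exceptions above can be expressed as a small billing filter. This is a simplified, hypothetical sketch: the destination labels are made up for illustration, and it applies the whole 100 GB internet allowance to S3 alone, although in practice that allowance is shared across AWS services. Anything not listed as free (such as transfer to another Region) is treated as billable.

```python
FREE_INTERNET_EGRESS_GB = 100  # monthly allowance for data out to the internet

# Hypothetical labels for transfers that are not charged.
FREE_DESTINATIONS = {"inbound_from_internet", "same_region_aws_service", "cloudfront"}


def billable_transfer_gb(transfers):
    """Sum billable GB from (destination, gb) pairs for one month."""
    internet_out = 0.0
    other = 0.0
    for destination, gb in transfers:
        if destination in FREE_DESTINATIONS:
            continue
        if destination == "internet":
            internet_out += gb
        else:
            other += gb  # e.g. transfer to another Region
    return max(0.0, internet_out - FREE_INTERNET_EGRESS_GB) + other


month = [
    ("inbound_from_internet", 500),
    ("cloudfront", 300),
    ("internet", 150),
    ("other_region", 20),
]
print(billable_transfer_gb(month))
```

Here only 50 GB of internet egress (150 GB minus the 100 GB allowance) and the 20 GB cross-Region transfer are billable.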

With the AWS Free Tier you can get started with Amazon S3 for free: 5 GB of S3 Standard storage, 20,000 GET requests, 2,000 PUT, COPY, POST, or LIST requests, and 100 GB of data transfer out each month.

More details on pricing can be found on the Amazon S3 pricing page.