Amazon S3 best practices

#aws #serverless #s3 #storage

The picture below gives an overview of Amazon S3 capabilities.

Amazon S3 evolution

a. Since its launch in 2006, S3 has seen an 80% reduction in price
b. Over time, S3 has also added several storage classes

S3 New Releases

a. In 2020, S3 added two new tiers, Archive Access and Deep Archive Access, to the S3 Intelligent-Tiering storage class, because customers wanted the flexibility to have data moved automatically to the lowest-cost storage option

In order to further reduce their storage costs, many customers prefer to archive rarely accessed objects directly to S3 Glacier or S3 Glacier Deep Archive. However, this requires you to build complex systems that understand the access patterns of objects for a long period of time and archive them when the objects are not accessed for months at a time.

Today we are announcing two new archive access tiers designed for asynchronous access that are optimized for rare access at a very low cost: Archive Access tier and Deep Archive Access tier. You can opt-in to one or both archive access tiers and you can configure them at the bucket, prefix, or object tag level.

Now with S3 Intelligent-Tiering, you can get high throughput and low latency access to your data when you need it right away, and automatically pay about $1 per TB per month when objects haven’t been accessed for 180 days or more. Already customers of S3 Intelligent-Tiering have realized cost savings up to 40% and now using the new archive access tiers they can reduce storage costs up to 95% for rarely accessed objects.

How does it work:

Once you have activated one or both of the archive access tiers, S3 Intelligent-Tiering will automatically move objects that haven’t been accessed for 90 days to the Archive Access tier, and after 180 days without being accessed to the Deep Archive Access tier. At any time that an object that is in one of the archive access tiers is restored, the object will move to the Frequent Access tier within a few hours and then it will be ready to be retrieved.
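As a rough illustration of the rule described above, here is a small Python sketch. The 90 and 180 day thresholds come from the text; the function names, the configuration id and the prefix are made up for the example. The dictionary is shaped like the IntelligentTieringConfiguration argument that, as far as I know, boto3's put_bucket_intelligent_tiering_configuration expects, but nothing here calls AWS.

```python
from typing import Optional

# Thresholds taken from the text above.
ARCHIVE_AFTER_DAYS = 90
DEEP_ARCHIVE_AFTER_DAYS = 180


def expected_tier(days_since_last_access: int,
                  archive_enabled: bool = True,
                  deep_archive_enabled: bool = True) -> str:
    """Illustrative only: which tier an object would be expected to sit in."""
    if deep_archive_enabled and days_since_last_access >= DEEP_ARCHIVE_AFTER_DAYS:
        return "DEEP_ARCHIVE_ACCESS"
    if archive_enabled and days_since_last_access >= ARCHIVE_AFTER_DAYS:
        return "ARCHIVE_ACCESS"
    # Not archived: the object stays in the Frequent/Infrequent Access tiers.
    return "NOT_ARCHIVED"


def tiering_configuration(config_id: str, prefix: Optional[str] = None) -> dict:
    """Build an opt-in configuration for both archive access tiers."""
    config = {
        "Id": config_id,
        "Status": "Enabled",
        "Tierings": [
            {"Days": ARCHIVE_AFTER_DAYS, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": DEEP_ARCHIVE_AFTER_DAYS, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    }
    if prefix is not None:
        # Optional: limit the configuration to one prefix (hypothetical value).
        config["Filter"] = {"Prefix": prefix}
    return config


example = tiering_configuration("archive-old-data", prefix="reports/")
```

With only the Archive Access tier activated, expected_tier(200, deep_archive_enabled=False) stays at "ARCHIVE_ACCESS", which matches the "one or both" opt-in described above.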

Objects in the Archive Access tier are retrieved in 3-5 hours, and objects in the Deep Archive Access tier within 12 hours. If you need faster access to an object in any of the archive tiers, you can pay for faster retrieval by selecting expedited retrieval in the console.

b. S3 also launched Amazon S3 on Outposts to meet customers' requirement of keeping their data close to on-premises applications

On that note, here is a quick look at the existing storage classes.

S3 storage classes
o S3 Standard – 99.99% availability and 99.999999999% (11 nines) durability
o S3 Standard-IA (Infrequent Access) – for data that is accessed less frequently but needs rapid access when requested. Lower storage price than S3 Standard, but a retrieval fee is charged
o S3 One Zone-IA – lower cost for infrequently accessed data, stored in one Availability Zone only (similar in purpose to the older Reduced Redundancy Storage, RRS, class)
o S3 Intelligent-Tiering – monitors how often objects are accessed and automatically moves them to the most cost-effective access tier, without performance impact or operational overhead
o S3 Glacier – low-cost archival storage, cheaper than on-premises data archival
o S3 Glacier Deep Archive – lowest-cost storage class, for data where a retrieval time of 12 hours is acceptable

Amazon S3 Analytics and Insights

a. S3 Inventory for analysing individual objects within your bucket
b. S3 Storage Class Analysis, which analyses access patterns in a bucket to recommend an optimal storage class for you
c. S3 Storage Lens (recently launched) provides centralised, organisation-wide visibility into your S3 storage, usage and activity, and also recommendations on how you can optimize your storage
i. 29 metrics, updated daily
ii. Up to 15 months of historical data to analyse
iii. Gives recommendations on how to optimize storage costs

S3 storage use cases

In the example below we can see that 43% of storage is in Standard-IA, so we can make a judgement call on moving that data to an archival class to save cost

Amazon S3 outlier

Amazon S3 lifecycle

a. Based on rules, S3 Lifecycle can move objects across storage classes
b. These rules are based on the date of the object's creation (its age)
c. Rules can be filtered to apply to
i. the whole bucket
ii. a prefix
iii. tagged objects
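To make the rule structure a bit more concrete, here is a small Python sketch that builds a lifecycle configuration as a plain dictionary. The rule ids, the "logs/" prefix, the tag and the day counts are made-up examples, and lifecycle_rule is my own helper, not an AWS function. The shape follows what I understand boto3's put_bucket_lifecycle_configuration takes as its LifecycleConfiguration argument; the code itself does not call AWS.

```python
from typing import List, Optional, Tuple


def lifecycle_rule(rule_id: str,
                   transitions: List[Tuple[int, str]],
                   prefix: Optional[str] = None,
                   tag: Optional[Tuple[str, str]] = None) -> dict:
    """Build one rule; the filter selects whole bucket, prefix, or tagged objects."""
    tag_dict = {"Key": tag[0], "Value": tag[1]} if tag else None
    if prefix and tag_dict:
        # Prefix and tag together have to be combined under "And".
        rule_filter = {"And": {"Prefix": prefix, "Tags": [tag_dict]}}
    elif tag_dict:
        rule_filter = {"Tag": tag_dict}
    else:
        # An empty prefix matches every object, i.e. the whole bucket.
        rule_filter = {"Prefix": prefix or ""}
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": rule_filter,
        # Days are counted from the object's creation date.
        "Transitions": [{"Days": days, "StorageClass": storage_class}
                        for days, storage_class in transitions],
    }


lifecycle_config = {
    "Rules": [
        lifecycle_rule("archive-logs",
                       [(30, "STANDARD_IA"), (90, "GLACIER")],
                       prefix="logs/"),
        lifecycle_rule("deep-archive-tagged",
                       [(180, "DEEP_ARCHIVE")],
                       tag=("archive", "true")),
    ]
}
```

The first rule moves everything under logs/ to Standard-IA after 30 days and to Glacier after 90 days; the second only touches objects carrying the (hypothetical) archive=true tag.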

S3 Intelligent-Tiering

Data protection with S3

By default, all buckets are private
We can change access through
a. Bucket policies – applied at the bucket level
b. Access control lists (ACLs) – for access to individual objects

Encryption in transit

If the request uses HTTPS, the traffic is encrypted, which is what encryption in transit means
The traffic between my computer and the server is encrypted, so nobody in between can read it and see what I am looking at
Encryption in transit is achieved with SSL/TLS
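One common way to enforce this on a bucket is a bucket policy that denies any request not made over HTTPS. Below is a minimal Python sketch that only builds the policy document. The bucket name and the function name are made up for the example; the aws:SecureTransport condition key is the piece that does the work. Applying it (for instance with boto3's put_bucket_policy) is left out.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> str:
    """Return a bucket policy (JSON string) that rejects plain-HTTP requests."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # aws:SecureTransport is "false" when the request is not HTTPS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


policy_json = deny_insecure_transport_policy("example-bucket")
```

Because it is a Deny statement, it overrides any Allow that would otherwise let an unencrypted request through.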

Encryption at Rest (server side)

If a Word document is stored on a server without encryption, anyone who has access to the drive will be able to read it
Encryption at rest can be achieved at
a. Server side – AWS encrypts the object after receiving it
b. Client side – I encrypt the object myself and then upload it
Server-side encryption options in AWS
a. SSE-S3 (S3-managed keys) – AWS manages this; we don’t have to do anything. S3 simply encrypts the object with a key it manages
b. SSE-KMS (AWS Key Management Service) – the key is held in AWS KMS, so AWS and we manage the encryption together
c. SSE-C (customer-provided keys) – here we provide the key
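To show how the three server-side options listed above differ from the caller's point of view, here is a small Python sketch that returns the extra upload parameters for each one. The parameter names (ServerSideEncryption, SSEKMSKeyId, SSECustomerAlgorithm, SSECustomerKey) are the ones I know from boto3's put_object; the helper encryption_args, the mode labels and the key alias are my own inventions for the example, and no upload is performed.

```python
from typing import Optional


def encryption_args(mode: str,
                    kms_key_id: Optional[str] = None,
                    customer_key: Optional[bytes] = None) -> dict:
    """Return the extra keyword arguments an upload would carry for each mode."""
    if mode == "SSE-S3":
        # S3 manages the key; we only ask for AES-256 encryption.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id is not None:
            # Without a key id, the AWS-managed KMS key for S3 is used.
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "SSE-C":
        # We supply the key ourselves; it must be 256 bits for AES-256.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical key alias, for illustration only.
kms_args = encryption_args("SSE-KMS", kms_key_id="alias/my-example-key")
```

With SSE-C the same key has to be sent again on every download, because S3 does not keep it.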

Versioning

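a. Stores all versions of an object
b. Even deleted objects are retained: a delete adds a delete marker, and the older versions are kept
c. Once enabled, versioning can’t be disabled, only suspended; the only way back to an unversioned state is to delete the bucket
d. MFA Delete can be enabled to require multi-factor authentication before versions are deleted
e. When versioning is enabled, the storage used by a file is the sum of the sizes of all its versions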
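The points about delete markers and storage size can be sketched with a tiny in-memory model in Python. This is a toy, not how S3 is implemented: the class names, the key and the sizes are made up, and it only mimics the behaviour described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Version:
    size: int
    is_delete_marker: bool = False


@dataclass
class VersionedBucket:
    objects: Dict[str, List[Version]] = field(default_factory=dict)

    def put(self, key: str, size: int) -> None:
        # Every upload adds a new version; nothing is overwritten.
        self.objects.setdefault(key, []).append(Version(size))

    def delete(self, key: str) -> None:
        # A plain delete only adds a delete marker on top of the versions.
        self.objects.setdefault(key, []).append(Version(0, is_delete_marker=True))

    def is_visible(self, key: str) -> bool:
        versions = self.objects.get(key, [])
        return bool(versions) and not versions[-1].is_delete_marker

    def stored_bytes(self, key: str) -> int:
        # Storage used is the sum over all versions, not just the latest.
        return sum(v.size for v in self.objects.get(key, []))


bucket = VersionedBucket()
bucket.put("report.docx", 100)
bucket.put("report.docx", 150)
size_before_delete = bucket.stored_bytes("report.docx")  # 100 + 150
bucket.delete("report.docx")
```

After the delete, the object no longer shows up as visible, but stored_bytes still reports the same total, which is why old versions are worth cleaning up with lifecycle rules.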