Prabusah

Posted on Nov 1, 2022

Amazon S3 Primer

#aws #s3 #cloudskills #tutorial

Amazon S3 is an object storage service. Objects are stored in buckets. Prefixes act as pseudo folders: the user interface (AWS Console) displays them as a folder hierarchy, but in reality the storage is still a flat structure of keys.

By default, we can create up to 100 buckets in an AWS account (a service limit increase request can raise this to 1,000 buckets). Bucket size is unlimited, so users do not have to allocate or predetermine capacity.

A few bucket facts:

Buckets cannot be transferred to other accounts.
Bucket names are globally unique across the entire AWS S3 infrastructure. Once a bucket is deleted, its name becomes available for reuse by any AWS account, usually after a delay of about 24 hours.
Buckets cannot be renamed.
Buckets cannot be nested.
Bucket names must be 3-63 characters long.
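
The naming constraints can be checked before attempting to create a bucket. Below is a minimal sketch in Python; `is_plausible_bucket_name` is a hypothetical helper, and it covers only a subset of the S3 naming rules (length, allowed characters, start and end characters, no adjacent dots, no IP-address-like names). It cannot tell whether a name is already taken globally.

```python
import re

# Subset of the S3 naming rules: 3-63 characters; lowercase letters, digits,
# dots and hyphens only; must start and end with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_plausible_bucket_name(name: str) -> bool:
    """Return True if `name` passes this simplified check."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    if ".." in name:  # adjacent dots are not allowed
        return False
    if _IPV4_LIKE.fullmatch(name):  # names formatted like an IP address are rejected
        return False
    return True


print(is_plausible_bucket_name("my-app-logs-2022"))  # True
print(is_plausible_bucket_name("My_Bucket"))         # False (uppercase, underscore)
print(is_plausible_bucket_name("ab"))                # False (too short)
```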

Terminology:

Prefix - a pseudo folder.
Key - the name of the object.
Object - a file consisting of data, optional metadata, and permissions. All three are usually provided when uploading a file to a bucket.
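
To see why prefixes are only pseudo folders, here is a small offline sketch in Python. It does not call S3; `list_level` is a hypothetical function that mimics how a listing with a prefix and a delimiter rolls keys up into folder-like groups, and the sample keys are made up.

```python
# Keys live in one flat namespace; "/" is just a character in the key name.
keys = [
    "photos/2022/cat.jpg",
    "photos/2022/dog.jpg",
    "photos/2021/bird.jpg",
    "readme.txt",
]


def list_level(keys, prefix="", delimiter="/"):
    """Mimic a prefix + delimiter listing: return (objects, common_prefixes)."""
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is rolled up into one "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


print(list_level(keys))                    # (['readme.txt'], ['photos/'])
print(list_level(keys, prefix="photos/"))  # ([], ['photos/2021/', 'photos/2022/'])
```

The "folders" exist only as shared beginnings of key names; nothing is stored for `photos/` itself.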

A bucket can have up to 50 tags, and an object can have up to 10 tags.
Any number of objects can be stored in a bucket.
Each object can hold at most 5 TB of data.

Region:

In the AWS Console, S3 is presented as a global service: buckets from all regions appear in one view. Creating a bucket still requires choosing a region, which decides where the data resides.

Cross Region Replication:

Cross Region Replication replicates a bucket's objects to a bucket in another region. It can replicate the entire bucket or, using tags, only the objects that carry the tags we choose.

Same Region Replication:

Source and target buckets reside in the same region.
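
Both kinds of replication are driven by a replication configuration on the source bucket. The following Python sketch builds a tag-filtered rule as a plain dictionary, in the shape the S3 replication configuration uses as far as I know it. The role ARN, bucket ARN, rule ID, and tag are placeholders, and versioning is assumed to be enabled on both buckets.

```python
import json

# All names below (role ARN, bucket ARN, rule ID, tag key/value) are placeholders.
replication_configuration = {
    "Role": "arn:aws:iam::123456789012:role/example-replication-role",
    "Rules": [
        {
            "ID": "replicate-tagged-objects",
            "Priority": 1,
            "Status": "Enabled",
            # Only objects carrying this tag are replicated.
            "Filter": {"Tag": {"Key": "replicate", "Value": "true"}},
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::example-destination-bucket"},
        }
    ],
}

print(json.dumps(replication_configuration, indent=2))
```

Whether this is cross region or same region replication depends only on where the destination bucket lives. With boto3, a dictionary like this would be passed as `ReplicationConfiguration` to `put_bucket_replication`.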

Strong data consistency:

S3 provides strong read-after-write consistency. After a successful write of a new object or an overwrite of an existing object, any subsequent read request receives the latest version of the object. List operations are strongly consistent as well.

Versioning:

Enables recovery of objects from accidental deletion or overwrite.

GET operation:

Retrieves an object. To retrieve only part of an object, use the Range HTTP header in the GET request.
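
As a rough illustration of a ranged read, the Python sketch below builds a Range header value and applies the same byte range to a local byte string standing in for an object body. `byte_range_header` is a hypothetical helper and no request is sent to S3.

```python
def byte_range_header(start: int, end: int) -> str:
    """Build a Range header value for bytes start..end (both inclusive)."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return f"bytes={start}-{end}"


# Local stand-in for an object body, so the slice can be checked offline.
body = b"0123456789abcdef"
start, end = 4, 9

print(byte_range_header(start, end))  # bytes=4-9
print(body[start:end + 1])            # b'456789'
```

The range is inclusive on both ends, which is why the local slice uses `end + 1`. With boto3, the header value would be passed as the `Range` parameter of `get_object`.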

DELETE operation:

Without versioning, the object is permanently deleted.
With versioning enabled, a delete either permanently removes one version (key + version ID specified) or creates a delete marker for the object (only the key specified, no version ID), in which case the object can be recovered later.
Recovery is done by removing the delete marker.
Retrieving an object whose current version is a delete marker returns 404 Not Found.
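
The delete behaviour with versioning can be modelled in a few lines of Python. This is a toy in-memory model, not the S3 API: `ToyVersionedBucket` and its version IDs are invented for illustration, and a `KeyError` stands in for the 404 response.

```python
import itertools


class ToyVersionedBucket:
    """In-memory model of versioned deletes; not the real S3 API."""

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, data or None)
        self._ids = itertools.count(1)

    def put(self, key, data):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key, version_id=None):
        if version_id is None:
            # Key only: add a delete marker on top; older versions stay.
            return self.put(key, None)
        # Key + version ID: that version is removed permanently.
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]
        return version_id

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1][1] is None:
            raise KeyError("404 Not Found")
        return versions[-1][1]


bucket = ToyVersionedBucket()
bucket.put("report.csv", b"first")
marker_id = bucket.delete("report.csv")            # creates a delete marker
bucket.delete("report.csv", version_id=marker_id)  # removing the marker recovers the object
print(bucket.get("report.csv"))                    # b'first'
```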

PUT operation:

Adds an object to a bucket. There are no partial writes: a PUT always writes the entire object.
A single PUT operation can upload up to 5 GB (the maximum object size is 5 TB). For objects larger than 5 GB, use the multipart upload API.

Multipart upload API:

Uploads an object of up to 5 TB in parts, with each part up to 5 GB in size.
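
A quick way to sanity-check a part size is to count the parts it produces. The Python sketch below uses the 5 TB and 5 GB limits mentioned here (treated as binary units) plus two limits taken from the S3 documentation as I understand it: a 5 MiB minimum part size and at most 10,000 parts per upload. `plan_parts` is a hypothetical helper.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART = 5 * MIB     # assumed minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB     # maximum part size
MAX_PARTS = 10_000     # assumed maximum number of parts per upload
MAX_OBJECT = 5 * TIB   # maximum object size


def plan_parts(object_size: int, part_size: int) -> int:
    """Return how many parts an upload needs, or raise if a limit is exceeded."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    parts = (object_size + part_size - 1) // part_size  # integer ceiling division
    if parts > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return parts


print(plan_parts(1 * GIB, 100 * MIB))  # 11
print(plan_parts(5 * TIB, 5 * GIB))    # 1024
```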

Best practice: use multipart upload for objects larger than 100 MB.
S3 retains all uploaded parts until the multipart upload is completed or aborted. An incomplete upload therefore keeps incurring storage costs for the parts already stored in S3. Use lifecycle rules to clean up incomplete multipart uploads automatically.
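
Such a cleanup rule might look like the following Python sketch, which builds the lifecycle configuration as a dictionary. The rule ID and the 7-day window are illustrative choices, not recommendations from this post.

```python
import json

# The rule ID and the 7-day window are illustrative assumptions.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "abort-incomplete-multipart-uploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix: apply to the whole bucket
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

With boto3, this dictionary would be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`; parts of uploads still incomplete after the chosen number of days are then removed by S3.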

Online Data Transfer services:

AWS DataSync, AWS Transfer Family, Amazon Kinesis Data Firehose (delivers directly to S3), Amazon Kinesis Data Streams (processes streaming data).

Offline Data Transfer services:

AWS Snowcone: up to 8 TB of storage.
AWS Snowball: Storage Optimized devices have 40 vCPUs and Compute Optimized devices have 52 vCPUs; devices may be rack mounted to build larger installations.
AWS Snowmobile: up to 100 PB, delivered in a 45-foot-long shipping container hauled by a semi-trailer truck and accompanied by security personnel.

Hybrid cloud storage services:

These services are for on-premises applications that need rapid data transfer to, or access to, the cloud.
AWS Direct Connect: a dedicated network connection between on-premises sites and AWS that does not pass through the public internet. Uses VLANs.
AWS Storage Gateway: connects to an S3 bucket over the NFS or SMB protocol.

Image by Alexa from Pixabay
