AWS S3 is an object-based, serverless storage service from Amazon Web Services that scales far more easily than hard-drive file systems and block storage. Serverless means the storage is hosted in the cloud, so you don’t have to provision a server with a fixed storage capacity; the storage expands dynamically with usage.
In this article, we will discuss what AWS S3 is and the 5 Ws of using it.
An AWS S3 bucket is a public cloud storage unit on S3 (Simple Storage Service). A user account can hold multiple S3 buckets for storing folders and data in the form of objects, but bucket names must be unique across all AWS accounts, just like domain names. Bucket names must also be DNS compliant: 3 to 63 characters long, made up only of lowercase letters, digits, dots and hyphens, and beginning and ending with a letter or digit.
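The naming constraints can be checked locally before trying to create a bucket. The following is a minimal sketch that covers only a subset of the S3 naming rules (length, allowed characters, no adjacent dots, not shaped like an IP address); the function name `is_valid_bucket_name` is a hypothetical helper, not part of any AWS SDK, and passing this check says nothing about whether the name is still globally available.

```python
import re

# Subset of the S3 rules: 3-63 characters, lowercase letters, digits, dots and
# hyphens only, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Names shaped like an IPv4 address (e.g. 192.168.1.10) are not allowed.
_IPV4_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes this simplified DNS-compliance check."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    if ".." in name:  # two adjacent dots are not DNS compliant
        return False
    if _IPV4_LIKE.fullmatch(name):
        return False
    return True


if __name__ == "__main__":
    for candidate in ["my-app-logs-2024", "My_Bucket", "ab", "192.168.1.10"]:
        print(candidate, is_valid_bucket_name(candidate))
```

Uniqueness across accounts can only be confirmed by the service itself when the bucket is created.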
Here we will discuss the Top 10 features of AWS S3.
A. Security on Server Side
Server-side security relies on server-side encryption (SSE), which has the following three options:
With SSE-S3, S3 uses the AES-256 encryption algorithm to secure the data and manages the keys itself.
With SSE-KMS, S3 uses the AES-256 encryption algorithm to secure the data and uses AWS Key Management Service (KMS) with envelope encryption to protect the keys, which lets you control the keys yourself.
With SSE-C, S3 uses the AES-256 encryption algorithm to secure the data and the customer provides the keys (you manage the keys).
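These three options are commonly referred to as SSE-S3, SSE-KMS and SSE-C, and in the boto3 SDK each one corresponds to a different set of extra arguments on an upload call such as `put_object`. The sketch below only builds those argument dictionaries, so it runs without AWS credentials; `sse_upload_args` is a hypothetical helper, and the bucket and key names in the usage comment are placeholders.

```python
from typing import Optional


def sse_upload_args(mode: str,
                    kms_key_id: Optional[str] = None,
                    customer_key: Optional[bytes] = None) -> dict:
    """Return the extra upload arguments for one server-side encryption mode."""
    if mode == "SSE-S3":
        # S3 encrypts with AES-256 and manages the keys itself.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        # Keys are managed through AWS KMS; the key id is optional.
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id is not None:
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "SSE-C":
        # The customer supplies a 256-bit key with every request.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown mode: {mode}")


# Usage sketch (assumes boto3 is installed and credentials are configured):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_object(Bucket="example-bucket", Key="report.csv", Body=b"data",
#                 **sse_upload_args("SSE-S3"))
```

With SSE-C the same key has to be supplied again when the object is read back, since S3 does not store it.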
B. Security in transit
By default, the AWS console and SDKs send requests over HTTPS, so data in transit is protected with SSL/TLS encryption.
C. Security on Client Side
The data is first encrypted on the client side and then uploaded to AWS S3.
Lifecycle management automatically manages data objects once they reach a predetermined age. A set of lifecycle rules can automatically delete the targeted data, or move it to a different storage class, after a set time period.
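As an illustration, a lifecycle rule can be written as a plain dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the `logs/` prefix and the day counts below are made-up values, and `action_at_age` is a hypothetical helper that only simulates, in simplified form, what the rule implies for an object that starts in the Standard class.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",   # hypothetical rule name
            "Filter": {"Prefix": "logs/"},      # applies to keys under logs/
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},        # delete after one year
        }
    ]
}


def action_at_age(rule: dict, age_days: int) -> str:
    """Describe what the rule implies for an object of the given age (simplified)."""
    expiration = rule.get("Expiration", {}).get("Days")
    if expiration is not None and age_days >= expiration:
        return "delete"
    storage_class = "STANDARD"  # assumption: the object was uploaded to Standard
    for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            storage_class = transition["StorageClass"]
    return storage_class


# Usage sketch (assumes boto3 and a bucket you own):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=lifecycle_configuration)
```

The simulation ignores details such as the time of day at which S3 actually applies lifecycle actions.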
Versioning keeps multiple versions of an object, so that earlier states can be recovered after a user overwrites or deletes it. Versioning is disabled by default; the bucket owner can enable it. Once versioning has been enabled it can only be suspended, not disabled, which means the versions already created will not be deleted.
To prevent others on a development team from deleting data from an S3 bucket, you can enable MFA Delete, which requires versioning to be turned on first. With MFA Delete enabled, only the root user can permanently delete data from the bucket, and only after supplying a valid MFA token.
An ACL (access control list) is a simple permission template and a legacy method of managing permissions on objects and S3 buckets.
Bucket policies are JSON documents that let developers write fine-grained access control rules.
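To make the JSON shape concrete, here is a minimal sketch of one bucket policy, built and serialized in Python. It denies every request that does not arrive over HTTPS, using the `aws:SecureTransport` condition key; the bucket name `example-bucket` and the statement id are placeholders, and a real policy would normally contain further statements for the access you do want to grant.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            # The bucket itself and every object inside it.
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # Matches only requests that were NOT sent over SSL/TLS.
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)

# Usage sketch (assumes boto3 and permission to edit the bucket policy):
#   boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=policy_json)
```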
Cross-region replication copies the data held in one AWS Region to a bucket in another Region at a different geographical location. Data can be replicated across accounts as well as across S3 buckets.
In case of a natural calamity, the software solution does not have to shut down; it can start fetching data from the copy located in a different region.
*Meet compliance requirements
Although AWS S3 stores your data across multiple geographically distant Availability Zones by default, compliance requirements might dictate that you store data at even greater distances. Cross-Region replication allows you to replicate data between distant AWS Regions to meet compliance requirements.
*Minimize latency
If your customers are in two geographic locations, you can minimize latency in accessing objects by maintaining object copies in AWS Regions that are geographically closer to your users.
*Increase operational efficiency
If you have compute clusters in two different AWS Regions that analyse the same set of objects, you might choose to maintain object copies in those Regions.
AWS S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client machine and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations: as data arrives at an edge location, it is routed to Amazon S3 over an optimized network path. Additional data transfer charges may apply when using Transfer Acceleration. Only the bucket owner can enable it; it is aimed at users who regularly upload gigabytes to terabytes of data and want to make full use of their internet connection’s bandwidth.
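Once acceleration is enabled on a bucket, clients reach it through a dedicated endpoint of the form `bucketname.s3-accelerate.amazonaws.com`. The helper below is a small sketch that builds that endpoint; `accelerate_endpoint` is a hypothetical function name, and it rejects bucket names containing periods because Transfer Acceleration does not support them.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Build the Transfer Acceleration endpoint URL for a bucket."""
    if "." in bucket:
        raise ValueError(
            "Transfer Acceleration does not support bucket names containing periods"
        )
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


if __name__ == "__main__":
    print(accelerate_endpoint("example-bucket"))
```

SDKs usually expose a configuration switch for this, so building the URL by hand is mostly useful for understanding what that switch changes.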
Each object uploaded to an AWS S3 bucket gets a unique URL, which is accessible according to the object’s access-level permissions (public, private or limited access). When an AWS user wants to give someone read or write access to an object for a limited time, they can create a pre-signed URL, which is signed with their credentials and grants access for a predetermined time period.
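The idea behind a pre-signed URL can be sketched with a plain HMAC signature over the object path and an expiry timestamp. This is a deliberately simplified illustration and not the Signature Version 4 scheme that S3 actually uses; in practice an SDK call such as boto3's `generate_presigned_url` produces the real thing. The secret key, host name and function names below are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"hypothetical-secret-key"  # stands in for the signer's credentials
HOST = "https://example-bucket.example.com"  # placeholder host


def _signature(path: str, expires_at: int) -> str:
    """HMAC-SHA256 over the path and expiry, so neither can be altered."""
    message = f"{path}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return a URL for `path` that is valid for `ttl_seconds`."""
    now = time.time() if now is None else now
    expires_at = int(now) + ttl_seconds
    query = urlencode({"expires": expires_at,
                       "signature": _signature(path, expires_at)})
    return f"{HOST}{path}?{query}"


def is_valid(url: str, now: Optional[float] = None) -> bool:
    """Check that the URL has not expired and that its signature matches."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    return hmac.compare_digest(signature, _signature(parsed.path, expires_at))
```

Because the expiry is part of the signed message, a recipient cannot extend the access window by editing the URL.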
AWS S3 has the following six storage classes; in general, the cheaper the class, the lower its availability or the slower its retrieval.
The Standard storage class is the fastest and most expensive, as its data is replicated across at least three Availability Zones. It is best for data that is accessed frequently, because its latency is in the range of milliseconds.
Standard IA (Infrequent Access) matches the Standard storage class in performance, but it has a lower storage price and charges a fee for each retrieval, so it is cheaper for data that is rarely read.
*One Zone IA
In One Zone IA, objects are stored in only one Availability Zone to reduce the price, so availability and resilience are lower than in the Standard storage class. Data objects that are used infrequently, such as once a month, should be stored in this class.
Data that is older than a month and hardly ever accessed should be moved to Glacier, which cuts the storage cost to a fraction.
Glacier Deep Archive is used for data that needs to be kept for a year or more. Usually this is enterprise operations data or data that must be retained for legal compliance. Glacier Deep Archive is the cheapest of all the storage classes, and its data retrieval time is measured in hours.
Intelligent-Tiering monitors how objects are accessed and places each one in the most cost-efficient access tier. The least accessed objects tend to be moved into its archive tiers.
For more insights, you can refer to the performance chart by AWS.
The solutions architect incorporates AWS S3 into the solution architecture and, at deployment, directs the DevOps team to use it for storing the data.
Use AWS S3 when your project has a large amount of data that is growing at an unpredictable rate.
A project in which large amounts of sensitive data are generated and accessed should use AWS S3 to reliably protect that data and manage access to it. Such projects are usually enterprise-scale and cannot tolerate downtime.