Sanjeeb Mohapatra


Blog -3-Data Engineering – AWS S3 space monitoring – Storage Lens

Data Engineering – AWS S3 cost monitoring – Storage Lens

Amazon S3 is an object storage service and one of the most popular services in AWS, offering industry-leading scalability, data availability, security, and performance. Organizations can store and retrieve any amount of data from anywhere.

If an organization uses AWS for its cloud services, S3 is one of the preferred storage solutions. Some of the use cases of S3 are:

  1. Build an enterprise data lake.
  2. Create a disaster recovery system to back up and restore data.
  3. Archive cold data for a long period to meet regulatory requirements.
  4. Host a static website.
  5. Integrate with many cloud-native solutions as their storage layer.

While you can store a virtually unlimited amount of data in S3, it is very important to monitor the storage used and the number of objects in your S3 buckets. At the end of the day, every stored object incurs a cost. Organizations may not notice the storage cost while they hold GBs or TBs of data, but once the data volume grows to PBs, the S3 cost becomes significant.

For example, when you store 10 PB of data (for large enterprise-scale applications such as a data lake or lakehouse), you pay roughly 220K USD per month for storage in the UK (London) region. So it is very important to understand how your S3 buckets are being used.
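To see where a figure of that size comes from, here is a minimal sketch of a tiered storage cost calculation. The per-GB rates and tier boundaries are assumptions chosen for illustration (check the current S3 pricing page for your region), and the function name `monthly_storage_cost` is hypothetical. It also uses decimal units (1 TB = 1,000 GB) for simplicity.

```python
# Assumed S3 Standard rates in USD per GB-month; illustrative only, not official pricing.
TIERS = [
    (50_000, 0.024),         # first 50 TB
    (450_000, 0.023),        # next 450 TB
    (float("inf"), 0.022),   # everything above 500 TB
]


def monthly_storage_cost(total_gb: float) -> float:
    """Return the monthly storage cost in USD for total_gb under the assumed tiers."""
    remaining = total_gb
    cost = 0.0
    for tier_size_gb, rate in TIERS:
        billed = min(remaining, tier_size_gb)  # GB charged at this tier's rate
        cost += billed * rate
        remaining -= billed
        if remaining <= 0:
            break
    return cost


ten_pb_in_gb = 10_000_000  # 10 PB in decimal GB
print(f"Estimated monthly cost: {monthly_storage_cost(ten_pb_in_gb):,.0f} USD")
```

Under these assumed rates the estimate lands at about 220K USD per month, and nearly all of it comes from the top tier. This is why cost only becomes visible once volumes reach the PB range.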


Amazon S3 has a feature called “Storage Lens” that lets you create your own custom dashboard and monitor the usage of your S3 objects. In this blog, we will create a dashboard using Storage Lens and see how it works.
To do so:

  1. Click on S3 in the AWS Management Console. You can see the overall utilization of all your S3 buckets.


  2. To create a dashboard, click on Storage Lens in the left panel, then click on Dashboards.
  3. Click on Create dashboard.
  4. Provide the following details:
  • Dashboard name: in this case we use s3-bucket-usage-monitor.
  • Home Region: select the appropriate region; for us it is the London region, eu-west-2.
  • Status: select Enable so that the dashboard is active and its metrics are collected.


  5. For Dashboard scope, if you have objects across regions, you can select the regions to include. In our case we select ONLY the London region, as all our objects are stored there, and include all buckets in that region.


  6. In the metrics section, select Free metrics. Many key metrics are available under free metrics, and they are more than enough to monitor the usage of S3 buckets.


  7. You can export these metrics to an S3 path for further analysis; in our case we disabled this option. Finally, click Create dashboard.


  8. It can take up to 48 hours for the charts to be ready.
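The console steps above can also be scripted. The sketch below is one possible way to do it with boto3, not the method used in this blog. It builds a configuration matching the choices made here (London region only, all buckets, no export) and sends it with the `s3control` client's `put_storage_lens_configuration` call. The helper names and the account ID are hypothetical. Two things are assumed rather than verified here: that omitting the optional advanced-metric blocks keeps the dashboard on free metrics, and that the client's region becomes the home region.

```python
from typing import Any, Dict, List


def build_storage_lens_config(
    dashboard_id: str, regions: List[str], enabled: bool = True
) -> Dict[str, Any]:
    """Build a Storage Lens configuration scoped to the given regions."""
    return {
        "Id": dashboard_id,
        # BucketLevel is required; the optional advanced-metric blocks are left out
        # (assumption: this keeps the dashboard on free metrics).
        "AccountLevel": {"BucketLevel": {}},
        # Dashboard scope: only the listed regions, all buckets within them.
        "Include": {"Regions": regions},
        "IsEnabled": enabled,
        # No "DataExport" key, so metrics export stays disabled.
    }


def create_dashboard(account_id: str, home_region: str, config: Dict[str, Any]) -> None:
    """Send the configuration to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above runs without boto3

    # Assumption: the region of this client becomes the dashboard's home region.
    s3control = boto3.client("s3control", region_name=home_region)
    s3control.put_storage_lens_configuration(
        ConfigId=config["Id"],
        AccountId=account_id,
        StorageLensConfiguration=config,
    )


config = build_storage_lens_config("s3-bucket-usage-monitor", ["eu-west-2"])
print(config)
# create_dashboard("111122223333", "eu-west-2", config)  # placeholder account ID
```

The call to `create_dashboard` is commented out so the snippet can be run safely; uncomment it only with your own account ID and credentials in place.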


By default, AWS creates a dashboard for you (which covers all regions); if you want to use the default dashboard, that is fine as well. For custom requirements, such as a specific region or specific rules, you can create a custom dashboard as described above. Sample charts from the default dashboard:

[Image: sample charts from the default dashboard]

Quick Tips:

  1. Once you understand the usage, you can see which data is critical for your use case and what its access pattern is. If the access pattern of some buckets or folders is unknown, it is better to use S3 Intelligent-Tiering to save cost.
  2. Many organizations enable bucket versioning to protect against accidental deletion of objects. However, versioning ONLY makes sense for critical objects that are difficult to recreate (for example, a scripts folder), because every retained version is billed as stored data. For use cases like a data lake, where you receive source data, process it, and move it to an archive, you really DO NOT need versioning on the staging bucket.
  3. For files that need to be stored long term, it is better to transition them to Glacier storage; this can be set up using a lifecycle management policy.
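As an illustration of the last tip, here is a minimal sketch of a lifecycle rule that moves objects under a prefix to Glacier after a number of days, applied with boto3's `put_bucket_lifecycle_configuration`. The helper names, the bucket name, the `archive/` prefix, and the 90-day threshold are all hypothetical values chosen for the example.

```python
from typing import Any, Dict


def build_glacier_rule(prefix: str, days: int) -> Dict[str, Any]:
    """Build a lifecycle configuration that transitions objects to Glacier."""
    return {
        "Rules": [
            {
                "ID": f"to-glacier-after-{days}d",
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(bucket: str, lifecycle: Dict[str, Any]) -> None:
    """Apply the configuration to a bucket (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above runs without boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=lifecycle
    )


lifecycle = build_glacier_rule("archive/", 90)
print(lifecycle)
# apply_lifecycle("my-example-bucket", lifecycle)  # hypothetical bucket name
```

Note that this call replaces the bucket's existing lifecycle configuration, so any rules already in place need to be included in the same request.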

Top comments (1)

Shreya • Edited

I thoroughly enjoyed reading your blog post on AWS S3 space monitoring with Storage Lens. Data engineering and storage management are crucial aspects of modern cloud architecture, and your insights into utilizing AWS tools are enlightening.

Storage Lens seems like a powerful tool to keep a close eye on S3 usage and optimize storage costs effectively. Your step-by-step guide and explanations make it easier for readers to implement these monitoring practices in their own AWS environments.

I particularly appreciated the way you explained the importance of proactive monitoring for avoiding potential issues and optimizing resources. It's evident that staying on top of S3 space utilization can significantly contribute to a well-architected and cost-efficient AWS infrastructure.

Looking forward to more insightful posts from you! Keep up the excellent work!