
Getting S3 bucket size in different ways

Marcin Piczkowski ・ 3 min read


In this post I will show a few ways to check the size of an AWS S3 bucket, using a bucket named my-docs as the example. Change this name to any existing bucket you have access to.

S3 CLI

All you need is the AWS CLI installed and configured.

aws s3 ls --summarize --human-readable --recursive s3://my-docs

It prints output like this:


2019-03-07 12:11:24   69.7 KiB 2019/01/file1.pdf
2019-03-07 12:11:20  921.4 KiB 2019/01/file2.pdf
2019-03-07 12:11:16  130.9 KiB 2019/01/file3.pdf

Total Objects: 310
Total Size: 121.7 MiB

The output looks similar to that of the Unix ls command.

You can then extract the total size from the output, e.g.:

aws s3 ls --summarize --human-readable --recursive s3://my-docs \
| tail -n 1 \
| awk -F" " '{print $3}'

tail takes the last line of the output ("Total Size: 121.7 MiB")
awk splits that line on whitespace and prints the third token, which is the numeric bucket size (here in MiB; the unit is the fourth token).
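The extraction step can be sketched on the sample summary above. The summary is pasted here as a literal string for illustration; in practice you would pipe the aws s3 ls output straight into the pipeline:

```shell
# The last two lines of the `aws s3 ls --summarize` output, pasted here
# as a literal string; normally the command's output is piped directly.
summary="Total Objects: 310
Total Size: 121.7 MiB"

# tail keeps the last line; awk prints its third whitespace-separated
# token, i.e. the numeric size.
printf '%s\n' "$summary" | tail -n 1 | awk '{print $3}'
# prints: 121.7
```

Note that with --human-readable the unit is a separate fourth token; drop that flag if you prefer a raw byte count. Alternatively, aws s3api list-objects-v2 --bucket my-docs --query "sum(Contents[].Size)" sums the object sizes in bytes server-side via a JMESPath query (though it may fail on an empty bucket, where Contents is absent).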

CloudWatch metric

aws cloudwatch get-metric-statistics --namespace AWS/S3 \
  --start-time 2019-07-07T23:00:00Z \
  --end-time 2019-10-31T23:00:00Z \
  --period 86400 \
  --statistics Sum \
  --region us-east-1 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value="my-docs" Name=StorageType,Value=StandardStorage \
  --output text \
  | sort -k3 | tail -n 1 | cut -f 2-2

What is happening in the above command?

The AWS CLI CloudWatch subcommand get-metric-statistics prints metric data points from start-time to end-time, with one data point every 86400 seconds (24 hours; the allowed period depends on the time frame, see the docs for more info).

The bucket name is my-docs and we want the output printed as plain text (we could just as well choose JSON).

The output printed would look like:

DATAPOINTS  127633754.0 2019-10-17T23:00:00Z    Bytes
DATAPOINTS  127633754.0 2019-08-13T23:00:00Z    Bytes
DATAPOINTS  127633754.0 2019-07-07T23:00:00Z    Bytes
DATAPOINTS  127633754.0 2019-10-03T23:00:00Z    Bytes

The third column is a timestamp. The data points are unordered and we are interested in the most recent size of the bucket, so we sort the output by this column: sort -k3 sorts by the 3rd column, and ISO-8601 timestamps order chronologically when compared as plain strings (a numeric sort with -n would only compare the leading year, so it is not needed here).

Finally, we take the second column, which is the bucket size in bytes.

tail -n 1 takes the last (most recent) line of the output
cut -f 2-2 cuts out the range from the 2nd to the 2nd tab-separated field, in other words only the column we are interested in.
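The sort/tail/cut steps can be tried out on canned data points. The sizes and dates below are made up, in the tab-separated form that --output text produces; real input would come from get-metric-statistics directly:

```shell
# Hypothetical data points in the tab-separated shape of `--output text`;
# note the middle line has a different (larger) size on the newest date.
datapoints="$(printf 'DATAPOINTS\t127633754.0\t2019-08-13T23:00:00Z\tBytes
DATAPOINTS\t129000000.0\t2019-10-17T23:00:00Z\tBytes
DATAPOINTS\t127633754.0\t2019-07-07T23:00:00Z\tBytes')"

# Sort by the 3rd field (ISO-8601 timestamps sort chronologically as
# plain strings), keep the newest line, then take the 2nd tab field.
printf '%s\n' "$datapoints" | sort -k3 | tail -n 1 | cut -f 2-2
# prints: 129000000.0
```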

This method of fetching the bucket size is error-prone: data points are present only for time frames when data actually changed on S3, so if you have not modified the bucket during the last month and you request metrics for that period, you won't get any.

AWS S3 job (inventory)

This is a feature provided by AWS: the inventory report. It lets you configure a scheduled job that saves information about an S3 bucket to a file in another bucket. Among other things, this information can include the size of each object in the source bucket.

The AWS documentation explains how to configure an S3 inventory manually in the AWS console.
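An inventory configuration can also be created from the CLI. A minimal sketch, assuming a destination bucket my-docs-reports and a configuration ID weekly-size-report (both made-up names):

```shell
# inventory.json: a minimal weekly CSV inventory that records object sizes.
# The destination bucket ARN and the Id are hypothetical placeholders.
cat > inventory.json <<'EOF'
{
  "Id": "weekly-size-report",
  "IsEnabled": true,
  "IncludedObjectVersions": "Current",
  "Schedule": { "Frequency": "Weekly" },
  "Destination": {
    "S3BucketDestination": {
      "Bucket": "arn:aws:s3:::my-docs-reports",
      "Format": "CSV"
    }
  },
  "OptionalFields": ["Size"]
}
EOF

# Attach the configuration to the source bucket.
aws s3api put-bucket-inventory-configuration \
  --bucket my-docs \
  --id weekly-size-report \
  --inventory-configuration file://inventory.json
```

The destination bucket additionally needs a bucket policy that allows the s3.amazonaws.com service to write the report files into it.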

To wrap up, the first option is the easiest one from the command line, but the other options are worth knowing too.

They may serve a particular use case better.

E.g. if you wanted to see how the bucket size changed over a time period, the 2nd method would be more suitable, but if you'd like a report with the bucket size on a regular basis, the third seems easier to implement. You could listen for new report objects in the second bucket and trigger a Lambda function on the object-created event to process the report (and maybe notify a user by email).
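The Lambda trigger mentioned above can be wired up with an S3 event notification. A sketch, assuming a function named process-inventory already exists and has granted S3 permission to invoke it (the account ID, region and ARNs are placeholders):

```shell
# notification.json: invoke the (hypothetical) process-inventory Lambda
# whenever a new object lands in the report bucket.
cat > notification.json <<'EOF'
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-inventory",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}
EOF

aws s3api put-bucket-notification-configuration \
  --bucket my-docs-reports \
  --notification-configuration file://notification.json
```

S3 must be allowed to call the function first, e.g. via aws lambda add-permission with the source bucket's ARN, or the notification configuration will be rejected.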
