
Vikram Aruchamy for AWS Community Builders

Posted on • Originally published at stackvidhya.com

How to List the Contents of an S3 Bucket Using Boto3 in Python

S3 is an object storage service from AWS. You can store any kind of file in it, such as CSV files or text files. Before operating on those files, you often need to retrieve the list of objects in a bucket. You'll learn how to list the contents of an S3 bucket in this tutorial.

You can list the contents of an S3 bucket by iterating over the collection returned by the my_bucket.objects.all() method.

If You're in a Hurry...

You can use the below code snippet to list the contents of the S3 Bucket using boto3.

Snippet

import boto3

session = boto3.Session(aws_access_key_id='<your_access_key_id>', aws_secret_access_key='<your_secret_access_key>')

s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)

Output

    csv_files/
    csv_files/IRIS.csv
    df.csv
    dfdd.csv
    file2_uploaded_by_boto3.txt
    file3_uploaded_by_boto3.txt
    file_uploaded_by_boto3.txt
    filename_by_client_put_object.txt
    text_files/
    text_files/testfile.txt

If You Want to Understand Details, Read on…

In this tutorial, you'll learn the different methods to list contents from an S3 bucket using boto3.

You'll use the boto3 resource and the boto3 client to list the contents, and you'll also use filtering to list specific file types and to list files from a specific directory of the S3 Bucket.

Installing Boto3

If you've not installed boto3 yet, you can install it by using the below snippet. The %pip magic works inside a Jupyter notebook; from a terminal, run pip install boto3 instead.

Snippet

%pip install boto3

Boto3 will be installed successfully.

Now, you can use it to access AWS resources.

Using Boto3 Resource

In this section, you'll use the Boto3 resource to list contents from an s3 bucket.

Boto3 resource is a high-level object-oriented API that represents the AWS services. Follow the below steps to list the contents from the S3 Bucket using the Boto3 resource.

  1. Create a Boto3 session using the boto3.Session() method.
  2. Create the S3 resource using the session.resource('s3') method.
  3. Create the bucket object using the resource.Bucket(<Bucket_name>) method.
  4. Invoke the objects.all() method on your bucket and iterate the returned collection to get each object's details. Print each object's name using its key attribute.

Note: The listing is recursive. In addition to the objects at the top level of the Bucket, it includes the sub-directories and the objects inside them.

Use the below snippet to list objects of an S3 bucket.

Snippet

import boto3
session = boto3.Session(aws_access_key_id='<your_access_key_id>', aws_secret_access_key='<your_secret_access_key>')

#Then use the session to get the resource
s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)

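You'll see the list of objects present in the Bucket as below. S3 returns keys sorted in UTF-8 binary order, which resembles alphabetical order except that digits and uppercase letters sort before underscores and lowercase letters.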

Output

    csv_files/
    csv_files/IRIS.csv
    df.csv
    dfdd.csv
    file2_uploaded_by_boto3.txt
    file3_uploaded_by_boto3.txt
    file_uploaded_by_boto3.txt
    filename_by_client_put_object.txt
    text_files/
    text_files/testfile.txt

This is how you can use the boto3 resource to list objects in an S3 Bucket.
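
Each item yielded by objects.all() is an object summary that carries more than the key. The sketch below is a minimal illustration, not part of the original snippet: it collects the key, size, and last-modified timestamp of each object. The helper names summarize_bucket and print_summary are hypothetical, and the bucket argument is assumed to be a boto3 Bucket resource such as s3.Bucket('stackvidhya').

```python
def summarize_bucket(bucket):
    """Return (key, size_in_bytes, last_modified) for every object in the bucket.

    `bucket` is assumed to be a boto3 Bucket resource,
    e.g. s3.Bucket('stackvidhya').
    """
    rows = []
    for obj in bucket.objects.all():
        # key, size and last_modified are attributes of each object summary
        rows.append((obj.key, obj.size, obj.last_modified))
    return rows


def print_summary(rows):
    """Print one line per object: size right-aligned, then timestamp and key."""
    for key, size, last_modified in rows:
        print(f"{size:>10}  {last_modified}  {key}")
```

With the my_bucket object created in the snippet above, print_summary(summarize_bucket(my_bucket)) would print one line per object.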

Using Boto3 Client

In this section, you'll use the boto3 client to list the contents of an S3 bucket.

The Boto3 client is a low-level interface whose methods map closely to the operations of the underlying AWS service API. Follow the below steps to list the contents from the S3 Bucket using the boto3 client.

  1. Create the boto3 S3 client using the boto3.client('s3') method, passing your access key ID and secret access key.
  2. Invoke the list_objects_v2() method with the bucket name to list the objects in the S3 bucket. It returns a dictionary with the object details. A single call returns at most 1,000 objects.
  3. Iterate the list stored under the Contents key of the returned dictionary and display each object name using obj['Key'].

Note: Similar to the Boto3 resource methods, the Boto3 client also returns the objects in the sub-directories.

Use the below snippet to list objects of an S3 bucket.

Snippet

import boto3

s3_client = boto3.client('s3', 
                      aws_access_key_id='<your_access_key_id>', 
                      aws_secret_access_key='<your_secret_access_key>' 
                      )

objects = s3_client.list_objects_v2(Bucket='stackvidhya')

# 'Contents' is missing from the response when the bucket is empty
for obj in objects.get('Contents', []):
    print(obj['Key'])

You'll see the objects in the S3 Bucket listed below.

Output

    csv_files/
    csv_files/IRIS.csv
    df.csv
    dfdd.csv
    file2_uploaded_by_boto3.txt
    file3_uploaded_by_boto3.txt
    file_uploaded_by_boto3.txt
    filename_by_client_put_object.txt
    text_files/
    text_files/testfile.txt

This is how you can list keys in the S3 Bucket using the boto3 client.
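
A single list_objects_v2() call returns at most 1,000 keys, so larger buckets need several requests. The following is a minimal sketch of one way to handle that with the client's built-in paginator. The helper name list_all_keys is hypothetical, and when no client is passed it assumes credentials are available through boto3's default credential chain rather than passed explicitly.

```python
def list_all_keys(bucket_name, prefix="", client=None):
    """Return every key in the bucket, following pagination.

    `client` is assumed to be a boto3 S3 client; one is created with the
    default credential chain when none is passed.
    """
    if client is None:
        import boto3  # imported lazily so the helper also works with a stub client

        client = boto3.client("s3")

    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # 'Contents' is absent from a page when nothing matches
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

Calling list_all_keys('stackvidhya') should return the same keys as the snippet above for a small bucket, and it keeps working once the bucket grows past 1,000 objects.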

List Contents of A Specific Directory

In this section, you'll learn how to list a subdirectory's contents that are available in an S3 bucket. This will be useful when there are multiple subdirectories available in your S3 Bucket, and you need to know the contents of a specific directory.

You can use the filter() method of the bucket's objects collection and pass the name of the subdirectory as the Prefix parameter.

filter() and Prefix are also helpful when you want to select only a specific object from the S3 Bucket.

Use the below snippet to select content from a specific directory called csv_files from the Bucket called stackvidhya.

Snippet

import boto3

session = boto3.Session(aws_access_key_id='<your_access_key_id>', aws_secret_access_key='<your_secret_access_key>')

#Then use the session to get the resource
s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for obj in my_bucket.objects.filter(Prefix="csv_files/"):
    print(obj.key)

You'll see the list of objects present in the sub-directory csv_files, sorted by key.

Output

    csv_files/
    csv_files/IRIS.csv

This is how you can list files in the folder or select objects from a specific directory of an S3 bucket.
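
Filtering by Prefix returns everything below the directory, including nested sub-directories. If only the immediate children are wanted, one option is to pass a Delimiter to the client's list_objects_v2 operation. The sketch below assumes a boto3 S3 client is passed in; the helper name list_folder is hypothetical.

```python
def list_folder(client, bucket_name, prefix=""):
    """Return (subfolders, files) found directly under `prefix`, one level deep.

    `client` is assumed to be a boto3 S3 client.
    """
    paginator = client.get_paginator("list_objects_v2")
    subfolders, files = [], []
    for page in paginator.paginate(
        Bucket=bucket_name, Prefix=prefix, Delimiter="/"
    ):
        # Keys with another '/' after the prefix are rolled up into CommonPrefixes
        for common in page.get("CommonPrefixes", []):
            subfolders.append(common["Prefix"])
        # Keys without a further '/' are returned as regular objects
        for obj in page.get("Contents", []):
            files.append(obj["Key"])
    return subfolders, files
```

For example, list_folder(s3_client, 'stackvidhya', 'csv_files/') would report the objects directly inside csv_files/ without descending into any deeper level.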

List Specific File Types From a Bucket

In this section, you'll learn how to list specific file types from an S3 bucket.

This may be useful when you want to know all the files of a specific type. S3 cannot filter by file extension on the server, so you first select all objects from the Bucket and then check whether each object name ends with the desired extension. If it does, you list the object.

This lists the files of that type from the Bucket, including those in all subdirectories.

Use the below snippet to list specific file types from an S3 bucket.

Snippet

import boto3

session = boto3.Session(aws_access_key_id='<your_access_key_id>', aws_secret_access_key='<your_secret_access_key>')

s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for obj in my_bucket.objects.all():
    # Include the dot so that keys such as 'notes_txt' are not matched
    if obj.key.endswith('.txt'):
        print(obj.key)


You'll see all the text files available in the S3 Bucket, sorted by key.

Output

    file2_uploaded_by_boto3.txt
    file3_uploaded_by_boto3.txt
    file_uploaded_by_boto3.txt
    filename_by_client_put_object.txt
    text_files/testfile.txt

This is how you can list files of a specific type from an S3 bucket.
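
The extension check can be separated from the listing so that it is easy to reuse and to try out without touching S3. The following is a small sketch under that assumption. The helper name filter_by_extension is hypothetical, and the sample keys are copied from the bucket listing shown earlier.

```python
def filter_by_extension(keys, extension):
    """Return the keys whose name ends with `extension`, ignoring case."""
    # Accept both 'csv' and '.csv' and normalize to a leading dot
    suffix = "." + extension.lower().lstrip(".")
    return [key for key in keys if key.lower().endswith(suffix)]


# Sample keys taken from the bucket listing shown earlier in this tutorial
sample_keys = [
    "csv_files/",
    "csv_files/IRIS.csv",
    "df.csv",
    "file_uploaded_by_boto3.txt",
    "text_files/testfile.txt",
]

print(filter_by_extension(sample_keys, "csv"))
# ['csv_files/IRIS.csv', 'df.csv']
```

To apply it to a real bucket, pass the keys collected from my_bucket.objects.all(), for example [obj.key for obj in my_bucket.objects.all()].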

List Contents From a Directory Using a Regular Expression

Boto3 currently doesn't support server-side filtering of objects using regular expressions.

However, you can get all the objects using the objects.all() method and filter them with a regular expression in an if condition.

For example, if you want to list the files that contain a digit in their names, you can use the below snippet. To do an advanced pattern matching search, you can refer to the regex cheat sheet.

Snippet

import re 
import boto3

session = boto3.Session(aws_access_key_id='<your_access_key_id>', aws_secret_access_key='<your_secret_access_key>')

s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

# Raw string, so that the backslash in \d reaches the regex engine unchanged
substring = r"\d"

for obj in my_bucket.objects.all():
    if re.search(substring, obj.key):
        print(obj.key)

You'll see the file names with numbers listed below.

Output

    file2_uploaded_by_boto3.txt
    file3_uploaded_by_boto3.txt
    file_uploaded_by_boto3.txt

This is how you can list contents from an S3 bucket using a regular expression.
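
A pattern as broad as a single digit also matches file_uploaded_by_boto3.txt, because of the 3 in boto3. A more specific pattern narrows the result. The sketch below compiles the pattern once and applies it to a list of keys. The helper name filter_keys and the pattern are illustrative assumptions, and the sample keys are copied from the bucket listing shown earlier.

```python
import re


def filter_keys(keys, pattern):
    """Return the keys that match the regular expression `pattern`."""
    compiled = re.compile(pattern)  # compile once, reuse for every key
    return [key for key in keys if compiled.search(key)]


# Sample keys taken from the bucket listing shown earlier in this tutorial
sample_keys = [
    "csv_files/IRIS.csv",
    "file2_uploaded_by_boto3.txt",
    "file3_uploaded_by_boto3.txt",
    "file_uploaded_by_boto3.txt",
    "filename_by_client_put_object.txt",
]

# Keys that start with 'file', followed by one or more digits and an underscore
print(filter_keys(sample_keys, r"^file\d+_"))
# ['file2_uploaded_by_boto3.txt', 'file3_uploaded_by_boto3.txt']
```

As with the extension filter, the keys can come from [obj.key for obj in my_bucket.objects.all()] when working against a real bucket.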

Conclusion

To summarize, you've learned how to list contents for an S3 bucket using boto3 resource and boto3 client. You've also learned to filter the results to list objects from a specific directory and filter results based on a regular expression.

If you have any questions, comment below.
