DEV Community

Shahid.Haider

Retrieving Azure Data Lake Gen2 Folder Sizes and Sending Email Reports

In this post, we walk through a Python script that calculates folder sizes in an Azure Data Lake Storage Gen2 account (accessed through its Blob Storage endpoint) and emails the results as an HTML report. The script uses the smtplib library to send email, the MIMEMultipart and MIMEText classes to build the message, and BlobServiceClient from azure.storage.blob to interact with the storage account.

Let's dive into the script and understand each step in detail:

Step 1: Importing the Required Libraries

We begin by importing the libraries the script needs: smtplib for sending email, MIMEMultipart and MIMEText from the standard email package for building the message, and BlobServiceClient from azure.storage.blob for working with Blob Storage.

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from azure.storage.blob import BlobServiceClient

Step 2: Defining the Email Sending Function

Next, we define the send_email function, which sends a message through the given SMTP server. It takes the sender and receiver addresses, the SMTP server, port, username, and password, a subject, and the HTML body. It builds the message with MIMEMultipart and MIMEText, upgrades the connection to TLS with STARTTLS, logs in, and sends the message.

def send_email(sender_email, receiver_email, smtp_server, smtp_port, smtp_username, smtp_password, subject, html_content):
    message = MIMEMultipart()
    message["From"] = sender_email
    message["To"] = receiver_email
    message["Subject"] = subject
    message.attach(MIMEText(html_content, "html"))

    with smtplib.SMTP(smtp_server, smtp_port) as server:
        server.starttls()
        server.login(smtp_username, smtp_password)
        server.sendmail(sender_email, receiver_email, message.as_string())
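send_email always calls starttls(), which suits submission ports such as 587 but does not work with servers that expect TLS from the first byte on port 465. The sketch below is one way to handle both, assuming port 465 is the only implicit-TLS port you need to support; uses_implicit_tls and send_email_tls_aware are hypothetical names, and the smtplib and ssl calls are standard library.

```python
import smtplib
import ssl
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Assumption: 465 is implicit TLS (SMTPS); other ports (587, 25) use STARTTLS.
IMPLICIT_TLS_PORT = 465


def uses_implicit_tls(smtp_port: int) -> bool:
    """True if the connection must be encrypted from the start (SMTP_SSL)."""
    return smtp_port == IMPLICIT_TLS_PORT


def send_email_tls_aware(sender_email, receiver_email, smtp_server, smtp_port,
                         smtp_username, smtp_password, subject, html_content):
    message = MIMEMultipart()
    message["From"] = sender_email
    message["To"] = receiver_email
    message["Subject"] = subject
    message.attach(MIMEText(html_content, "html"))

    # Default context verifies the server certificate in both modes.
    context = ssl.create_default_context()
    if uses_implicit_tls(smtp_port):
        server = smtplib.SMTP_SSL(smtp_server, smtp_port, context=context)
    else:
        server = smtplib.SMTP(smtp_server, smtp_port)
    with server:
        if not uses_implicit_tls(smtp_port):
            server.starttls(context=context)  # upgrade the plain connection to TLS
        server.login(smtp_username, smtp_password)
        server.send_message(message)  # addresses taken from the From/To headers
```

It accepts the same arguments as send_email, so it can be swapped in without changing the call in Step 9.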

Step 3: User Input for Configuration

We prompt the user for the settings the script needs: the storage account name and key, the container name, the sender and recipient email addresses, and the SMTP server, port, username, and password. The account key and SMTP password are read with getpass so they are not echoed to the terminal.

from getpass import getpass  # reads secrets without echoing them

storage_account_name = input("Enter the Storage Account Name: ")
storage_account_key = getpass("Enter the Storage Account Key: ")
container_name = input("Enter the Container Name: ")
sender_email = input("Enter your email address: ")
receiver_email = input("Enter the recipient email address: ")
smtp_server = input("Enter the SMTP server: ")
smtp_port = int(input("Enter the SMTP port: "))
smtp_username = input("Enter the SMTP username: ")
smtp_password = getpass("Enter the SMTP password: ")
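Interactive prompts do not work for unattended runs such as a scheduled job. A minimal sketch, assuming the settings may instead be supplied as environment variables: get_setting is a hypothetical helper, and the variable names in the usage comments (AZURE_STORAGE_ACCOUNT, SMTP_PASSWORD, and so on) are placeholders you would choose yourself.

```python
import os
from getpass import getpass


def get_setting(name, prompt, secret=False, env=None):
    """Return environment variable `name` if it is set and non-empty,
    otherwise ask the user (without echo when secret=True)."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        return value
    return getpass(prompt) if secret else input(prompt)


# Example usage (environment variable names are hypothetical):
# storage_account_name = get_setting("AZURE_STORAGE_ACCOUNT", "Enter the Storage Account Name: ")
# storage_account_key = get_setting("AZURE_STORAGE_KEY", "Enter the Storage Account Key: ", secret=True)
# smtp_port = int(get_setting("SMTP_PORT", "Enter the SMTP port: "))
# smtp_password = get_setting("SMTP_PASSWORD", "Enter the SMTP password: ", secret=True)
```

The prompt is only a fallback, so the same script still works interactively when no variables are set.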

Step 4: Set the Folder Path

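We set folder_path to the prefix (virtual folder) whose subfolders we want to measure. Through the Blob API, folders are matched by name prefix, so keep the trailing slash; without it, the path could also match sibling prefixes that start with the same characters.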

folder_path = '2023/July/week=2/'
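The example path follows a year/month/week=N layout. If the report runs on a schedule, the prefix could be computed instead of hard-coded. This is only a sketch: weekly_prefix is a hypothetical helper, and it assumes "week=N" means the N-th seven-day block of the month (days 1 to 7 are week=1, 8 to 14 are week=2, and so on). Adjust the rule if your pipeline numbers weeks differently.

```python
import calendar
from datetime import date


def weekly_prefix(day: date) -> str:
    """Build a 'YYYY/MonthName/week=N/' prefix for the given date.

    Assumption: week N is the N-th 7-day block of the month.
    """
    week_of_month = (day.day - 1) // 7 + 1
    month_name = calendar.month_name[day.month]  # English names under the default C locale
    return f"{day.year}/{month_name}/week={week_of_month}/"


# Example: folder_path = weekly_prefix(date.today())
```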

Step 5: Create BlobServiceClient Instance

We create a BlobServiceClient from the account's Blob endpoint URL, using the storage account key as the credential. This client is used for all further calls to Blob Storage.

blob_service_client = BlobServiceClient(account_url=f"https://{storage_account_name}.blob.core.windows.net",
                                       credential=storage_account_key)
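A mistyped account name only surfaces later as a connection error, so it can help to validate it before building the URL. A Data Lake Storage Gen2 account exposes both a Blob endpoint (used in this script) and a dfs endpoint (used by the separate azure-storage-file-datalake package). The sketch below, with account_urls as a hypothetical helper, checks the Azure naming rule (3 to 24 lowercase letters and digits) and assumes the public Azure cloud domain; sovereign clouds use different suffixes.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def account_urls(storage_account_name: str) -> dict:
    """Return the Blob and dfs endpoint URLs (public cloud assumed)."""
    if not _ACCOUNT_NAME.fullmatch(storage_account_name):
        raise ValueError(f"invalid storage account name: {storage_account_name!r}")
    return {
        "blob": f"https://{storage_account_name}.blob.core.windows.net",
        "dfs": f"https://{storage_account_name}.dfs.core.windows.net",
    }
```

Usage: BlobServiceClient(account_url=account_urls(storage_account_name)["blob"], credential=storage_account_key).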

Step 6: Get Container Client

We retrieve the container client for the specified container name. This will allow us to access and traverse the blobs within the container.

container_client = blob_service_client.get_container_client(container_name)

Step 7: Retrieve Folder Sizes

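Next, we iterate over the folders in folder_list, add up the sizes of all blobs under each folder's prefix, and store each total (in GB) in a dictionary for the report. folder_list holds the subfolder names to report on; replace the placeholder names below with your own.

# Subfolders of folder_path to report on (placeholder names; replace with your own).
# Keep the trailing '/' so that 'sales/' does not also match 'sales_archive/'.
folder_list = ['sales/', 'marketing/']

folder_sizes = {}
total_size_gb = 0
for folder in folder_list:
    current_folder_path = folder_path + folder

    # Sum the sizes (in bytes) of every blob whose name starts with this prefix
    total_size = 0
    for blob in container_client.list_blobs(name_starts_with=current_folder_path):
        total_size += blob.size

    # Convert bytes to GB (binary units: 1 GB = 1024**3 bytes here)
    folder_size_gb = total_size / (1024 ** 3)
    folder_sizes[folder] = folder_size_gb

    print(folder, "{:.2f}".format(folder_size_gb), "GB")

    total_size_gb += folder_size_gb

print("Total size:", "{:.2f}".format(total_size_gb), "GB")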
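The loop in Step 7 issues one listing call per folder. A possible alternative, sketched here under the assumption that every blob of interest sits at least one level below folder_path, is to list folder_path once and group sizes by the first path segment after it. sizes_by_subfolder and folder_sizes_gib are hypothetical helpers; the only SDK call is the same list_blobs(name_starts_with=...) used above, reading each blob's name and size.

```python
from collections import defaultdict

BYTES_PER_GB = 1024 ** 3  # same binary unit as the loop in Step 7


def sizes_by_subfolder(blobs, folder_path):
    """Sum sizes in bytes per immediate subfolder of folder_path.

    blobs is an iterable of (name, size_in_bytes) pairs. Blobs directly under
    folder_path (no further '/') are skipped, matching the per-folder loop.
    """
    totals = defaultdict(int)
    for name, size in blobs:
        if not name.startswith(folder_path):
            continue
        rest = name[len(folder_path):]
        if "/" not in rest:
            continue  # a file or directory entry directly under folder_path
        folder = rest.split("/", 1)[0] + "/"
        totals[folder] += size
    return dict(totals)


def folder_sizes_gib(container_client, folder_path):
    """One listing call for the whole prefix, then group in memory."""
    blobs = ((b.name, b.size) for b in container_client.list_blobs(name_starts_with=folder_path))
    return {folder: size / BYTES_PER_GB for folder, size in sizes_by_subfolder(blobs, folder_path).items()}
```

This trades many small listing calls for one larger one; for a handful of folders the difference is likely small.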

Step 8: Create Email Report

We create the HTML table content by iterating over the folder sizes dictionary and constructing the table rows. Each row contains the folder name and its corresponding size.

table_content = ""
for folder, size in folder_sizes.items():
    table_content += f"<tr><td>{folder}</td><td>{size:.2f} GB</td></tr>"
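Folder names are inserted into the HTML as-is, so a name containing characters such as "&" or "<" could break the table markup. A small sketch of the same row-building step with escaping, using html.escape from the standard library; table_rows is a hypothetical helper name.

```python
from html import escape


def table_rows(folder_sizes):
    """Build <tr> rows, escaping folder names so '&', '<' and '>' render literally."""
    return "".join(
        f"<tr><td>{escape(folder)}</td><td>{size:.2f} GB</td></tr>"
        for folder, size in folder_sizes.items()
    )


# Example: table_content = table_rows(folder_sizes)
```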

We then wrap the rows in a complete HTML document: a table with Folder and Size (GB) headers, the rows built above, and a final row with the total size.

html_table = f"""
<html>
<head></head>
<body>
    <table>
        <tr>
            <th>Folder</th>
            <th>Size (GB)</th>
        </tr>
        {table_content}
        <tr>
            <td><strong>Total size:</strong></td>
            <td><strong>{total_size_gb:.2f} GB</strong></td>
        </tr>
    </table>
</body>
</html>
"""
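Some mail clients show only plain text. One common approach, sketched here on the assumption that you build the message yourself rather than inside send_email, is a multipart/alternative message with a plain-text part followed by the HTML part. build_report_message is a hypothetical helper; everything else is the standard email package.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_report_message(sender_email, receiver_email, subject,
                         folder_sizes, total_size_gb, html_content):
    """Return a multipart/alternative message with plain-text and HTML bodies."""
    plain_lines = [f"{folder}\t{size:.2f} GB" for folder, size in folder_sizes.items()]
    plain_lines.append(f"Total size:\t{total_size_gb:.2f} GB")

    message = MIMEMultipart("alternative")
    message["From"] = sender_email
    message["To"] = receiver_email
    message["Subject"] = subject
    # Order matters: clients prefer the last part they can render, so HTML goes last.
    message.attach(MIMEText("\n".join(plain_lines), "plain"))
    message.attach(MIMEText(html_content, "html"))
    return message
```

The returned message can be sent with server.send_message(message) inside the same with smtplib.SMTP(...) block that send_email uses.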

Step 9: Send Email Report

Finally, we call send_email with the email and SMTP settings collected in Step 3, a subject line, and the HTML document built in Step 8.

subject = "Folder Sizes Report"
send_email(sender_email, receiver_email, smtp_server, smtp_port, smtp_username, smtp_password, subject, html_table)

print("Email sent successfully!")
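receiver_email is passed to sendmail as a single string, which works for one address; sendmail also accepts a list of addresses. A minimal sketch for notifying several people, assuming the recipients are entered as one comma-separated string; parse_recipients is a hypothetical helper built on email.utils.getaddresses from the standard library.

```python
from email.utils import getaddresses


def parse_recipients(receiver_email: str) -> list:
    """Split 'a@x.com, Bob <b@y.com>' into ['a@x.com', 'b@y.com']."""
    return [addr for _name, addr in getaddresses([receiver_email]) if addr]


# Inside send_email (sketch):
# recipients = parse_recipients(receiver_email)
# message["To"] = ", ".join(recipients)
# server.sendmail(sender_email, recipients, message.as_string())
```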

That's it! The script measures the folders in Azure Blob Storage, builds an HTML report of their sizes, and emails it to the specified recipient. Adjust folder_path and folder_list to match your storage layout, and enter the requested settings when prompted.

Remember to install the required dependencies using the following command:

pip install azure-storage-blob

Now you can effectively retrieve folder sizes and receive email reports for monitoring your Azure Data Lake usage.
