shadowb

Automatically Download Email Attachments Using Python. (Python3)

Hi! If you receive hundreds of emails per day, going through each one to download its attachments is tedious. A short Python program can download those attachments for us. Let's dive into the technical part.

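Prerequisites

1) Install tqdm (optional, used only for the progress bar). The imaplib and email modules used below are part of Python's standard library and need no installation.

 pip install tqdm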


2) Turn on two-step verification for your Gmail account and create an app password: go to Manage your account > Security > Two-step verification > Add app password.

3) Go to Gmail settings and enable IMAP.

Modules Required

import imaplib
import email
import os
from datetime import datetime, timedelta

Creating a folder to store the downloaded attachments

# Folder (relative to the current working directory) that will hold the attachments
folder_name = "Mail_Attachments"
os.makedirs(folder_name, exist_ok=True)  # no error if the folder already exists
folder_path = os.path.abspath(folder_name)

Logging in to Gmail over IMAP

user = "yourmailid@gmail.com"
password = "copy_pass_from_app_password"  # the app password, not your normal Gmail password
imap_url = 'imap.gmail.com'
my_mail = imaplib.IMAP4_SSL(imap_url)
my_mail.login(user, password)
my_mail.select('Inbox')
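
Hardcoding the app password in the script is easy to leak by accident. As a minimal sketch of one alternative, the credentials could be read from environment variables instead. The variable names GMAIL_USER and GMAIL_APP_PASSWORD and the helper load_credentials are hypothetical, chosen only for this example.

```python
import os


def load_credentials(environ=os.environ):
    """Return (user, password) read from environment variables.

    GMAIL_USER and GMAIL_APP_PASSWORD are hypothetical names used
    only in this sketch; pick whatever names suit your setup.
    """
    user = environ.get("GMAIL_USER")
    password = environ.get("GMAIL_APP_PASSWORD")
    if not user or not password:
        raise RuntimeError("Set GMAIL_USER and GMAIL_APP_PASSWORD first")
    return user, password
```

With this in place, the login step would become user, password = load_credentials() followed by my_mail.login(user, password).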

Now we need to specify the date from which to fetch mails and download attachments.
Let's suppose we start from yesterday (from yesterday to today).


today = datetime.now()
yesterday = today - timedelta(days=1)
yesterday_date_string = yesterday.strftime("%Y-%m-%d")

# X-GM-RAW is a Gmail-specific IMAP extension that accepts Gmail search syntax
search_query = f'(X-GM-RAW "has:attachment after:{yesterday_date_string}")'
result, data = my_mail.search(None, search_query)
email_ids = data[0].split()  # list of message ids (as bytes)
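
The query string is easy to get wrong (a stray space or line break inside it changes the search), so it can help to build it in one small function and check the result before sending it to the server. This is only a sketch; build_search_query is a hypothetical helper name, not part of imaplib.

```python
from datetime import date, timedelta


def build_search_query(since):
    """Build a Gmail X-GM-RAW search for mails with attachments after `since`."""
    date_string = since.strftime("%Y-%m-%d")
    return f'(X-GM-RAW "has:attachment after:{date_string}")'


# Example: search from yesterday onwards
yesterday = date.today() - timedelta(days=1)
search_query = build_search_query(yesterday)
```

The resulting string can be passed to my_mail.search(None, search_query) exactly as in the snippet above.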

Using a for loop, we iterate over all the mails retrieved and check each one for attachments; if it has any, we download them. The code below shows how.
Using tqdm to show the user a download progress bar is optional.

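for email_id in email_ids:  # optional: wrap email_ids in tqdm(...) (from tqdm import tqdm) for a progress bar
    # Fetch the full message (RFC822) for this id
    result, email_data = my_mail.fetch(email_id, '(RFC822)')
    raw_email = email_data[0][1]
    msg = email.message_from_bytes(raw_email)

    for part in msg.walk():
        # Skip container parts and parts without a Content-Disposition header
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()
        if filename:
            filepath = os.path.join(folder_path, filename)
            with open(filepath, 'wb') as f:
                f.write(part.get_payload(decode=True))

my_mail.logout()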
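
The attachment-saving step can be tried without a mailbox. The sketch below wraps the same walk() logic in a hypothetical function save_attachments and feeds it a sample message built in memory (build_sample_email, the addresses, and the file name are made up for illustration). It also takes only the base name of the attachment's file name, as a precaution against names that contain directory components.

```python
import os
import tempfile
from email import message_from_bytes
from email.message import EmailMessage


def save_attachments(raw_email, folder_path):
    """Parse raw RFC822 bytes and save every attachment into folder_path.

    Returns the list of file paths that were written.
    """
    msg = message_from_bytes(raw_email)
    saved = []
    for part in msg.walk():
        if part.get_content_maintype() == "multipart":
            continue
        if part.get("Content-Disposition") is None:
            continue
        filename = part.get_filename()
        if not filename:
            continue
        # basename() drops any directory components contained in the name
        filepath = os.path.join(folder_path, os.path.basename(filename))
        with open(filepath, "wb") as f:
            f.write(part.get_payload(decode=True))
        saved.append(filepath)
    return saved


def build_sample_email():
    """Build a small test message with one binary attachment (illustrative data)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "receiver@example.com"
    msg["Subject"] = "Report"
    msg.set_content("See attached.")
    msg.add_attachment(b"hello attachment", maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg.as_bytes()


with tempfile.TemporaryDirectory() as tmp_dir:
    saved_paths = save_attachments(build_sample_email(), tmp_dir)
    saved_names = [os.path.basename(p) for p in saved_paths]
```

In the real script, raw_email would come from the server's fetch response instead of build_sample_email().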

I hope this was clear. If you have any questions, please let me know in the comments section.

Top comments (5)

Aleksey Vilchinskiy

Could you please explain how you jumped from getting the mail ids to the for loop over a particular message? Is there an intermediate part of the code? How and where did you fetch the messages?

shadowb

Hi @avilchinsky,
1) search_query = f'(X-GM-RAW "has:attachment after:{yesterday_date_string}")'
This is the same as the Gmail search: has:attachment after:2023-08-08

2) Copy and paste has:attachment after:2023-08-08 into your Gmail search box and you will get the idea.

3) By passing the search query above, we find all matching mails with "result, data = my_mail.search(None, search_query)". If you print(data), you get the list of mail ids.

Please let me know if anything is still unclear.

Aleksey Vilchinskiy

How did you get from this line:
email_ids = data[0].split()
to this one:
for part in msg.walk():
?

shadowb

Hi @avilchinsky,

Inside the loop over email_ids we did "raw_email = email_data[0][1]".
The first element of the tuple contains response information, and the second element contains the fetched email content. The [0][1] indexing accesses the second element of the inner tuple.

raw_email now holds the raw content of the email, which is usually in RFC822 format. This content is then processed to extract the various parts of the email, including attachments.

We parse the raw email content (raw_email) and create a message object: msg = email.message_from_bytes(raw_email)

This is a Message object that you can use to access the various parts of the email, such as its headers, body, and attachments.

msg.walk() in the part loop iterates over each part of the email message.

Below that, we skip any part where part.get_content_maintype() == 'multipart' or where part.get('Content-Disposition') is None, and for the remaining parts we extract the file name and save the attachment.
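
To make the [0][1] indexing concrete, here is a small sketch that uses a simulated fetch response instead of a real server call. The message bytes are made up for illustration; the list-of-tuple shape mirrors what imaplib returns for my_mail.fetch(email_id, '(RFC822)').

```python
import email

# Made-up raw message, standing in for what the server would send
raw = b"From: sender@example.com\r\nSubject: Hello\r\n\r\nBody text\r\n"

# Simulated (result, email_data) pair: email_data[0] is a tuple of
# (response info, raw message bytes); the trailing b")" closes the response
result, email_data = "OK", [(b"1 (RFC822 {%d}" % len(raw), raw), b")"]

raw_email = email_data[0][1]               # second element of the inner tuple
msg = email.message_from_bytes(raw_email)  # parsed message object
subject = msg["Subject"]
```

After this, msg.walk() can be used on msg exactly as in the article's loop.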

Matias Veron

Hi @shadow_b, I also couldn't understand how you get to the point of calling "msg.walk()".
I couldn't find where the "raw_email" variable is defined. Is it related to the "data" variable, which holds all the email data?