shadowb

Posted on Aug 8, 2023 • Edited on Oct 26, 2023

Automatically Download Email Attachments Using Python. (Python3)

#python #automation #selenium #softwareengineering

Hi! If you receive hundreds of emails a day, going through each one to download its attachments is tedious. A short Python program can download those attachments for us. Let's dive into the technical part.

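Prerequisites

1) Install tqdm (optional, only used for the progress bar). imaplib and email are part of the Python standard library, so they need no installation.

 pip install tqdm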

2) Turn on two-step verification for your Gmail account and add an app password. Go to Manage your account > Security > Two-step verification > Add app password.

3) Go to Gmail settings and enable IMAP.

Modules Required

import imaplib
import email
import os
from datetime import datetime, timedelta

Creating a folder to hold the downloaded attachments

folder_name = "Mail_Attachments"
if not os.path.exists(folder_name):
    os.makedirs(folder_name, mode=0o777)
folder_path = os.path.abspath(folder_name)

# Log in to Gmail over IMAP with the app password and open the inbox
user = "yourmailid@gmail.com"
password = "copy_pass_from_app_password"
imap_url = 'imap.gmail.com'
my_mail = imaplib.IMAP4_SSL(imap_url)
my_mail.login(user, password)
my_mail.select('Inbox')
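As an alternative to typing the app password into the script, the login step could read it from the environment. This is only a minimal sketch: the variable names GMAIL_USER and GMAIL_APP_PASSWORD and the helpers load_credentials and connect are hypothetical names chosen for illustration, not part of the original script.

```python
import imaplib
import os


def load_credentials(env=os.environ):
    """Read the Gmail address and app password from a mapping of environment variables.

    GMAIL_USER and GMAIL_APP_PASSWORD are hypothetical names used in this sketch.
    """
    user = env.get("GMAIL_USER")
    password = env.get("GMAIL_APP_PASSWORD")
    if not user or not password:
        raise RuntimeError("Set GMAIL_USER and GMAIL_APP_PASSWORD first")
    return user, password


def connect(user, password, host="imap.gmail.com"):
    """Open an IMAP-over-SSL session and select the inbox (needs network access)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    conn.select("Inbox")
    return conn
```

With both variables set in the shell, my_mail = connect(*load_credentials()) would take the place of the hard-coded login above.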

Next, we specify the date from which to fetch and download attachments.
In this example we take everything from yesterday up to today.


today = datetime.now()
yesterday = today - timedelta(days=1)
yesterday_date_string = yesterday.strftime("%Y-%m-%d")

# X-GM-RAW passes a Gmail search-box query through IMAP
search_query = f'(X-GM-RAW "has:attachment after:{yesterday_date_string}")'
result, data = my_mail.search(None, search_query)
email_ids = data[0].split()
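To check the query string before sending it to the server, the date logic can be pulled into a small function. This is a sketch only, and build_search_query is a hypothetical helper name, not something from the original script.

```python
from datetime import date, timedelta


def build_search_query(since):
    """Return an IMAP search string that uses Gmail's X-GM-RAW extension.

    The text inside the double quotes is what you would type into the
    Gmail search box, e.g. has:attachment after:2023-08-08
    """
    date_string = since.strftime("%Y-%m-%d")
    return f'(X-GM-RAW "has:attachment after:{date_string}")'


if __name__ == "__main__":
    yesterday = date.today() - timedelta(days=1)
    print(build_search_query(yesterday))
```

Printing the result and pasting the quoted part into the Gmail search box is a quick way to confirm it matches the mails you expect.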

We loop over all the mails that were retrieved, check each one for attachments, and download any we find. The code below shows how.
tqdm shows a download progress bar to the user (optional).

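from tqdm import tqdm  # optional progress bar

for email_id in tqdm(email_ids, desc="Downloading"):
    # Fetch the full message; email_data[0][1] holds the raw RFC822 bytes
    result, email_data = my_mail.fetch(email_id, '(RFC822)')
    raw_email = email_data[0][1]
    msg = email.message_from_bytes(raw_email)

    for part in msg.walk():
        # Skip container parts and parts without a Content-Disposition header
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()
        if filename:
            filepath = os.path.join(folder_path, filename)
            with open(filepath, 'wb') as f:
                f.write(part.get_payload(decode=True))

my_mail.logout()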
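The step from raw message bytes to saved files can be tried without a mail server. The sketch below builds a sample message in memory to stand in for a fetched mail, then parses it and walks its parts with the same checks as above. The helper names save_attachments and make_sample_email, the example addresses, and the use of os.path.basename on the filename are assumptions added for illustration.

```python
import email
import os
import tempfile
from email.message import EmailMessage


def save_attachments(raw_email, folder_path):
    """Parse raw RFC822 bytes and write every named attachment into folder_path."""
    msg = email.message_from_bytes(raw_email)
    saved = []
    for part in msg.walk():
        # Container parts only group other parts; they carry no file data
        if part.get_content_maintype() == "multipart":
            continue
        # Parts without a Content-Disposition header are skipped (e.g. the body text)
        if part.get("Content-Disposition") is None:
            continue
        filename = part.get_filename()
        if not filename:
            continue
        # basename() drops any directory components a sender put in the name
        filepath = os.path.join(folder_path, os.path.basename(filename))
        with open(filepath, "wb") as f:
            f.write(part.get_payload(decode=True))
        saved.append(filepath)
    return saved


def make_sample_email():
    """Build a small message with one attachment, standing in for a fetched mail."""
    msg = EmailMessage()
    msg["Subject"] = "Report"
    msg["From"] = "sender@example.com"
    msg["To"] = "yourmailid@gmail.com"
    msg.set_content("See the attached file.")
    msg.add_attachment(b"hello attachment", maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg.as_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(save_attachments(make_sample_email(), tmp))
```

In the real script, raw_email would come from the IMAP fetch response instead of make_sample_email().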

I hope this was clear. If you have any questions, please let me know in the comments section.

Top comments (5)

Aleksey Vilchinskiy • Aug 9 '23

Could you please explain how you jumped from getting the mail IDs to the for loop over a particular message? Is there an intermediate part of the code? How and where did you fetch the messages?

shadowb • Aug 9 '23

Hi @avilchinsky ,
1) search_query = f'(X-GM-RAW "has:attachment after:{yesterday_date_string}")'
This is the same as: has:attachment after:2023-08-08

2) Copy and paste has:attachment after:2023-08-08 into your Gmail search box and you will get the idea.

3) Passing this search_query to "result, data = my_mail.search(None, search_query)" returns the IDs of all matching mails. If you print(data), you will see the list of mail IDs.

Please let me know if anything is still unclear.

Aleksey Vilchinskiy • Aug 10 '23

How did you move from the string
email_ids = data[0].split()
to this one:
for part in msg.walk():
?

shadowb • Aug 12 '23

Hi @avilchinsky

Inside the loop over email_ids we did "raw_email = email_data[0][1]".
The first element of the inner tuple contains the response information, and the second element contains the fetched email content. The [0][1] indexing accesses that second element.

raw_email now holds the raw content of the email, usually in RFC822 format. This content is processed further to extract the various parts of the email, including attachments.

We parse the raw email content (raw_email) into a message object -> msg = email.message_from_bytes(raw_email)

msg.walk() in the part loop iterates over each part of the email message.

The message object lets you access the various parts of the email, such as its headers, body, attachments, and more.

After that we check "if part.get_content_maintype() == 'multipart'" and "if part.get('Content-Disposition') is None" to skip parts that are not attachments, then extract the file name and save the attachment.

Matias Veron • Aug 16 '24

Hi @shadow_b, I couldn't understand how you got to "msg.walk()" either.
I couldn't find where the "raw_email" variable is defined. Is it related to the "data" variable, which holds all the email data?
