Hi! If you receive hundreds of emails a day, opening each one to download its attachments is tedious. A short Python program can download those attachments for us. Let's dive into the technical part.
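Prerequisites
1) Install tqdm (optional, used only for the progress bar). imaplib and email are part of Python's standard library, so they need no separate installation.
pip install tqdm
2) Turn on two-step verification for your Gmail account and create an app password. Go to Manage your account > Security > Two-step verification > Add app password.
3) Go to Gmail settings and enable IMAP.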
Modules Required
import imaplib
import email
import os
from datetime import datetime, timedelta
from tqdm import tqdm  # optional, only needed for the progress bar
**Creating a folder to download the attachments into**
folder_name = "Mail_Attachments"
# Create the folder only if it does not exist yet
if not os.path.exists(folder_name):
    os.makedirs(folder_name, mode=0o777)
folder_path = os.path.abspath(folder_name)
user = "yourmailid@gmail.com"
password = "copy_pass_from_app_password"
imap_url = 'imap.gmail.com'
my_mail = imaplib.IMAP4_SSL(imap_url)
my_mail.login(user, password)
my_mail.select('Inbox')
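Hard-coding the app password in the script makes it easy to leak by accident. As a minimal sketch (not part of the original script), the credentials could be read from environment variables instead. The variable names GMAIL_USER and GMAIL_APP_PASSWORD and the helper load_credentials are assumptions chosen for illustration.

```python
import os


def load_credentials(env=os.environ):
    """Return (user, password) read from environment variables.

    GMAIL_USER and GMAIL_APP_PASSWORD are hypothetical names chosen for
    this sketch; use whatever names suit your setup.
    """
    user = env.get("GMAIL_USER")
    password = env.get("GMAIL_APP_PASSWORD")
    if not user or not password:
        raise RuntimeError("Set GMAIL_USER and GMAIL_APP_PASSWORD first")
    return user, password
```

With this in place, user, password = load_credentials() would replace the two hard-coded assignments before my_mail.login(user, password).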
Now we need to specify the date from which to fetch and download the attachments.
Let's take yesterday as the starting point (from yesterday to today).
today = datetime.now()
yesterday = today - timedelta(days=1)
yesterday_date_string = yesterday.strftime("%Y-%m-%d")
search_query = f'(X-GM-RAW "has:attachment after:{yesterday_date_string}")'
result, data = my_mail.search(None, search_query)
email_ids = data[0].split()
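X-GM-RAW is a Gmail-specific IMAP extension, so the query above works only against Gmail's server. As a hedged sketch, the two helpers below build the Gmail query and, for comparison, a standard IMAP SINCE criterion, which expects dates as DD-Mon-YYYY. The helper names are hypothetical, and SINCE on its own does not filter on attachments.

```python
from datetime import datetime, timedelta

# English month abbreviations required by IMAP, independent of the system locale
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def gmail_attachment_query(since):
    """Gmail-only search using the X-GM-RAW extension (hypothetical helper)."""
    return f'(X-GM-RAW "has:attachment after:{since:%Y-%m-%d}")'


def imap_since_criterion(since):
    """Standard IMAP SINCE criterion, e.g. (SINCE "08-Aug-2023")."""
    return f'(SINCE "{since.day:02d}-{_MONTHS[since.month - 1]}-{since.year}")'


yesterday = datetime.now() - timedelta(days=1)
print(gmail_attachment_query(yesterday))
print(imap_since_criterion(yesterday))
```

Either string can be passed as the second argument to my_mail.search(None, ...).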
Using a for loop, we iterate over all the retrieved emails, fetch each one, and check whether it has attachments; if it does, we download them. The code below shows how.
tqdm can be used to show a download progress bar to the user (optional).
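for email_id in email_ids:  # optional: use tqdm(email_ids) to show a progress bar
    # Fetch the full message; email_data[0][1] holds the raw RFC822 bytes
    result, email_data = my_mail.fetch(email_id, '(RFC822)')
    raw_email = email_data[0][1]
    msg = email.message_from_bytes(raw_email)
    for part in msg.walk():
        # Skip container parts and parts without a Content-Disposition header
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get('Content-Disposition') is None:
            continue
        filename = part.get_filename()
        if filename:
            filepath = os.path.join(folder_path, filename)
            with open(filepath, 'wb') as f:
                f.write(part.get_payload(decode=True))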
my_mail.logout()
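One caveat with the loop above: the filename comes straight from the email, so it may contain directory components, and two attachments with the same name overwrite each other. Below is a minimal sketch of a helper that guards against both. safe_attachment_path is a hypothetical name, not part of the original script.

```python
import os


def safe_attachment_path(folder_path, filename):
    """Return a path inside folder_path that does not overwrite an existing file."""
    # Treat both slash styles as separators, then keep only the final
    # component so a name like "../../x.txt" cannot escape the folder.
    name = os.path.basename(filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        name = "attachment"
    stem, ext = os.path.splitext(name)
    candidate = os.path.join(folder_path, name)
    counter = 1
    # Append _1, _2, ... until the name is free
    while os.path.exists(candidate):
        candidate = os.path.join(folder_path, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate
```

In the loop, filepath = safe_attachment_path(folder_path, filename) would take the place of the os.path.join call.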
I hope this was clear. If you have any questions, please let me know in the comments section.
Top comments (5)
Could you please explain how you jumped from getting the mail IDs to the for loop over the parts of a particular message? Is there an intermediate piece of code? How and where did you fetch the messages?
Hi @avilchinsky,
1) search_query = f'(X-GM-RAW "has:attachment after:{yesterday_date_string}")'
This is the same as: has:attachment after:2023-08-08
2) Copy and paste has:attachment after:2023-08-08 into your Gmail search box and you will get the idea.
3) By passing the above search_query we fetched all matching mails in "result, data = my_mail.search(None, search_query)". If you print(data) you get the list of mail IDs.
Please let me know if anything is still unclear.
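To make point 3 concrete, here is a small sketch of the shape of the search response. The sample bytes are made up for illustration; a real mailbox returns its own IDs.

```python
# imaplib's search() returns a status string and a list holding one
# bytes object with the matching message numbers separated by spaces.
result, data = "OK", [b"101 102 107"]  # made-up sample response

email_ids = data[0].split()
print(email_ids)  # [b'101', b'102', b'107']
```

Each element can then be passed to my_mail.fetch(email_id, '(RFC822)') to download that message.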
How did you get from this line:
email_ids = data[0].split()
to this one:
for part in msg.walk():
?
Hi @avilchinsky,
If you look inside the email_ids loop, we did "raw_email = email_data[0][1]".
In the fetch response, the first element of the tuple contains response information and the second element contains the fetched email content. The [0][1] indexing accesses the second element of the first tuple.
raw_email now holds the raw content of the email, typically in RFC822 format. This content is then processed to extract the various parts of the email, including attachments.
We parse the raw email content (raw_email) and create a message object: msg = email.message_from_bytes(raw_email)
This gives a Message object (an EmailMessage only if a policy such as email.policy.default is passed) that you can use to access the various parts of the email, such as its headers, body, and attachments.
msg.walk() in the part loop iterates over each part of the email message.
Then we skip a part if part.get_content_maintype() == 'multipart' or if part.get('Content-Disposition') is None; otherwise we extract the file name and save the attachment.
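The steps described in this reply can be tried without a mail server. The sketch below builds a small message with one attachment, wraps its bytes in the same list-of-tuples shape that fetch returns, and then runs the same parsing steps. The sample message, the file name notes.txt, and the response header bytes are made up for illustration.

```python
import email
from email.message import EmailMessage

# Build a sample email with a text body and one attachment
sample = EmailMessage()
sample["Subject"] = "Report"
sample["From"] = "sender@example.com"
sample["To"] = "receiver@example.com"
sample.set_content("See the attached file.")
sample.add_attachment(b"hello attachment", maintype="application",
                      subtype="octet-stream", filename="notes.txt")

# Imitate the shape of a fetch response: [(response_info, raw_bytes), b')']
# The response_info bytes here are a made-up placeholder.
email_data = [(b"1 (RFC822 {size}", sample.as_bytes()), b")"]

raw_email = email_data[0][1]               # second element of the first tuple
msg = email.message_from_bytes(raw_email)  # parse bytes into a message object

attachments = {}
for part in msg.walk():
    if part.get_content_maintype() == "multipart":
        continue  # container part, holds no data itself
    if part.get("Content-Disposition") is None:
        continue  # the plain body has no Content-Disposition header
    filename = part.get_filename()
    if filename:
        attachments[filename] = part.get_payload(decode=True)

print(attachments)  # {'notes.txt': b'hello attachment'}
```

In the real script, email_data comes from my_mail.fetch(email_id, '(RFC822)') and the payload is written to a file instead of being stored in a dictionary.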
Hi @shadow_b, I couldn't understand how "msg" in "msg.walk()" gets declared either.
I couldn't find where the "raw_email" variable is defined. Is it related to the "data" variable, which holds all the email data?