DEV Community

Joey Miller

Originally published at joeeey.com

Declutter your Gmail inbox with Python: A Step-by-Step Guide

Emails are an important part of many of our lives, both personally and professionally, and staying on top of your inbox can be a daunting task. No matter how hard I try, my Gmail inevitably begins overflowing with countless unread messages.

In this guide, we will explore how Python can be used to quickly sort through your inbox, allowing you to regain control.

Note: The purpose of this post isn't to detail a fully-automated AI that can clean our inboxes unsupervised. Rather, the goal is to introduce you to the tools needed to supplement your efforts when cleaning your inbox.

Installing dependencies

Ensure you have python3 and pip installed.

I encourage you to install the dependencies into a virtual environment.

Navigate to your project directory and run the following:

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib pandas
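To confirm the installation worked before going further, a small check like the one below can help. It is a sketch and not part of the original script; the entries in REQUIRED_MODULES are the import names we assume correspond to the pip packages above.

```python
import importlib.util

# Assumed import names for the packages installed above:
# google-api-python-client -> googleapiclient, google-auth-oauthlib ->
# google_auth_oauthlib, google-auth-httplib2 -> google_auth_httplib2
REQUIRED_MODULES = ["googleapiclient", "google_auth_oauthlib",
                    "google_auth_httplib2", "pandas"]


def missing_modules(names):
    """Return the subset of module names that cannot be found."""
    missing = []
    for name in names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except (ImportError, ValueError):
            # find_spec can raise instead of returning None for odd names
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing: " + ", ".join(missing) if missing else "All dependencies found")
```

Run it inside the activated virtual environment; any name it prints was not installed there.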

Getting Google API access

Before we can get started, we need to register our application with Google so we can access user data.

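We will follow Google's official instructions to create an OAuth client of type "Desktop app". This assumes you already have a Google Cloud project with the Gmail API enabled and an OAuth consent screen configured, as described in Google's Gmail API Python quickstart.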

  1. In the Google Cloud console, go to APIs & Services > Credentials.
  2. Click Create Credentials > OAuth client ID.
  3. Click Application type > Desktop app.
  4. In the Name field, type a name for the credential. This name is only shown in the Google Cloud console.
  5. Click Create. An "OAuth client created" dialog appears, showing the client details. Click 'Download JSON' and save the file as credentials.json in your project directory.
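Before moving on, it can be worth checking that you downloaded the right kind of client. The sketch below uses hypothetical helpers (not part of Google's tooling) and assumes the Desktop-app JSON layout, in which the client details are nested under a top-level "installed" key.

```python
import json


def validate_client_secrets(data):
    """Return a list of problems found in parsed OAuth client JSON.

    Assumes the Desktop-app layout:
    {"installed": {"client_id": ..., "client_secret": ..., ...}}
    """
    if "installed" not in data:
        return ['no "installed" section; was the client created as a Desktop app?']
    client = data["installed"]
    return [f'missing "{key}"' for key in ("client_id", "client_secret")
            if key not in client]


def load_and_validate(path="credentials.json"):
    """Read the client file from disk and validate its contents."""
    try:
        with open(path) as f:
            data = json.load(f)
    except FileNotFoundError:
        return [f"{path} not found"]
    except json.JSONDecodeError as err:
        return [f"{path} is not valid JSON: {err}"]
    return validate_client_secrets(data)


if __name__ == "__main__":
    problems = load_and_validate()
    print("credentials.json looks OK" if not problems else "\n".join(problems))
```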

Analyzing your inbox

In this simple example, we will focus on creating a Python script that gives a breakdown of the most common senders in our inbox.

Create a Python file called gmail_organizer.py in your project directory.

First, let's add the shebang and imports.

#!/usr/bin/env python3
from __future__ import print_function
import os.path
import pandas as pd
import re
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

Next, let's create the authentication function. It uses the credentials.json file to authenticate on behalf of a user. Once a user has authenticated, a token.json file is created in the project directory so that later runs can reuse (and, when needed, refresh) the stored credentials instead of prompting again. This matches the sample code provided by Google.

# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']
def get_creds():
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    return creds
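The branching in get_creds() can be hard to follow at first glance. The hypothetical auth_action(..) below restates it with plain booleans, purely to illustrate which path runs on a given invocation; it does not touch any Google API.

```python
def auth_action(token_exists, valid=False, expired=False, has_refresh_token=False):
    """Mirror the branches of get_creds() (illustrative only).

    "reuse"   -> token.json holds valid credentials, use them as-is
    "refresh" -> credentials expired but a refresh token is available
    "login"   -> no usable token: run the browser consent flow
    """
    if token_exists and valid:
        return "reuse"
    if token_exists and expired and has_refresh_token:
        return "refresh"
    return "login"


if __name__ == "__main__":
    print(auth_action(False))                                        # first run
    print(auth_action(True, expired=True, has_refresh_token=True))   # later run
```

In both the "refresh" and "login" cases, get_creds() then rewrites token.json.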

Now, we create the functions get_inbox_emails(..) and process_email_metadata(..) - these will be doing most of the heavy lifting.

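email_metadata = []
def process_email_metadata(request_id, response, exception):
    global email_metadata
    # Called once per message in a batch; skip any request that failed
    if exception is not None:
        print(f"Request {request_id} failed: {exception}")
        return
    message_id = response.get('id')
    headers = response.get('payload', {}).get('headers')
    if headers is not None:
        for header in headers:
            if header['name'] == "From":
                # Extract username and domain from e.g. "Name <user@example.com>"
                match = re.match(
                    r'(?:.*<)?(.*)@(.*?)(?:>.*|$)', header['value']
                )
                if match is None:
                    break  # From header without a parseable address
                username, domain = match.groups()
                email_metadata.append({
                    'message_id': message_id,
                    'username': username,
                    'domain': domain})
                break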
def get_inbox_emails(service):
    # Call the Gmail API (messages.list returns at most 500 results per page)
    response = service.users().messages().list(
            userId='me',
            labelIds=['INBOX'],
            maxResults=500
    ).execute()
    # Retrieve all message ids, following nextPageToken until it disappears
    messages = []
    messages.extend(response.get('messages', []))
    while 'nextPageToken' in response:
        page_token = response['nextPageToken']
        response = service.users().messages().list(
                userId='me',
                labelIds=['INBOX'],
                maxResults=500,
                pageToken=page_token
        ).execute()
        messages.extend(response.get('messages', []))
    # Retrieve the metadata for all messages, up to 100 requests per batch
    step = 100
    num_messages = len(messages)
    for batch in range(0, num_messages, step):
        batch_req = service.new_batch_http_request(callback=process_email_metadata)
        for i in range(batch, min(batch + step, num_messages)):
            batch_req.add(service.users().messages().get(
                userId='me',
                id=messages[i]['id'],
                format='metadata',
                metadataHeaders=['From'])  # only the header we parse
            )
        batch_req.execute()
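The paging and batching logic can be exercised without a Google account. In this sketch, list_all_pages(..) takes a hypothetical fetch_page callable standing in for the messages().list(..).execute() call, and chunks(..) reproduces the range(0, num_messages, step) slicing; the fake page data is made up for illustration.

```python
def list_all_pages(fetch_page):
    """Collect every message from a paginated listing.

    fetch_page(page_token) is a hypothetical stand-in for
    service.users().messages().list(..., pageToken=page_token).execute();
    it returns a dict with optional "messages" and "nextPageToken" keys.
    """
    items = []
    page_token = None  # first request: no token
    while True:
        response = fetch_page(page_token)
        # An empty inbox (or page) has no "messages" key at all
        items.extend(response.get("messages", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return items


def chunks(items, size=100):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Fake two-page response, keyed by page token, to run the loop offline
FAKE_PAGES = {
    None: {"messages": [{"id": "a"}, {"id": "b"}], "nextPageToken": "p2"},
    "p2": {"messages": [{"id": "c"}]},
}

if __name__ == "__main__":
    print([m["id"] for m in list_all_pages(FAKE_PAGES.get)])  # ['a', 'b', 'c']
    print([len(c) for c in chunks(list(range(250)))])         # [100, 100, 50]
```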

Let's break down what these functions accomplish:

  1. Create a Gmail service object. This happens in main() via build(..), and the object is passed into get_inbox_emails(..).
  2. Retrieve all message IDs (i.e. list all messages). Gmail returns at most 500 results per list request, so we keep requesting further pages until the response no longer contains a nextPageToken.
  3. Retrieve the metadata for all messages. Listing messages only returns their IDs, not their headers, so we need to request the details of each message ID found in step 2. For performance, we group these requests into batch requests of up to 100 messages each (the most Gmail allows in a single batch).
  4. For each message's metadata we receive back, the callback process_email_metadata(..) is called. This is where we process our data. In this example, I apply a regex to the From: header to extract the sender's email username and domain name, which lets us find the most common senders in my inbox.
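To see what the regex in step 4 extracts, the same pattern can be tried on a few sample From: values. parse_sender(..) is a hypothetical helper and the sample headers are invented; it returns None when a header contains no email address, a case in which calling .groups() directly on the match would fail.

```python
import re

# Same pattern the script applies to the From: header
FROM_PATTERN = re.compile(r'(?:.*<)?(.*)@(.*?)(?:>.*|$)')


def parse_sender(from_header):
    """Return (username, domain) for a From: header, or None if no address matches."""
    match = FROM_PATTERN.match(from_header)
    return match.groups() if match else None


if __name__ == "__main__":
    for value in ['"Example News" <info@example.com>',
                  'noreply@youtube.com',
                  'Mailer Daemon']:
        print(value, '->', parse_sender(value))
```

The optional (?:.*<)? prefix skips a display name when the address is wrapped in angle brackets, and the lazy (.*?) stops the domain at the closing > or at the end of the string.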

Finally, let's create the script entry point, which calls the functions we defined above.

def main():
    creds = get_creds()
    service = build('gmail', 'v1', credentials=creds)
    get_inbox_emails(service)
if __name__ == '__main__':
    main()

Printing results

Running the code above does not print anything yet. We need to process the data and display it to the user. Pandas makes it easy to report email usernames and domains in descending order of frequency.

We've already done the work to process this data in process_email_metadata(..), so all we need to do is add the following lines to main() below get_inbox_emails(service):

    # Print the results
    df = pd.DataFrame(email_metadata)
    print("Most common email usernames -----------")
    print(df.groupby('username')
            .size().reset_index(name='count')
            .sort_values(by='count',ascending=False)
            .to_string(index=False))
    print()
    print("Most common email domains -------------")
    print(df.groupby('domain')
            .size().reset_index(name='count')
            .sort_values(by='count',ascending=False)
            .to_string(index=False))
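If you'd rather not depend on pandas, the standard library's collections.Counter produces the same descending counts. This is an alternative sketch rather than what the original script uses; most_common(..) is a hypothetical helper that takes rows shaped like email_metadata.

```python
from collections import Counter


def most_common(metadata, field):
    """Count how often each value of `field` appears, most frequent first."""
    return Counter(item[field] for item in metadata).most_common()


if __name__ == "__main__":
    # Made-up rows in the same shape as email_metadata
    sample = [
        {"message_id": "1", "username": "info", "domain": "example.com"},
        {"message_id": "2", "username": "noreply", "domain": "example.com"},
        {"message_id": "3", "username": "info", "domain": "youtube.com"},
    ]
    print("Most common email domains -------------")
    for domain, count in most_common(sample, "domain"):
        print(f"{domain:>15} {count:>6}")
```

Counter.most_common() orders ties by first appearance, whereas the pandas sort_values(..) call above does not guarantee any particular order among equal counts.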

The complete script is available on GitHub.

Running

From the project directory:

python3 gmail_organizer.py

A new browser window will open, prompting you to sign in to your Google account. The script analyzes the Gmail inbox of whichever account you sign in with at this point. The browser will warn you that the app is unsafe, but only because your application has not been verified by Google. If necessary, you can go through Google's process to verify your application.

After running the application, you should get an output similar to the following:

Most common email usernames -----------
       username  count
           info      6
        noreply      5
       no-reply      2
     donotreply      1
...
Most common email domains -------------
         domain  count
    example.com      5
    youtube.com      2
     change.org      1
...

Extending the script

The script above is deliberately simple. It serves as a scaffold that you can expand to tackle more complex situations, including modifying your inbox by labeling or deleting emails.

Start by making sure you have the correct SCOPES for the operations you are attempting. Google documents the available Gmail API scopes and what each one allows.

To also be able to label emails, we need the gmail.modify scope. This means we need to update:

# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

to:

# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly',
          'https://www.googleapis.com/auth/gmail.modify']

Make sure you delete token.json after changing the scopes, so that the next run asks you to authorize the new scopes.
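Rather than remembering to delete the file by hand, you could check it at startup. This sketch assumes token.json was written by creds.to_json(), which records the scopes under a "scopes" key; token_has_scopes(..) is a hypothetical helper, not part of google-auth.

```python
import json
import os

SCOPES = ['https://www.googleapis.com/auth/gmail.readonly',
          'https://www.googleapis.com/auth/gmail.modify']


def token_has_scopes(path, scopes):
    """Return True if the token file records every scope in `scopes`.

    Assumes the JSON layout produced by Credentials.to_json(), where the
    scopes are stored as a list under the "scopes" key.
    """
    if not os.path.exists(path):
        return False
    with open(path) as f:
        recorded = set(json.load(f).get("scopes") or [])
    return set(scopes) <= recorded


if __name__ == "__main__":
    if os.path.exists("token.json") and not token_has_scopes("token.json", SCOPES):
        # Stale token: remove it so get_creds() runs the consent flow again
        os.remove("token.json")
        print("Removed token.json; you will be asked to sign in again")
```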

From here, labeling/starring an email is very straightforward.

def label_emails(service, message_id):
    # Add the STARRED system label to a single message
    return service.users().messages().modify(
        userId='me',
        id=message_id,
        body={
            "addLabelIds": ['STARRED']
        }
    ).execute()

Note: For labeling large numbers of emails, consider using batchModify instead, which accepts up to 1000 message IDs per request (for the same reasons we batched requests when retrieving metadata earlier).
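A minimal sketch of that batchModify approach follows. star_messages(..) is a hypothetical helper; it assumes the 1000-ID-per-request limit documented for users.messages.batchModify, and the demo uses unittest.mock.MagicMock in place of a real Gmail service so it runs offline.

```python
from unittest.mock import MagicMock


def star_messages(service, message_ids, chunk_size=1000):
    """Add the STARRED label to many messages via users.messages.batchModify.

    IDs are sent in chunks of at most `chunk_size` (assumed API limit: 1000).
    Returns the number of batchModify calls made.
    """
    calls = 0
    for start in range(0, len(message_ids), chunk_size):
        service.users().messages().batchModify(
            userId='me',
            body={
                'ids': message_ids[start:start + chunk_size],
                'addLabelIds': ['STARRED'],
            },
        ).execute()
        calls += 1
    return calls


if __name__ == "__main__":
    # MagicMock records the calls instead of contacting Gmail
    fake_service = MagicMock()
    print(star_messages(fake_service, [str(i) for i in range(2500)]))  # 3
```

With a real service, you would pass the object returned by build('gmail', 'v1', credentials=creds) instead of the mock.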

We've already done the work to process this data in process_email_metadata(..), so to star all emails from example.com all we need to do is add the following lines to main() below get_inbox_emails(service):

    for email in email_metadata:
        if email['domain'] == 'example.com':
            label_emails(service, email['message_id'])
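The loop above issues one modify call per email. To follow the batchModify note earlier instead, you could first collect the matching IDs. message_ids_for_domain(..) is a hypothetical helper; the case-insensitive comparison is our own assumption (domain names are case-insensitive), whereas the loop above uses an exact match.

```python
def message_ids_for_domain(metadata, domain):
    """Return the IDs of messages whose sender domain equals `domain`.

    Comparison is case-insensitive, since domain names are case-insensitive.
    """
    wanted = domain.lower()
    return [item['message_id'] for item in metadata
            if item['domain'].lower() == wanted]


if __name__ == "__main__":
    # Made-up rows in the same shape as email_metadata
    sample = [
        {"message_id": "1", "username": "info", "domain": "Example.com"},
        {"message_id": "2", "username": "noreply", "domain": "youtube.com"},
    ]
    print(message_ids_for_domain(sample, "example.com"))  # ['1']
```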

Top comments (1)

Ashutosh Sharma

Idea is good. We can do it without sending the data to some other application. Will try.