DEV Community

QAProEngineer

Extracting Links from Gmail Emails Using Node.js, IMAP, and Puppeteer

Introduction:

This article will walk you through the process of extracting links from Gmail emails using Node.js and Puppeteer. We'll explore how to set up an IMAP client to access your Gmail inbox, parse email content using the 'mailparser' library, and extract links from the email body using the 'cheerio' library. Additionally, we'll use Puppeteer to interact with and navigate to the extracted links within the emails.
Prerequisites:
Before we get started, make sure you have the following prerequisites in place:

  1. A Gmail account: You will need a Gmail account from which you want to extract links.
  2. App Password: To access your Gmail account programmatically, you should generate an App Password. It is used in place of your regular Gmail password, so your main password never has to be stored in the script, and it can be revoked on its own at any time. The steps for creating one are covered later in this article.
  3. Node.js: Ensure you have Node.js installed on your computer.

Getting Started: Creating an App Password

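Gmail does not let third-party apps sign in over IMAP with just your regular account password. To work around this, you can create an App Password. Note that App Passwords are only available once 2-Step Verification is enabled on your Google Account. Here's how to do it: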

  1. Sign in to your Gmail account.
  2. Go to your Google Account settings: click on your profile picture in the upper-right corner and select "Google Account," then click "Security" in the left sidebar.
  3. Under "Signing in to Google," click on "App passwords."
  4. You may need to sign in again.
  5. In the "App passwords" section, click "Select app" and choose "Mail" (for email) and "Other (Custom name)" for the device. Enter a custom name (e.g., "Node.js Email Extractor").
  6. Click "Generate." You'll receive a 16-character app password. Make sure to copy it somewhere safe; you won't be able to see it again.
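Since the App Password grants access to your mailbox, you may prefer not to paste it directly into the script. Below is a minimal sketch of one way to read the credentials from environment variables instead. The function name loadEmailConfig and the variable names GMAIL_USER and GMAIL_APP_PASSWORD are assumptions made up for this example; they are not part of the original script or of any library.

```javascript
// Hypothetical helper: builds the IMAP settings from environment variables.
// GMAIL_USER and GMAIL_APP_PASSWORD are names chosen for this example only.
function loadEmailConfig(env = process.env) {
    const user = env.GMAIL_USER;
    const password = env.GMAIL_APP_PASSWORD;

    // Fail early with a readable message instead of a confusing login error later
    if (!user || !password) {
        throw new Error('Set GMAIL_USER and GMAIL_APP_PASSWORD before running the script');
    }

    return {
        user,
        password,
        host: 'imap.gmail.com',
        port: 993,
        tls: true,
    };
}
```

With something like this in place, the object it returns could be used wherever the script expects emailConfig, and the credentials would be supplied in the shell session that runs the script rather than saved in the file.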

Setting Up the Code:
Now that you have an App Password, you can use it to access your Gmail account via IMAP and extract links from emails using Node.js. Here's a breakdown of the script shown at the end of this article:

  • emailConfig: This object contains your Gmail account information, including the App Password you generated.

  • magicLinkSubject: This is the subject of the email you want to search for.

  • The code sets up an IMAP client, searches for emails with the specified subject, and then extracts and logs the email subject and body.

  • It uses 'cheerio' to parse the HTML content of the email and extract links within anchor tags. The links are stored in the links array and logged to the console.

  • Puppeteer is used to launch a browser, navigate to the first extracted link, and log the link.
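One thing to keep in mind about the last two points: the first anchor tag in an email is not always a page you can open in a browser. It may be a mailto: address, an in-page anchor, or a tag without an href at all. The sketch below shows one possible way to pick the first usable web link from the extracted list, using only the URL class built into Node.js. The helper name firstWebLink is an assumption for this example and does not appear in the script.

```javascript
// Hypothetical helper: returns the first absolute http(s) URL in the list,
// or null when none of the entries qualifies.
function firstWebLink(hrefs) {
    for (const href of hrefs) {
        if (typeof href !== 'string') continue; // <a> tag without an href attribute

        let url;
        try {
            url = new URL(href.trim());
        } catch {
            continue; // relative or malformed value, not directly navigable
        }

        if (url.protocol === 'http:' || url.protocol === 'https:') {
            return url.href;
        }
    }
    return null;
}

// Example with made-up values: the mailto: entry and the missing href are skipped
console.log(firstWebLink([undefined, 'mailto:help@example.com', 'https://example.com/login?token=abc']));
// prints "https://example.com/login?token=abc"
```

The value returned here could be passed to page.goto() in place of links[0], with a null check in front of it.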

Running the Code:

  1. Make sure you have Node.js installed on your computer.
  2. Install the required Node.js packages:
npm install imap mailparser cheerio puppeteer

  3. Replace the emailConfig object's user and password properties with your Gmail address and the App Password you generated.

  4. Run the script:

node your-script-filename.js


The full script is shown below:

const cheerio = require('cheerio');
const Imap = require('imap');
const puppeteer = require('puppeteer');
const simpleParser = require('mailparser').simpleParser;

const emailConfig = {
    user: 'test-email@gmail.com', // Replace with your Gmail email address
    password: 'generated-app-password', // Replace with the App Password you generated
    host: 'imap.gmail.com',
    port: 993,
    tls: true,
    tlsOptions: {
        rejectUnauthorized: false, // Disables TLS certificate verification; use only for local testing
    },
    connectTimeout: 100000, // 100 seconds (value is in milliseconds)
    authTimeout: 30000,
    debug: console.log,
};

const magicLinkSubject = 'Email subject example';

(async () => {
    // Set up an IMAP client
    const imap = new Imap(emailConfig);

    imap.once('ready', () => {
        imap.openBox('INBOX', true, (err) => {
            if (err) {
                console.error('Error opening mailbox', err);
                imap.end();
                return;
            }

            // Search for emails with the magic link subject
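            imap.search([['SUBJECT', magicLinkSubject]], (err, results) => {
                if (err) throw err;

                // Stop early if no email matches; there would be nothing to fetch
                if (!results || results.length === 0) {
                    console.error('No email found with subject:', magicLinkSubject);
                    imap.end();
                    return;
                }

                const emailId = results[0]; // Assuming the first result is the correct email
                console.log('This is the email ID (UID): ' + emailId);
                const email = imap.fetch(emailId, { bodies: '' });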

                email.on('message', (msg, seqno) => {
                    msg.on('body', (stream) => {
                        simpleParser(stream, async (err, mail) => {
                            if (err) throw err;

                            // Extract and log the email subject
                            const emailSubject = mail.subject;
                            console.log('Email Subject:', emailSubject);
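                            // Extract and log the email body (mail.html is false for plain-text emails)
                            const emailBody = mail.html || '';
                            console.log('Email Body:', emailBody);
                            // Use cheerio to collect the href of every anchor tag
                            const $ = cheerio.load(emailBody);
                            const links = [];
                            $('a').each((index, element) => {
                                const href = $(element).attr('href');
                                if (href) links.push(href); // Skip anchors without an href
                            });
                            console.log('Extracted Links', links);
                            if (links.length === 0) {
                                console.error('No links found in the email body');
                                imap.end();
                                return;
                            }
                            // Open a visible browser window and navigate to the first link
                            const browser = await puppeteer.launch({ headless: false });
                            const page = await browser.newPage();
                            await page.goto(links[0]);
                            console.log('This is the first link: ' + links[0]);
                            imap.end(); // Close the IMAP connection once the link has been opened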

                        });
                    });
                });
            });
        });
    });

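    // Log connection-level errors instead of crashing on an unhandled 'error' event
    imap.once('error', (err) => {
        console.error('IMAP connection error', err);
    });

    imap.connect();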
})();
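
The script takes results[0] on the assumption that the first search result is the email you want. If the same subject has been sent more than once, that will usually be the oldest match rather than the latest one. Here is a small sketch of two helpers that could narrow things down. It assumes that imap.search() hands back message UIDs, that a higher UID means the message was added to the mailbox later, and that a SINCE criterion accepts a Date object; please check the node-imap README to confirm this for your version. The names buildSearchCriteria and newestUid are hypothetical.

```javascript
// Hypothetical helper: builds a criteria array in the shape passed to imap.search().
// sinceDate is optional; when given, matches are limited to that date or later.
function buildSearchCriteria(subject, sinceDate) {
    const criteria = [['SUBJECT', subject]];
    if (sinceDate) {
        criteria.push(['SINCE', sinceDate]);
    }
    return criteria;
}

// Hypothetical helper: returns the highest UID from the search results,
// or null when the search matched nothing.
function newestUid(uids) {
    return uids.length > 0 ? Math.max(...uids) : null;
}

// Example with made-up UIDs
console.log(newestUid([3, 17, 9])); // prints 17
```

In the script, the criteria could replace the inline [['SUBJECT', magicLinkSubject]] array, and newestUid(results) could replace results[0].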