DEV Community

Mohan Ganesan

Originally published at proxiesapi.com

How To Handle Infinite Page Scrolls While Web Scraping

Websites with infinite scroll load their content using AJAX: as the user scrolls down, the page calls back to the server for more content.

One way to scrape data like this is to drive a real browser, let the page's JavaScript fire its AJAX requests, and simulate the scrolling that triggers them.

Puppeteer is a good tool for this. It is a Node.js library that controls a Chromium browser behind the scenes.

Let’s take a Quora answers page as an example of an infinite scroll page. In this example, we will load the page, scroll down until we reach the end of the content, and then save a screenshot of the page to our local disk.

Let’s install Puppeteer first.

mkdir quora_scraper
cd quora_scraper
npm install --save puppeteer

Then create a file with the following contents and save it in the quora_scraper folder. Call it quora_scroll.js.

const puppeteer = require('puppeteer');

(async () => {
    // Launch a visible (non-headless) Chromium so the scrolling can be watched.
    const browser = await puppeteer.launch({
        headless: false
    });
    const page = await browser.newPage();
    await page.goto('https://www.quora.com/Which-one-is-the-best-data-scraping-services');
    await page.setViewport({
        width: 1200,
        height: 800
    });

    await autoScroll(page); // keep scrolling until the promise resolves

    await page.screenshot({
        path: 'quora.png',
        fullPage: true
    });

    await browser.close();
})();

async function autoScroll(page) {
    // Everything inside page.evaluate() runs in the browser, not in Node.
    await page.evaluate(async () => {
        await new Promise((resolve) => {
            var totalHeight = 0;
            var distance = 100;
            var timer = setInterval(() => {
                var scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance; // accumulate the distance scrolled so far

                // The last few scrolling attempts have brought no new data,
                // so the distance we have tried to scroll is now at least
                // the actual page height.
                if (totalHeight >= scrollHeight) {
                    clearInterval(timer); // stop the timer
                    resolve();
                }
            }, 100);
        });
    });
}
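
The autoScroll function above stops once the scrolled distance catches up with the page height, so it can end early on a slow connection and may never end on a feed that keeps growing. As a rough sketch of one alternative, the helper below (scrollUntilStable is a hypothetical name, and maxRounds, pauseMs and stableRounds are assumed parameters, not Puppeteer options) jumps to the bottom, waits for content to load, and stops when the page height has stayed the same for a few rounds or a round limit is reached.

```javascript
// Hypothetical helper: scroll until the page height stops growing or a cap is hit.
// `page` is a Puppeteer Page; the option names and defaults are assumptions.
async function scrollUntilStable(page, { maxRounds = 50, pauseMs = 500, stableRounds = 3 } = {}) {
  let lastHeight = 0;
  let unchanged = 0;
  let rounds = 0;

  while (rounds < maxRounds && unchanged < stableRounds) {
    // Runs in the browser: jump to the current bottom and report the page height.
    const height = await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight);
      return document.body.scrollHeight;
    });

    // Give the AJAX request triggered by the scroll some time to finish.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));

    // Count consecutive rounds in which the height did not change.
    unchanged = height === lastHeight ? unchanged + 1 : 0;
    lastHeight = height;
    rounds += 1;
  }

  return { rounds, height: lastHeight };
}
```

In quora_scroll.js it could replace the call to autoScroll, for example with await scrollUntilStable(page, { maxRounds: 30 });. The round limit is the safety net for pages that never run out of content.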

Now run it with the following command.

node quora_scroll.js

It should open the Chromium browser, and you should be able to see the page scroll in action.
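
Watching the browser is handy while developing, but on a server you will probably want to run without a window. A minimal sketch, assuming a hypothetical SHOW_BROWSER environment variable (this is an invented name, not something Puppeteer reads itself), builds the launch options so the same script works in both modes:

```javascript
// Hypothetical helper: build puppeteer.launch() options from the environment.
// SHOW_BROWSER is an assumed variable name chosen for this example.
function buildLaunchOptions(env = process.env) {
  const showBrowser = env.SHOW_BROWSER === '1';
  // headless: false opens a visible Chromium window; true runs without one.
  return { headless: !showBrowser };
}

// Usage inside the script:
//   const browser = await puppeteer.launch(buildLaunchOptions());
```

With this in place, SHOW_BROWSER=1 node quora_scroll.js would show the scrolling, and a plain node quora_scroll.js would run headless.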

Once done, you will find a rather large file called quora.png in your folder.
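
If the PNG is too large to be practical, page.screenshot() can also write a JPEG with a quality setting. The sketch below is only an illustration: buildScreenshotOptions is a hypothetical helper, and the default quality of 70 is an arbitrary assumption. It picks the image type from the file extension.

```javascript
const path = require('path');

// Hypothetical helper: choose page.screenshot() options from the output file name.
function buildScreenshotOptions(filePath, quality = 70) {
  const ext = path.extname(filePath).toLowerCase();
  const options = { path: filePath, fullPage: true };

  if (ext === '.jpg' || ext === '.jpeg') {
    options.type = 'jpeg';
    options.quality = quality; // 0-100; the quality option does not apply to PNG
  } else {
    options.type = 'png';
  }
  return options;
}

// Usage inside the script:
//   await page.screenshot(buildScreenshotOptions('quora.jpg'));
```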

For further reading, see the article How To Scrape Quora Using Puppeteer.

The author is the founder of Proxies API, a rotating proxies service.
