Practical Puppeteer: How to evaluate XPath expression

Sony AK

Today I will show how to evaluate XPath expressions in Puppeteer using the $x API. We will also use the waitForXPath API.

Before I learned Puppeteer, I mostly used XPath in PHP through its DOMXPath class, and I found it very useful for selecting elements. I find XPath expressions more comfortable and easier to use than CSS selectors. That's just my personal opinion, sorry :)

For those who don't know XPath, here is how Wikipedia describes it:

XPath (XML Path Language) is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document. XPath was defined by the World Wide Web Consortium (W3C).

In Puppeteer there are two APIs related to XPath. The first is waitForXPath, which works like waitForSelector: it waits for an element matching our XPath expression to appear. The second is the $x method, which evaluates an XPath expression and returns an array of ElementHandle. I will show a sample later.

Enough with the boring things. Let's start with a scenario. There is a website in Indonesia called Lamudi, https://www.lamudi.co.id/newdevelopments/, and I want to scrape a value from it.

Our target is the element below. I want to get the value 160.

<span class="CountTitle-number">160</span>

Usually we would use a CSS selector such as document.querySelector('span[class="CountTitle-number"]'), but here we will use the equivalent XPath expression instead: //span[@class="CountTitle-number"].

We can test this expression easily in the Developer tools console. Try typing this in the console of your browser's Developer tools.
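
One thing to keep in mind: [@class="..."] compares the whole class attribute, so it stops matching if the element ever carries a second class. A common XPath idiom matches a single class name instead. Here is a minimal sketch of it. The helper name xpathForClassToken is my own (it is not a Puppeteer API), and it assumes the tag and class name contain no quotes or spaces.

```javascript
// Hypothetical helper (not part of Puppeteer): builds an XPath expression
// that matches <tag> elements whose class attribute contains `className`
// as a whole, space-separated token.
// Assumption: `tag` and `className` contain no quotes or whitespace.
function xpathForClassToken(tag, className) {
    return `//${tag}[contains(concat(' ', normalize-space(@class), ' '), ' ${className} ')]`;
}

console.log(xpathForClassToken('span', 'CountTitle-number'));
// prints: //span[contains(concat(' ', normalize-space(@class), ' '), ' CountTitle-number ')]
```

For the Lamudi page the simpler //span[@class="CountTitle-number"] is enough, because the span has exactly one class.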

$x('//span[@class="CountTitle-number"]');  

The console prints an array of the matching elements.

OK, nice: the XPath expression finds our element. Now let's write a script that uses Puppeteer to get this element's text content.

Preparation

npm i puppeteer

The code

The code is self-explanatory, and I hope you can adjust, extend, or adapt it for your specific needs later.

File puppeteer_xpath.js

const puppeteer = require('puppeteer');

(async () => {
    // launch options: headless is set to false so we can watch
    // the automated browsing session
    const launchOptions = { headless: false, args: ['--start-maximized'] };

    const browser = await puppeteer.launch(launchOptions);
    const page = await browser.newPage();

    // set the viewport and user agent (just for nicer viewing)
    await page.setViewport({width: 1366, height: 768});
    await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');

    // go to the target website
    await page.goto('https://www.lamudi.co.id/newdevelopments/');

    // wait for the element matched by the XPath expression to appear in the page
    await page.waitForXPath("(//span[@class='CountTitle-number'])[1]");

    // evaluate the XPath expression of the target element (it returns an array of ElementHandle)
    const elHandle = await page.$x("(//span[@class='CountTitle-number'])[1]");

    // read the textContent of the first matched element (using page.evaluate)
    const lamudiNewPropertyCount = await page.evaluate(el => el.textContent, elHandle[0]);

    console.log('Total Property Number is:', lamudiNewPropertyCount);

    // close the browser
    await browser.close();
})();
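
Because $x returns an array, elHandle[0] is undefined when nothing matches, for example if you skip the waitForXPath step. A minimal sketch of a guard is shown here. The name firstTextByXPath is a hypothetical helper of mine, not a Puppeteer API; it only assumes that the page object you pass in has the $x and evaluate methods used in the script above. Newer Puppeteer releases may not ship $x anymore, so check the API docs for the version you installed.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the trimmed text of the
// first node matching `xpath`, or null when nothing matches.
// Assumption: `page` provides $x() and evaluate() as used in the script above.
async function firstTextByXPath(page, xpath) {
    const handles = await page.$x(xpath);

    // $x resolves to an empty array when the expression matches nothing
    if (handles.length === 0) {
        return null;
    }

    const text = await page.evaluate(el => el.textContent, handles[0]);
    return text === null ? null : text.trim();
}
```

Inside the script it could be called as await firstTextByXPath(page, "(//span[@class='CountTitle-number'])[1]"), and a null result tells you the element was not found.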

Run it

node puppeteer_xpath.js

If everything is OK, it will display a result like the one below.

Total Property Number is: 160
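
Note that textContent gives us a string, not a number. If you want to do math with the scraped value, something like the sketch below could help. The name parseCount is my own helper, and it assumes the value is a non-negative whole number where any dots, commas, or spaces are only grouping separators.

```javascript
// Hypothetical helper: converts scraped text such as ' 160 ' into a number.
// Assumption: the value is a non-negative integer, and any non-digit
// characters (spaces, dots, commas) are only grouping separators.
function parseCount(text) {
    const digits = text.replace(/\D/g, '');
    return digits === '' ? null : Number(digits);
}

console.log(parseCount(' 160 '));
// prints: 160
```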

Conclusion

I think Puppeteer's XPath support is very useful for data scraping, since it is sometimes hard to write a CSS selector for a specific use case.

Thank you, and I hope you enjoyed it. See you again in the next Practical Puppeteer series.

The source code of this sample is available on GitHub: https://github.com/sonyarianto/xpath-on-puppeteer.git
