
User Agent string difference in Puppeteer headless and headful

Sony AK ・ 3 min read

Today I will talk about the difference in the User Agent string when running Puppeteer in headless and headful mode.

For those not familiar with it, Puppeteer is a Node library that provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol. You can go to https://pptr.dev/ for more details.

Running Puppeteer in headless mode means you control the Chrome or Chromium browser without displaying the browser UI. By contrast, Puppeteer in headful mode displays the browser UI, which is useful for debugging.

As described at https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent, the User Agent string is a characteristic string that lets network protocol peers identify the application type, operating system, software vendor, or software version of the requesting software user agent.

A web browser sends the User-Agent request header when we browse web pages on the internet. Here is a sample of my User Agent.

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36
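To make the parts of such a string concrete, here is a small sketch in plain Node.js (no Puppeteer needed). The helper `parseUserAgent` is hypothetical and assumes the Chrome-style layout shown above; real-world User Agent strings vary a lot, so treat it as an illustration rather than a robust parser.

```javascript
// Hypothetical helper: pull the platform and browser/version tokens out of a
// Chrome-style User Agent string. Assumes the layout
// "Mozilla/5.0 (<platform>) AppleWebKit/... (KHTML, like Gecko) <Browser>/<version> Safari/...".
function parseUserAgent(userAgent) {
  // The first parenthesized group describes the platform / operating system.
  const platformMatch = userAgent.match(/\(([^)]*)\)/);
  // "Chrome/<version>" or "HeadlessChrome/<version>" names the browser and its version.
  const browserMatch = userAgent.match(/\b(HeadlessChrome|Chrome)\/([\d.]+)/);
  return {
    platform: platformMatch ? platformMatch[1] : null,
    browser: browserMatch ? browserMatch[1] : null,
    version: browserMatch ? browserMatch[2] : null,
  };
}

const sample = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36';
const info = parseUserAgent(sample);
console.log(info.platform); // X11; Linux x86_64
console.log(info.browser, info.version); // Chrome 78.0.3904.108
```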

Preparation

Install Puppeteer with this command.

npm i puppeteer

The code

OK, now let's write code that shows the User Agent string when running Puppeteer in headless mode.

File puppeteer_headless.js

const puppeteer = require('puppeteer');

(async () => {
        const browser = await puppeteer.launch();

        console.log(await browser.userAgent());

        await browser.close();
})();

Run it.

node puppeteer_headless.js

On my machine it displays the following.

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.0 Safari/537.36

Notice the HeadlessChrome substring there.
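Checking for that marker takes only a regular expression. The sketch below uses a hypothetical helper, `isHeadlessChromeUserAgent`, and assumes headless Chrome announces itself with a `HeadlessChrome/` token as in the output above; it says nothing about other headless browsers or other detection signals.

```javascript
// Hypothetical helper: true if the User Agent carries the "HeadlessChrome/" token
// that headless Chrome printed above. Other browsers and signals are not covered.
function isHeadlessChromeUserAgent(userAgent) {
  return /\bHeadlessChrome\//.test(userAgent);
}

const headlessUA = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.0 Safari/537.36';
const regularUA = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36';
console.log(isHeadlessChromeUserAgent(headlessUA)); // true
console.log(isHeadlessChromeUserAgent(regularUA)); // false
```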

OK, now let's write code that shows the User Agent string when running Puppeteer in headful mode.

File puppeteer_headful.js

const puppeteer = require('puppeteer');

(async () => {
        const browser = await puppeteer.launch({ headless: false });

        console.log(await browser.userAgent());

        await browser.close();
})();

Run it.

node puppeteer_headful.js

On my machine it displays the following.

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.0 Safari/537.36

Now we can see that this User Agent string looks just like a normal web browser's User Agent string.

Why is this interesting? Suppose you want to scrape a website using Puppeteer in headless mode, and the target website protects itself by inspecting the User Agent string (blocking HeadlessChrome). Then your scraping might be blocked.
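On the website side, such a check could look like the minimal Node.js request handler below. This is only an assumption about how a site might do it (real anti-bot systems combine many signals), and `blockHeadlessHandler` is a hypothetical name. Node lowercases incoming header names, so the header is read as `req.headers['user-agent']`.

```javascript
// Hypothetical handler for Node's built-in http server: answer 403 when the
// User-Agent request header contains "HeadlessChrome", otherwise 200.
function blockHeadlessHandler(req, res) {
  const userAgent = req.headers['user-agent'] || '';
  if (userAgent.includes('HeadlessChrome')) {
    res.statusCode = 403;
    res.end('Forbidden');
    return;
  }
  res.statusCode = 200;
  res.end('Hello');
}

// To serve it (not started here):
//   require('http').createServer(blockHeadlessHandler).listen(3000);
```

With its default User Agent, a headless Puppeteer page navigating to such a server should get the 403 response, while a headful one gets 200.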

How to set User Agent on headless Chrome

Fortunately, we can still set the User Agent string for a page in Puppeteer headless mode; it overrides the default headless Chrome User Agent string.

Here is the code sample.

File puppeteer_set_user_agent.js

const puppeteer = require('puppeteer');

(async () => {
        // launch headless Chrome and open a new page
        const browser = await puppeteer.launch();
        const page = await browser.newPage();

        // set user agent (override the default headless User Agent)
        await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');

        // go to Google home page
        await page.goto('https://google.com');

        // read the User Agent as seen inside the page (navigator.userAgent)
        const userAgent = await page.evaluate(() => navigator.userAgent);

        // If everything is correct, there is no 'HeadlessChrome' substring in userAgent
        console.log(userAgent);

        await browser.close();
})();

It displays the User Agent that we set before navigating to the Google home page.
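Instead of hard-coding a User Agent string, one option is to take headless Chrome's own string and drop the `Headless` part, so the version number stays in sync with the bundled browser. The helper `toHeadfulUserAgent` below is hypothetical; `browser.userAgent()` and `page.setUserAgent()` are the same Puppeteer calls used above. The string function runs without Puppeteer; the Puppeteer part is shown only as a comment.

```javascript
// Hypothetical helper: turn headless Chrome's default User Agent into the
// headful-looking variant by rewriting the "HeadlessChrome/" token to "Chrome/".
function toHeadfulUserAgent(userAgent) {
  return userAgent.replace(/\bHeadlessChrome\//g, 'Chrome/');
}

// Sketch of use with Puppeteer (inside an async function, not run here):
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   await page.setUserAgent(toHeadfulUserAgent(await browser.userAgent()));

console.log(toHeadfulUserAgent(
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.0 Safari/537.36'
));
// Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.0 Safari/537.36
```

The printed result matches the headful User Agent shown earlier. It only rewrites this one token; it does not hide other ways a headless browser can be detected.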

Thank you and I hope you enjoy it.

Posted on Nov 29 '19 by Sony AK.

Discussion


Hi, I wanted to know how to change the cdc variable to go undetected, given the message "chrome is controlled by an automation software". No idea if the site detected...

Hi Rudra, thanks for the question. Actually, I have no idea about it either. Do you have a use case for hiding it?

I found this link, help.applitools.com/hc/en-us/artic..., which may be related.

There must be other altered behaviours too. Some tests did not work in headless mode after I developed them with the browser displayed.

ic ic, thanks for the info