Master of Puppets: Using Headless Chrome

MichaelPaulKunz
Computer science student in New Orleans, LA

Imagine browsing the web without a graphical interface. This is Chrome in headless mode: the same browser engine, without the point-and-click windows we're all used to. Accessing a page in headless mode is lighter on resources because the browser never has to draw anything to a screen. Headless Chrome is useful for front-end testing. It also lets search engines and other web crawlers access the full DOM of a page, including content added by JavaScript, without displaying it. Headless mode has a darker side too: attackers can use it to automate crawlers that scan sites for vulnerabilities.

In the Terminal

You can run headless mode from the command line. Append the --headless flag to a google-chrome command in your bash terminal.

google-chrome --headless

You'll notice nothing happens. Without Chrome's user interface, we have nothing but a terminal to type commands in. The --dump-dom flag prints the DOM of whatever URL you enter after it, as text.

google-chrome --headless --dump-dom https://example.com


Try it yourself. Even a simple page like example.com has a pretty lengthy DOM, so I included the text at this link to avoid bulking up the article. If your terminal is displaying similar text to what's in the link, you've successfully accessed example.com in headless mode.
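
The same command can also be driven from a Node script, which is handy if you want the DOM text in a variable instead of scrolling past in the terminal. This is a minimal sketch, assuming a google-chrome binary is on your PATH. The helper names dumpDomArgs and dumpDom are made up for illustration.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: builds the same flags used in the command above.
function dumpDomArgs(url) {
  return ['--headless', '--dump-dom', url];
}

// Hypothetical helper: runs Chrome and resolves with the DOM text it prints.
// Assumes the binary is called `google-chrome`; pass another name if yours differs.
function dumpDom(url, chromeBinary = 'google-chrome') {
  return new Promise((resolve, reject) => {
    // maxBuffer is raised because even small pages can print a lot of text.
    const options = { maxBuffer: 10 * 1024 * 1024 };
    execFile(chromeBinary, dumpDomArgs(url), options, (error, stdout) => {
      if (error) {
        reject(error);
      } else {
        resolve(stdout);
      }
    });
  });
}

// Usage:
// dumpDom('https://example.com').then((dom) => console.log(dom.length));
```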

In VS Code with Puppeteer

You aren't limited to the terminal window when browsing in headless mode. There are APIs that let you access it in your JavaScript code. This article focuses on Puppeteer, a Node library with an API that allows you to perform most browser actions in your code. You'll need some version of Node to run Puppeteer. I'm using Node v14.15.4. To install Puppeteer in your project, enter npm i puppeteer into the terminal. It should add a node_modules folder and a package-lock.json file to your project directory. Puppeteer bundles all its necessary dependencies, so your package-lock.json file will be about 400 lines long, and you won't need to worry about running any other terminal commands for it to work.

npm i puppeteer

Create a JavaScript file and start writing your Puppeteer code. You can create a screenshot of any website by entering the URL. First use Node's require function to assign Puppeteer to a variable. Then use an asynchronous function to launch a headless browser, open a new page, navigate to the desired site, and take a screenshot. Finally, close the browser. In this example, we take a screenshot of the Google Developers page for Puppeteer:

const ventriloquist = require('puppeteer');

(async () => {
  // Start a headless browser
  const browser = await ventriloquist.launch({headless: true});
  // Open a new page (tab) in that browser
  const page = await browser.newPage();
  await page.goto('https://developers.google.com/web/tools/puppeteer');
  await page.screenshot({path: 'puppetmaster.png'});

  // Close the browser, which also closes the page
  await browser.close();
})();

Save the above code in a JavaScript file, say puppet.js, and run it with node puppet.js. After running, you'll have a new file called puppetmaster.png in the directory you ran it from. It will look like this (until Google changes their developer page layout or the contents of their Puppeteer page).
[Image: puppetmaster.png, the screenshot of the Puppeteer page]

Testing

Developers use Puppeteer to test the front end of their design and to do end-to-end testing. Headless mode gives us the functionality of our browser without the cost of drawing a visible window, so it's well suited to efficient, automated tests. Because Puppeteer scripts run in Node, we can run front-end tests from the command line or on a server instead of clicking through them in a browser window, which can be considerably faster. Going into detail about testing with Puppeteer is beyond the scope of this article, but Akshay Kadam wrote a tutorial for Sitepoint about end-to-end testing with Puppeteer and Yarn.
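
To give a flavor of what such a test looks like, here is a minimal sketch of a single front-end check. It assumes it is handed a Puppeteer page object. The function name headingMatches and the h1 selector are illustrative choices of mine, not taken from the tutorial mentioned above.

```javascript
// Hypothetical check: load a URL and compare the first <h1> text with what we expect.
// `page` is assumed to be a Puppeteer Page, as returned by browser.newPage().
async function headingMatches(page, url, expectedHeading) {
  await page.goto(url);
  // $eval runs the callback inside the page, on the first element matching the selector.
  const heading = await page.$eval('h1', (element) => element.textContent.trim());
  return heading === expectedHeading;
}

// Usage, assuming Puppeteer is installed:
// const puppeteer = require('puppeteer');
// (async () => {
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   console.log(await headingMatches(page, 'https://example.com', 'Example Domain'));
//   await browser.close();
// })();
```

Taking the page as a parameter keeps the check separate from browser setup, so one browser can be reused across many checks.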

Web Indexing

Many web pages these days are written mostly in JavaScript or JSX, with the HTML page serving as a blank canvas for Angular or React to add content. This presents a problem for web indexing. If a search engine wants to collect data about a site, it can no longer just read the HTML file the server sends. Crawling websites with a headless browser is a good way to get all the relevant DOM information, including what the scripts add, not just what's in the HTML file. Eric Bidelman goes into more detail here.
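
In Puppeteer terms, a crawler along these lines would load the page, let its scripts run, and then read the resulting DOM. This is a minimal sketch, assuming you pass in the puppeteer module yourself. The name renderedPage is hypothetical, and waiting for networkidle0 is my own assumption about what counts as "finished rendering".

```javascript
// Hypothetical helper: returns the rendered HTML and the links found on a page.
// `puppeteer` is assumed to be the module returned by require('puppeteer').
async function renderedPage(puppeteer, url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until the network goes quiet so client-side frameworks can add their content.
    await page.goto(url, { waitUntil: 'networkidle0' });
    // content() serializes the DOM as it stands now, after scripts have run.
    const html = await page.content();
    // $$eval runs the callback in the page, on every element matching the selector.
    const links = await page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href));
    return { html, links };
  } finally {
    // Close the browser even if navigation fails.
    await browser.close();
  }
}

// Usage, assuming Puppeteer is installed:
// renderedPage(require('puppeteer'), 'https://example.com')
//   .then(({ links }) => console.log(links));
```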

Malicious Use

The same features that make Puppeteer so useful for web indexing make it a potential tool for hackers. You can bypass XSS restrictions by directly accessing a site in headless mode. While this doesn't necessarily invite scripting attacks, it does allow for easier creation of web crawlers that can mass scan sites for vulnerabilities. It is not common practice for servers to block Headless Chrome. For the legitimate developer, that means you can use its features without much fear of being blocked. Read more from David Bekerman at Imperva.
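
For a sense of what blocking would involve: by default, headless Chrome announces itself with a HeadlessChrome token in its User-Agent header, so a server could look for it. The helper below is a hypothetical sketch of that check, not something from the Imperva article. It is easy to evade, because a client can send any User-Agent string it likes.

```javascript
// Hypothetical server-side check: does this User-Agent header look like
// headless Chrome with its default settings?
function looksLikeHeadlessChrome(userAgent) {
  return typeof userAgent === 'string' && userAgent.includes('HeadlessChrome');
}

// Usage inside a Node http request handler:
// if (looksLikeHeadlessChrome(request.headers['user-agent'])) { /* log or reject */ }
```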

Summary

  • Headless Chrome is Chrome minus the window
  • You can access it from the terminal or from JavaScript code through an API
  • Puppeteer is a Node library with a great headless API
  • You can use Puppeteer for testing and web indexing
  • Some people who use Puppeteer are up to no good, but they haven't yet ruined it for the rest of us
