Sony AK

Practical Puppeteer: Using proxy to browse a page

Today's Puppeteer topic is proxies. Browsing a page through a proxy is useful when we want to hide where our requests come from. That is only one reason. A proxy can also help protect our privacy, and it lets us open websites that are restricted to certain geographical regions.

According to Wikipedia,

In computer networks, a proxy server is a server (a computer system or an application) that acts as an intermediary for requests from clients seeking resources from other servers. A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource available from a different server and the proxy server evaluates the request as a way to simplify and control its complexity. Proxies were invented to add structure and encapsulation to distributed systems.

In Puppeteer we can use a proxy when we browse a page on the internet. I will show several types of proxy: SOCKS4, SOCKS5 and HTTP.

Let's start.

Preparation

Install Puppeteer

npm i puppeteer

We also need some sample proxies. For this I will use the list of free proxies at https://hidemy.name/en/proxy-list/ and we can pick a few from there.

The code

We will use a SOCKS4 proxy located in Cambodia. The proxy IP address is 96.9.77.192 and the port is 55796. I hope the proxy is still working when you try the example.

File proxy_with_puppeteer.js

const puppeteer = require('puppeteer');

(async () => {
    // set some options (headless is false so we can watch
    // the automated browsing)
    const launchOptions = {
        headless: false,
        args: [
            '--start-maximized',
            '--proxy-server=socks4://96.9.77.192:55796' // this is where we set the proxy
        ]
    };

    const browser = await puppeteer.launch(launchOptions);
    const page = await browser.newPage();

    // set viewport and user agent (just in case for nice viewing)
    await page.setViewport({width: 1366, height: 768});
    await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');

    // go to whatismycountry.com to see if proxy works (based on geography location)
    await page.goto('https://whatismycountry.com');

    // close the browser
    // await browser.close();
})();

Run it with

node proxy_with_puppeteer.js

It will open https://whatismycountry.com, and the page should report the proxy's country (Cambodia in this example) instead of our own.

Nice, that means the proxy works.
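
Free proxies tend to disappear without notice, so the navigation itself is a reasonable place to detect a dead one. The sketch below is one possible way to do that and is not part of the original script: `tryGoto` is a hypothetical helper name, the 30 second default is an arbitrary choice, and the only thing it assumes is that `page` has a Puppeteer-style `goto(url, options)` method.

```javascript
// Hypothetical helper: navigate and report failure instead of throwing.
// `page` is expected to expose a Puppeteer-style goto(url, options) method.
async function tryGoto(page, url, timeoutMs = 30000) {
    try {
        // Stop waiting once the DOM is parsed or the timeout expires.
        await page.goto(url, { timeout: timeoutMs, waitUntil: 'domcontentloaded' });
        return { ok: true, error: null };
    } catch (err) {
        // An unreachable proxy usually shows up here as a navigation
        // error or a timeout.
        return { ok: false, error: err.message };
    }
}
```

In the script above, `const result = await tryGoto(page, 'https://whatismycountry.com');` could stand in for the bare `page.goto()` call, and the script could then pick another proxy when `result.ok` is false.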

How about a SOCKS5 proxy? It's easy, just change the line that sets the proxy like below.

'--proxy-server=socks5://PROXY_IP_ADDRESS:PROXY_PORT'

For an HTTP proxy, which can also carry HTTPS traffic, we can leave out the scheme like below.

'--proxy-server=PROXY_IP_ADDRESS:PROXY_PORT'
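
Since the three variants differ only in the scheme, the flag can be built from its parts. The following is a small sketch under that assumption: `proxyServerArg` and `SUPPORTED_SCHEMES` are hypothetical names, and the list is limited to the schemes discussed here plus `https`.

```javascript
// Hypothetical helper that builds the --proxy-server launch flag.
const SUPPORTED_SCHEMES = ['http', 'https', 'socks4', 'socks5'];

function proxyServerArg(scheme, host, port) {
    if (!SUPPORTED_SCHEMES.includes(scheme)) {
        throw new Error(`Unsupported proxy scheme: ${scheme}`);
    }
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error(`Invalid proxy port: ${port}`);
    }
    return `--proxy-server=${scheme}://${host}:${port}`;
}

// Example: the SOCKS4 proxy used earlier in this article.
console.log(proxyServerArg('socks4', '96.9.77.192', 55796));
```

The returned string goes into the `args` array of the launch options, in place of the hard-coded flag.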

If the proxy needs authentication, we can add this code to supply the credentials. Put it before the page.goto() call.

    // set the proxy credential
    await page.authenticate({'username': 'YOUR_USERNAME', 'password': 'YOUR_PASSWORD'});
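
Paid providers often hand out a single proxy URL with the credentials embedded, for example `http://USER:PASS@HOST:PORT`. That format is an assumption here, not something from the examples above. A minimal sketch for splitting such a URL into the launch flag and the object that `page.authenticate()` expects could look like this; `splitProxyUrl` is a hypothetical name, and the sample address comes from the documentation-only range 203.0.113.0/24.

```javascript
// Hypothetical helper: split "scheme://user:pass@host:port" into the
// launch flag and the credentials for page.authenticate().
function splitProxyUrl(proxyUrl) {
    const url = new URL(proxyUrl);
    // url.host keeps the port; url.protocol already ends with ":".
    const serverArg = `--proxy-server=${url.protocol}//${url.host}`;
    // Credentials inside a URL are percent-encoded, so decode them.
    const credentials = url.username
        ? {
            username: decodeURIComponent(url.username),
            password: decodeURIComponent(url.password)
        }
        : null;
    return { serverArg, credentials };
}

const { serverArg, credentials } = splitProxyUrl('http://alice:s3cret@203.0.113.10:8080');
console.log(serverArg);   // flag for the launch args
console.log(credentials); // pass to page.authenticate() when not null
```

With this in place, `serverArg` goes into the launch `args`, and `if (credentials) await page.authenticate(credentials);` runs before `page.goto()`.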

That's it.

We can get many high-quality proxies from affordable proxy services such as https://smartproxy.com, http://stormproxies.com or https://luminati.io, among many others. The choice is yours.

Thank you and I hope you enjoy it.

Top comments (14)

Peter Hansen

Hi Sony,

Cool Article!

I have also been looking for a proxy solution when using Puppeteer. One of the easiest solutions I found is using API proxy services. You don't have to worry about finding and setting up proxies.

The basic example will look like this:

import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const url = 'https://proxybot.io/api/......?url=https://example.com';

  await page.goto(url);
})();

What do you think?

Sony AK

Hi Peter,
Thanks, this is also a good way and it reduces the headache of setting up a proxy :) Thanks for the addition.

Gajus Kuizinas

You can use github.com/gajus/puppeteer-proxy to set a proxy either for an entire page or for specific requests only, e.g.

import puppeteer from 'puppeteer';
import {
  createPageProxy,
} from 'puppeteer-proxy';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const pageProxy = createPageProxy({
    page,
    proxyUrl: 'http://127.0.0.1:3000',
  });

  await page.setRequestInterception(true);

  page.once('request', async (request) => {
    await pageProxy.proxyRequest(request);
  });

  await page.goto('https://example.com');
})();

To skip the proxy, simply call request.continue() conditionally.

Using puppeteer-proxy, a Page can have multiple proxies.

Sony AK

Great addition :) Thank you very much.

But hey, today I learned about Playwright, which is similar to Puppeteer and even comes from the same team. What do you think about that?

Gajus Kuizinas

Playwright is a new project. I would not consider it for production at this time. The only (?) advantage is support for different browser engines. I only see a limited use case for that.

Sony AK

I see, but the advantage is that the developers are the same people who built Puppeteer; they moved from Google to Microsoft :) So I think they will learn from Puppeteer and improve a lot.

Cuadrix

I created a module for that. It's very simple to use,
First install it:

npm i puppeteer-page-proxy

Then require it:

const useProxy = require('puppeteer-page-proxy');

And then simply use it. To set a proxy for an entire page, do this:

await useProxy(page, 'http://127.0.0.1:771');

Or if you want to set it per request, just do this:

await page.setRequestInterception(true);
page.on('request', req => {
    useProxy(req, 'socks5://127.0.0.1:9000');
});

Repository: github.com/Cuadrix/puppeteer-page-...

Sony AK

Hi Cuadrix, thanks for the addition. This is cool and more natural for humans :)

assender

Great article! I would also add that proxies are very important when you have to bypass various restrictions and access the content you want/need or when you're working with web scraping and similar services.
But it's always tricky to choose the right proxies for yourself, so I've made a short review of residential proxy providers for anyone in need.

theincognitotech

Solid article, you have <3 from me! The thing is that among your references I see free proxy sites, and those are not something you can trust, believe me. There are many decent proxy providers you can trust at affordable prices. One of my favorites is this one; I even wrote a review and recommend it.

Sony AK

Hi @theincognitotech thanks for the comment. I agree with you, my list may not be trustworthy, but it's for quick testing purposes :) BTW I will add your list here :) I think it's a good one :)

MichaelSwerston

It's crazy how many use cases proxy technology has, I wrote an article on some of the use cases of residential proxies for businesses specifically.

LunaLopezz • Edited

Great article and simple explanation! I must agree that free proxy services might not be an option in this case, but since we have a wide variety of secure and high-quality paid services, such as Smartproxy, Netnut, Microleaves or others, it's always a good idea to invest in such services and stay safe while being/working online.

Sony AK

Yes, correct, I totally agree. A free proxy is just a temporary solution or a proof of concept; for serious tasks we should use commercial proxy services.