Practical Puppeteer: Using proxy to browse a page

Sony AK

Today's Puppeteer topic is proxies. Browsing through a proxy is useful when we want to hide where our requests originate. It can also help protect our privacy, and it lets us open websites that are restricted to certain geographic regions.

According to Wikipedia,

In computer networks, a proxy server is a server (a computer system or an application) that acts as an intermediary for requests from clients seeking resources from other servers. A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource available from a different server and the proxy server evaluates the request as a way to simplify and control its complexity. Proxies were invented to add structure and encapsulation to distributed systems.

In Puppeteer we can use a proxy when we browse a page on the internet. I will show several kinds of proxy: SOCKS4, SOCKS5 and HTTP.

Let's start.

Preparation

Install Puppeteer

npm i puppeteer

We also need some sample proxies. For this I will use the list of free proxies at https://hidemy.name/en/proxy-list/ and pick a few from there.
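Proxy lists are usually copied around as plain text. Here is a minimal sketch for turning such text into objects we can loop over. It assumes each line looks like `IP:PORT TYPE`; that line format and the helper name `parseProxyList` are my own assumptions, not something the site guarantees.

```javascript
// Hypothetical helper: parse lines such as "96.9.77.192:55796 socks4"
// into { host, port, scheme } objects. The line format is an assumption.
const SUPPORTED_SCHEMES = ['http', 'https', 'socks4', 'socks5'];

function parseProxyList(text) {
    const proxies = [];
    for (const rawLine of text.split('\n')) {
        const line = rawLine.trim();
        if (line === '' || line.startsWith('#')) continue; // skip blanks and comments

        // When no type is given, fall back to a plain HTTP proxy.
        const [address, type = 'http'] = line.split(/\s+/);
        const match = /^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$/.exec(address);
        const scheme = type.toLowerCase();
        if (!match || !SUPPORTED_SCHEMES.includes(scheme)) continue; // ignore malformed lines

        const port = Number(match[2]);
        if (port < 1 || port > 65535) continue; // not a valid TCP port
        proxies.push({ host: match[1], port, scheme });
    }
    return proxies;
}
```

Calling `parseProxyList('96.9.77.192:55796 socks4')` gives a one-element array describing the proxy used in this article.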

The code

We will use a SOCKS4 proxy located in Cambodia, with IP address 96.9.77.192 and port 55796. I hope the proxy is still working when you try the example.

File proxy_with_puppeteer.js

const puppeteer = require('puppeteer');

(async () => {
    // set some options (set headless to false so we can see 
    // this automated browsing experience)
    let launchOptions = { headless: false, 
                          args: ['--start-maximized',
                                 '--proxy-server=socks4://96.9.77.192:55796'] // this is where we set the proxy
                        };

    const browser = await puppeteer.launch(launchOptions);
    const page = await browser.newPage();

    // set viewport and user agent (just in case for nice viewing)
    await page.setViewport({width: 1366, height: 768});
    await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');

    // go to whatismycountry.com to see if proxy works (based on geography location)
    await page.goto('https://whatismycountry.com');

    // close the browser
    // await browser.close();
})();

Run it with

node proxy_with_puppeteer.js

It will open https://whatismycountry.com in a visible browser window, because headless is set to false.

If the page reports the proxy's country (Cambodia in this example) instead of our own, the proxy works.
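The same check can be done in code instead of by eye. This is a minimal sketch, assuming the page's visible text contains the country name. I have not verified the page's markup, so `pageMentionsCountry` and `checkProxyCountry` are hypothetical helpers that only do a plain text search.

```javascript
// Pure helper: case-insensitive search for a country name in page text.
function pageMentionsCountry(pageText, country) {
    return pageText.toLowerCase().includes(country.toLowerCase());
}

// Assumes `page` is a Puppeteer Page that has already navigated
// to https://whatismycountry.com through the proxy.
async function checkProxyCountry(page, expectedCountry) {
    const text = await page.evaluate(() => document.body.innerText);
    return pageMentionsCountry(text, expectedCountry);
}
```

In the script above, calling `await checkProxyCountry(page, 'Cambodia')` after `page.goto()` would tell us whether the name was found on the page.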

How about a SOCKS5 proxy? It's easy, just change the line that sets the proxy, like below.

'--proxy-server=socks5://PROXY_IP_ADDRESS:PROXY_PORT'

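For an HTTP proxy we can leave out the scheme, like below. When no scheme is given, Chromium treats the address as an HTTP proxy; for a proxy that itself speaks HTTPS, prefix the address with https://.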

'--proxy-server=PROXY_IP_ADDRESS:PROXY_PORT'
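The variants differ only in the scheme prefix, so a small helper can build the flag for us. This is a sketch; `buildProxyArg` and `buildLaunchOptions` are names I made up, and they only format the string without checking that the proxy is alive.

```javascript
// Hypothetical helper that formats Chromium's --proxy-server flag.
// scheme: 'socks4' | 'socks5' | 'http' | 'https'
function buildProxyArg(host, port, scheme = 'http') {
    const allowed = ['socks4', 'socks5', 'http', 'https'];
    if (!allowed.includes(scheme)) {
        throw new Error(`Unsupported proxy scheme: ${scheme}`);
    }
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error(`Invalid proxy port: ${port}`);
    }
    return `--proxy-server=${scheme}://${host}:${port}`;
}

// Launch options in the same shape as the main example.
function buildLaunchOptions(host, port, scheme) {
    return {
        headless: false,
        args: ['--start-maximized', buildProxyArg(host, port, scheme)]
    };
}
```

With this in place, the launch line from the main example could become `puppeteer.launch(buildLaunchOptions('96.9.77.192', 55796, 'socks4'))`.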

If the proxy needs authentication, we can add the code below. Put it before the page.goto() call.

    // set the proxy credential
    await page.authenticate({'username': 'YOUR_USERNAME', 'password': 'YOUR_PASSWORD'});
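Proxy providers often hand out a single URL such as `http://USER:PASS@HOST:PORT`. As far as I know, Chromium does not read credentials from `--proxy-server`, so one approach is to split the URL: the server part goes into the flag and the credentials go to `page.authenticate()`. This is a sketch; `splitProxyUrl` is a hypothetical helper built on Node's built-in `URL` class.

```javascript
// Hypothetical helper: separate the proxy address from its credentials.
function splitProxyUrl(proxyUrl) {
    const url = new URL(proxyUrl);
    const server = `${url.protocol}//${url.host}`; // e.g. http://10.0.0.1:8080

    // URL keeps username and password percent-encoded, so decode them.
    const credentials = url.username
        ? {
              username: decodeURIComponent(url.username),
              password: decodeURIComponent(url.password)
          }
        : null;

    return { server, credentials };
}
```

Pass `--proxy-server=${server}` in `args`, and when `credentials` is not null call `await page.authenticate(credentials)` before `page.goto()`.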

That's it.

We can get many high-quality proxies from affordable proxy services such as https://smartproxy.com, http://stormproxies.com or https://luminati.io, among many others. The choice is yours.
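Whatever the source, a proxy can stop responding, and the free one used above may well be gone by the time you read this. In that case `page.goto()` rejects with a Chromium network error. Here is a sketch that tells proxy failures apart from other errors; the helper names are made up, and the list of error codes is my own selection and is not exhaustive.

```javascript
// Chromium network error codes that point at the proxy rather than the site.
// Assumption: this short list covers the common cases only.
const PROXY_ERROR_CODES = [
    'ERR_PROXY_CONNECTION_FAILED',
    'ERR_SOCKS_CONNECTION_FAILED',
    'ERR_TUNNEL_CONNECTION_FAILED'
];

function isProxyError(error) {
    const message = String(error && error.message);
    return PROXY_ERROR_CODES.some((code) => message.includes(code));
}

// Assumes `page` is a Puppeteer Page from a browser launched with --proxy-server.
// Resolves to true on success and false when the proxy itself failed.
async function gotoThroughProxy(page, url, timeoutMs = 30000) {
    try {
        await page.goto(url, { timeout: timeoutMs });
        return true;
    } catch (error) {
        if (isProxyError(error)) {
            console.error(`Proxy looks dead: ${error.message}`);
            return false;
        }
        throw error; // not a proxy problem, let the caller decide
    }
}
```

A `false` result is the signal to pick another proxy from the list and launch again.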

Thank you and I hope you enjoy it.

Discussion

Peter Hansen

Hi Sony,

Cool Article!

I have also been looking for a proxy solution when using Puppeteer. One of the easiest solutions I found is using API proxy services. You don't have to worry about finding and setting up proxies.

The basic example will look like this:

import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const url = 'https://proxybot.io/api/......?url=https://example.com';

  await page.goto(url);
})();
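One detail worth noting with this style of service: the target address travels inside a query string, so it should be URL-encoded, otherwise any `?` or `&` in it gets mixed up with the service's own parameters. This is a sketch, assuming the service takes the target in a `url` query parameter as in the example above. `API_ENDPOINT` is a placeholder, not the real endpoint, and `buildApiProxyUrl` is a made-up helper name.

```javascript
// Placeholder endpoint (assumption): replace it with the one your service gives you.
const API_ENDPOINT = 'https://proxy-api.example/api/YOUR_KEY';

function buildApiProxyUrl(endpoint, targetUrl) {
    const url = new URL(endpoint);
    url.searchParams.set('url', targetUrl); // searchParams percent-encodes the value
    return url.toString();
}
```

The result of `buildApiProxyUrl(API_ENDPOINT, 'https://example.com/?q=1')` can then be passed to `page.goto()`.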

What do you think?

Sony AK Author

Hi Peter,
Thanks, this is also a good way, and it reduces the headache of setting up a proxy :) Thanks for the addition.

Gajus Kuizinas

You can use github.com/gajus/puppeteer-proxy to set a proxy either for the entire page or for specific requests only, e.g.

import puppeteer from 'puppeteer';
import {
  createPageProxy,
} from 'puppeteer-proxy';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const pageProxy = createPageProxy({
    page,
    proxyUrl: 'http://127.0.0.1:3000',
  });

  await page.setRequestInterception(true);

  // page.on (not once): with interception enabled, every request must be handled
  page.on('request', async (request) => {
    await pageProxy.proxyRequest(request);
  });

  await page.goto('https://example.com');
})();

To skip the proxy, simply call request.continue() conditionally.

Using puppeteer-proxy, a Page can have multiple proxies.
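Here is a sketch of that conditional skip, reusing only the `createPageProxy` and `proxyRequest` calls shown above. `shouldProxy`, `attachSelectiveProxy` and the `PROXIED_HOSTS` list are made-up names for illustration.

```javascript
// Hypothetical rule: only requests to these hosts go through the proxy.
const PROXIED_HOSTS = ['example.com'];

function shouldProxy(requestUrl) {
    const { hostname } = new URL(requestUrl);
    return PROXIED_HOSTS.some(
        (host) => hostname === host || hostname.endsWith(`.${host}`)
    );
}

// Assumes `page` has request interception enabled and `pageProxy`
// was created with createPageProxy() as in the example above.
function attachSelectiveProxy(page, pageProxy) {
    page.on('request', async (request) => {
        if (shouldProxy(request.url())) {
            await pageProxy.proxyRequest(request);
        } else {
            await request.continue(); // handled by the browser as usual
        }
    });
}
```

Creating several `pageProxy` objects and choosing between them inside the handler would be one way to use multiple proxies on the same page.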

Sony AK Author

Great addition :) Thank you very much.

But hey, today I learned about Playwright. It is similar to Puppeteer and even comes from the same team. What do you think about it?

Gajus Kuizinas

Playwright is a new project. I would not consider it for production at this time. The only (?) advantage is support for different browser engines. I only see a limited use case for that.

Sony AK Author

I see, but the advantage is that the developers are the same as Puppeteer's; they moved from Google to Microsoft :) So I think they will learn from Puppeteer and improve a lot.

Cuadrix

I created a module for that. It's very simple to use.
First install it:

npm i puppeteer-page-proxy

Then require it:

const useProxy = require('puppeteer-page-proxy');

And then simply use it. To set a proxy for an entire page, do this:

await useProxy(page, 'http://127.0.0.1:771');

Or if you want to set it per request, just do this:

await page.setRequestInterception(true);
page.on('request', req => {
    useProxy(req, 'socks5://127.0.0.1:9000');
});

Repository: github.com/Cuadrix/puppeteer-page-...

Sony AK Author

Hi Cuadrix, thanks for the addition. This is cool and more natural for humans :)

assender

Great article! I would also add that proxies are very important when you have to bypass various restrictions and access the content you want/need or when you're working with web scraping and similar services.
But it's always tricky to choose the right proxies for yourself, so I've made a short review of residential proxy providers for anyone in need.

theincognitotech

Solid article, you have <3 from me! The thing is that among your references I see free proxy sites, and that's not something you can trust, believe me. There are many decent proxy providers you can trust at affordable prices; one of my favorites is this one, I even made a review and recommend it.

Sony AK Author

Hi @theincognitotech, thanks for the comment. I agree with you, my list may not be trustworthy, but it's for quick testing purposes :) BTW I will add your list here :) I think it's a good one :)

MichaelSwerston

It's crazy how many use cases proxy technology has. I wrote an article on some of the use cases of residential proxies for businesses specifically.

LunaLopezz

Great article and simple explanation! I must agree that free proxy services, in this case, might not be an option, but since we have a wide variety of secure and high-quality paid services, such as Smartproxy, Netnut, Microleaves or others, it's always a good idea to invest in such services and to stay safe while being/working online.

Sony AK Author

Yes, correct, I totally agree. A free proxy is just a temporary solution or for a proof of concept; for serious tasks we should get commercial proxy services.