openHacking

How to Bypass Captcha Automatic Login with Nodejs Playwright 2Captcha

Original: https://lwebapp.com/en/post/regular-expression-to-match-multiple-lines-of-text

Question

In daily work we often write scripts to automate tasks and save time. Because many websites require users to log in, automatic login is an essential part of such scripts.

However, logging in often brings up a verification code (captcha). Verification codes exist to block machine logins and automated scripts. Is there a way for a script to recognize a verification code automatically and complete the login?

Next, I will use bilibili.com as an example to show how to solve the verification code problem, the most critical part of an automatic login script.

Explore

First, go through the site's login flow by hand to learn which type of verification code it uses.

Open https://www.bilibili.com/, open the browser console, and click login. A small login box pops up in the middle of the page. The verification code box usually appears after you enter the account and password, so we can guess that the verification code interface has already been requested by this point.

Since the English term for verification code is captcha, we search for captcha in the network panel.

The search turns up one interface related to the verification code:

https://passport.bilibili.com/x/passport-login/captcha

Click the interface to inspect its response. It contains some useful information: the captcha type is geetest.

{
  "code": 0,
  "message": "0",
  "ttl": 1,
  "data": {
    "type": "geetest",
    "token": "b416c387953540608bb5da384b4e372b",
    "geetest": {
      "challenge": "aeb4653fb336f5dcd63baecb0d51a1f3",
      "gt": "ac597a4506fee079629df5d8b66dd4fe"
    },
    "tencent": {
      "appid": ""
    }
  }
}
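As a rough sketch, the response above could be checked before its values are used. The helper name extractGeetestParams is hypothetical, and treating code 0 as success is an assumption based only on the sample response shown here.

```javascript
// Hypothetical helper: pull the Geetest parameters out of the captcha
// interface's JSON response and fail loudly on anything unexpected.
// Assumption: "code": 0 means success, as in the sample response.
function extractGeetestParams(json) {
  if (!json || json.code !== 0 || !json.data) {
    throw new Error("Unexpected captcha response");
  }
  if (json.data.type !== "geetest") {
    throw new Error(`Unsupported captcha type: ${json.data.type}`);
  }
  const { gt, challenge } = json.data.geetest || {};
  if (!gt || !challenge) {
    throw new Error("Missing gt or challenge in captcha response");
  }
  // token is returned too, since the login request takes it as a parameter.
  return { gt, challenge, token: json.data.token };
}

// Example with the response shown above.
const sampleResponse = {
  code: 0,
  message: "0",
  ttl: 1,
  data: {
    type: "geetest",
    token: "b416c387953540608bb5da384b4e372b",
    geetest: {
      challenge: "aeb4653fb336f5dcd63baecb0d51a1f3",
      gt: "ac597a4506fee079629df5d8b66dd4fe",
    },
    tencent: { appid: "" },
  },
};
console.log(extractGeetestParams(sampleResponse));
```

Checking the type field first means the script stops early if the site ever switches to a different captcha provider.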

After some searching, I found that the verification code service used by bilibili.com is provided by geetest, which many websites use. A geetest verification code asks the user to slide a puzzle piece into place or to select words or numbers in order.

So next, let's find a way to recognize the geetest verification code.

I looked into the verification code solutions on the market, and the most effective ones are basically OCR service providers. After comparing them, I decided to try 2Captcha: it offers fast decoding, a stable server connection, API support for multiple languages, and a reasonable price.


Next, we will use Node.js + Playwright + 2Captcha to solve the login verification code problem at bilibili.com.

If you prefer other languages and frameworks, such as Python + Selenium, you can still follow this tutorial, because the idea behind the solution is the same.

Solution

  1. How to identify the verification code

Start with the official 2Captcha API documentation for Geetest, which describes the approach in detail. In short:

  • Intercept the website's captcha interface to get the two verification code parameters, gt and challenge, then request http://2captcha.com/in.php to get a verification code ID
  • After a short wait, request http://2captcha.com/res.php to get the challenge, validate and seccode of the successful verification
  2. How to apply verification results

Once you have validate, the most critical value, simulate the user filling in the account and password, intercept the response of the verification code interface, replace it with the parameters of the successful verification, and then trigger the login interface.

Next, we go through the steps in detail.

Environment

Let's build the script execution environment first.

We use Node.js + Playwright for scripting.

  1. Make sure that Node.js is installed on your computer

  2. Create a new empty project and install Playwright

mkdir bypass-captcha
cd bypass-captcha
npm init
npm i -D playwright

We use Playwright in library mode; see the Playwright documentation for details.

  3. Create a new script file captcha.js in the project root directory with the following content, then run node captcha.js on the command line to check that the project starts normally
const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch({
    headless: false,
  });
  const page = await browser.newPage();
  await page.goto("https://www.bilibili.com/");

  await browser.close();
})();

Under normal circumstances, a Chromium browser window opens, displays the home page of bilibili.com, and then closes automatically.

Request in.php interface

  1. First, sort out the parameters needed to request the http://2captcha.com/in.php interface. The full list is below; we will focus on the required ones.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| key | String | Yes | Your API key |
| method | String | Yes | geetest - defines that you're sending a Geetest captcha |
| gt | String | Yes | Value of gt parameter you found on target website |
| challenge | String | Yes | Value of challenge parameter you found on target website |
| api_server | String | No | Value of api_server parameter you found on target website |
| pageurl | String | Yes | Full URL of the page where you see Geetest captcha |
| header_acao | Integer (default: 0) | No | 0 - disabled; 1 - enabled. If enabled, in.php will include the Access-Control-Allow-Origin:* header in the response. Used for cross-domain AJAX requests in web applications. Also supported by res.php. |
| pingback | String | No | URL for pingback (callback) response that will be sent when captcha is solved. URL should be registered on the server. |
| json | Integer (default: 0) | No | 0 - server will send the response as plain text; 1 - tells the server to send the response as JSON |
| soft_id | Integer | No | ID of software developer. Developers who integrated their software with 2captcha get reward: 10% of spendings of their software users. |
| proxy | String | No | Format: login:password@123.123.123.123:3128 |
| proxytype | String | No | Type of your proxy: HTTP, HTTPS, SOCKS4, SOCKS5. |
| userAgent | String | No | Your userAgent that will be passed to our worker and used to solve the captcha. |
  • key is obtained by registering on the 2Captcha official website; the API key is in the account settings of the dashboard. The account needs to be topped up with a certain amount
  • method is the fixed value geetest
  • gt and challenge were seen earlier in the response of the login page's captcha interface. Note that gt has a single value per website (for bilibili.com it is ac597a4506fee079629df5d8b66dd4fe), but challenge is dynamic: each API request returns a new challenge value, and once the captcha is loaded on the page, that challenge value becomes invalid. So you need to listen for the request to https://passport.bilibili.com/x/passport-login/captcha when the login box loads and read the new challenge value each time. How to listen is explained below.
  • pageurl is the address of the login page, https://www.bilibili.com/

Putting these together gives a request URL like this:

http://2captcha.com/in.php?key=1abc234de56fab7c89012d34e56fa7b8&method=geetest&gt=ac597a4506fee079629df5d8b66dd4fe&challenge=12345678abc90123d45678ef90123a456b&pageurl=https://www.bilibili.com/
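Assembling this URL by hand is easy to get wrong, especially the pageurl value, which itself contains : and / characters. A minimal sketch using Node's built-in URLSearchParams, which percent-encodes each value, might look like this. The function name buildInUrl is hypothetical; the parameter names come from the table above.

```javascript
// Hypothetical helper: build the in.php request URL from the required
// parameters. URLSearchParams percent-encodes every value.
function buildInUrl({ key, gt, challenge, pageurl }) {
  const params = new URLSearchParams({
    key,
    method: "geetest", // fixed value for Geetest captchas
    gt,
    challenge,
    pageurl,
    json: "1", // ask for a JSON response instead of plain text
  });
  return `http://2captcha.com/in.php?${params}`;
}

// Example with the placeholder values from the URL above.
const inUrl = buildInUrl({
  key: "1abc234de56fab7c89012d34e56fa7b8",
  gt: "ac597a4506fee079629df5d8b66dd4fe",
  challenge: "12345678abc90123d45678ef90123a456b",
  pageurl: "https://www.bilibili.com/",
});
console.log(inUrl);
```

The resulting string can be passed to any HTTP client.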
  2. Next, get a fresh challenge value each time you open the home page

The steps to simulate a user clicking login:

  • Start the browser and open the home page of bilibili.com

  • Click the login button at the top; a login box pops up

  • At this point the verification code interface has been requested, and you can read the values of gt and challenge by listening for its response.

const { chromium } = require("playwright");

(async () => {
  // Launch Chromium; headless: false shows the browser window
  const browser = await chromium.launch({
    headless: false,
  });

  const page = await browser.newPage();

  // Open bilibili.com
  await page.goto("https://www.bilibili.com/");

  const [response] = await Promise.all([
    // Wait for the verification code interface to respond
    page.waitForResponse(
      (response) =>
        response.url().includes("/x/passport-login/captcha") &&
        response.status() === 200
    ),
    // Click the login button at the top
    page.click(".header-login-entry"),
  ]);

  // Parse the response body as JSON
  const json = await response.json();

  // Extract gt and challenge
  const gt = json.data.geetest.gt;
  const challenge = json.data.geetest.challenge;

  console.log("get gt", gt, "challenge", challenge);

  // Pause for 5 seconds without blocking the event loop,
  // so the browser does not close too fast to see the effect
  await page.waitForTimeout(5000);

  // Close the browser
  await browser.close();
})();
  3. Use the request library to call the in.php interface

Install request first

npm i request

Now it is time to request the http://2captcha.com/in.php interface

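const request = require("request");

// API_KEY is your 2Captcha API key (read from the environment here);
// gt and challenge come from the intercepted captcha response above
const API_KEY = process.env.API_KEY;
const METHOD = "geetest";
const PAGE_URL = "https://www.bilibili.com/";

// Request the in.php interface
const inData = {
  key: API_KEY,
  method: METHOD,
  gt: gt,
  challenge: challenge,
  pageurl: PAGE_URL,
  json: 1,
};

request.post(
  "http://2captcha.com/in.php",
  { json: inData },
  function (error, response, body) {
    if (!error && response.statusCode == 200) {
      console.log("response", body);
    }
  }
);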

Under normal circumstances, the verification code ID is returned at this point, for example {"status":1,"request":"2122988149"}; take the request field.

If the interface returns the code ERROR_ZERO_BALANCE, your account balance is insufficient and you need to top up. I topped up the minimum amount for this demonstration; adjust to your own needs.
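The two cases above can be handled in one place. The sketch below assumes that, with json=1, a failure comes back in the same JSON shape with status 0 and the error code in the request field; that shape and the function name parseInResponse are assumptions, so check them against the 2Captcha documentation.

```javascript
// Hypothetical helper: return the captcha ID from an in.php response,
// or throw with the error code (for example ERROR_ZERO_BALANCE).
// Assumption: errors arrive as { status: 0, request: "<ERROR_CODE>" }.
function parseInResponse(body) {
  // The body may be a raw JSON string or an already parsed object.
  const data = typeof body === "string" ? JSON.parse(body) : body;
  if (data && data.status === 1) {
    return data.request; // the captcha ID
  }
  throw new Error(`2Captcha in.php failed: ${data && data.request}`);
}

// Example with the success response shown above.
const captchaId = parseInResponse('{"status":1,"request":"2122988149"}');
console.log("captcha ID:", captchaId);
```

Throwing on failure keeps an error code from being mistaken for a captcha ID in the next step.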

Extended Learning

To keep the API key out of the source code, we read it from an environment variable file.

  1. Create a new environment variable file .env in the root directory and write the value of API Key
# .env file
API_KEY="d34y92u74en96yu6530t5p2i2oe3oqy9"
  2. Then install the dotenv library to read the environment variables
npm i dotenv
  3. Use it in js
require("dotenv").config();

In this way, the variables in .env are available through process.env, for example process.env.API_KEY. The .env file is usually not committed to the code repository, which keeps personal information safe.

  4. If you would rather not write the key to a file, you can also set the environment variable directly on the command line, for example
API_KEY=d34y92u74en96yu6530t5p2i2oe3oqy9 node captcha.js
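Whichever way the variable is set, it may help to fail early when it is missing rather than send an empty key to 2Captcha. A small sketch follows; the helper name getApiKey is hypothetical, and the environment object is passed in so the check is easy to try without touching the real environment.

```javascript
// Hypothetical helper: read API_KEY from an environment object
// (process.env by default) and stop early if it is missing or blank.
function getApiKey(env = process.env) {
  const key = (env.API_KEY || "").trim();
  if (key === "") {
    throw new Error("API_KEY is not set; add it to .env or the command line");
  }
  return key;
}

// Example with an explicit object standing in for process.env.
console.log(getApiKey({ API_KEY: "d34y92u74en96yu6530t5p2i2oe3oqy9" }));
```

In the real script, call getApiKey() with no arguments after require("dotenv").config() has run.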

Request res.php interface

  1. Before requesting the interface, sort out the required parameters
| GET parameter | Type | Required | Description |
| --- | --- | --- | --- |
| key | String | Yes | Your API key |
| action | String | Yes | get - get the answer for your captcha |
| id | Integer | Yes | ID of captcha returned by in.php. |
| json | Integer (default: 1) | No | Server will always return the response as JSON for Geetest captcha. |
  • key is API_KEY, which is also used in the previous interface
  • action is fixed value get
  • id is the captcha ID just returned by in.php
  2. About 20 seconds after the in.php request, request the http://2captcha.com/res.php interface to get the verification result
request.get(
  `http://2captcha.com/res.php?key=${API_KEY}&action=get&id=${ID}&json=1`,
  function (error, response, body) {
    if (!error && response.statusCode == 200) {
      const data = JSON.parse(body);
      if (data.status == 1) {
        console.log(data.request);
      }
    }
  }
);
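A single request after a fixed wait can arrive before the captcha is solved. One possible refinement is to poll until the result is ready. The sketch below takes the actual res.php request as a caller-supplied function (fetchResult, hypothetical) so the retry logic stays independent of the HTTP library. The not-ready code CAPCHA_NOT_READY is the value I recall from the 2Captcha documentation; treat it, the interval, and the attempt limit as assumptions to verify.

```javascript
// Promise-based delay that does not block the event loop.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call fetchResult() until the captcha is solved.
// fetchResult must return the parsed res.php JSON, e.g.
// { status: 0, request: "CAPCHA_NOT_READY" } or { status: 1, request: {...} }.
async function pollForResult(fetchResult, { intervalMs = 5000, maxAttempts = 24 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const data = await fetchResult();
    if (data.status === 1) {
      return data.request; // solved: the Geetest result
    }
    if (data.request !== "CAPCHA_NOT_READY") {
      // Any other code is a real error, so stop retrying.
      throw new Error(`2Captcha res.php failed: ${data.request}`);
    }
    await sleep(intervalMs);
  }
  throw new Error("Timed out waiting for the captcha result");
}

// Example with a stand-in for the real request: not ready twice, then solved.
let calls = 0;
const fakeFetchResult = async () =>
  ++calls < 3
    ? { status: 0, request: "CAPCHA_NOT_READY" }
    : { status: 1, request: { geetest_validate: "9f36e8f3a928a7d382dad8f6c1b10429" } };

pollForResult(fakeFetchResult, { intervalMs: 10 }).then((result) => {
  console.log("solved after", calls, "calls:", result.geetest_validate);
});
```

In the real script, fetchResult would wrap the request.get call above and resolve with the parsed body.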

The interface returns three values, challenge, validate and seccode; each is a string.

{
  "geetest_challenge": "aeb4653fb336f5dcd63baecb0d51a1f3",
  "geetest_validate": "9f36e8f3a928a7d382dad8f6c1b10429",
  "geetest_seccode": "9f36e8f3a928a7d382dad8f6c1b10429|jordan"
}

Among them, challenge is the parameter we intercepted earlier, validate identifies the verification result, and seccode is basically the same as validate with an extra suffix appended. We need to store validate for later use.
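To store the result, the three fields can be copied into a plain object with shorter names. This is only a sketch: the function name toGeetestSolution is hypothetical, and it assumes the result has exactly the shape shown above.

```javascript
// Hypothetical helper: turn the res.php result into
// { challenge, validate, seccode }, rejecting incomplete results.
function toGeetestSolution(result) {
  // Accept either a JSON string or an already parsed object.
  const solution = typeof result === "string" ? JSON.parse(result) : result;
  const { geetest_challenge, geetest_validate, geetest_seccode } = solution || {};
  if (!geetest_challenge || !geetest_validate || !geetest_seccode) {
    throw new Error("Incomplete Geetest result");
  }
  return {
    challenge: geetest_challenge,
    validate: geetest_validate,
    seccode: geetest_seccode,
  };
}

// Example with the result shown above.
const solution = toGeetestSolution({
  geetest_challenge: "aeb4653fb336f5dcd63baecb0d51a1f3",
  geetest_validate: "9f36e8f3a928a7d382dad8f6c1b10429",
  geetest_seccode: "9f36e8f3a928a7d382dad8f6c1b10429|jordan",
});
console.log(solution.validate);
```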

Sometimes the verification fails at this step. You can retry a few times, or contact 2Captcha support to troubleshoot the problem.

At this point, the information of the verification code verification result has been obtained, and the next step is to log in with the verification result.

Login

  1. Let's first study what happens when a real user solves the verification code and logs in

We found three interfaces:

  • https://api.geetest.com/ajax.php: verification code interface, used to generate the verification code and to check whether it was solved. The validate field in the data it returns corresponds to the geetest_validate obtained from the 2Captcha service.
  • https://passport.bilibili.com/x/passport-login/web/key?_=1649087831803: password encryption interface, used to obtain the hash and the public key
  • https://passport.bilibili.com/x/passport-login/web/login: login interface; its input parameters include account, password, token, challenge, validate, seccode and so on

Analyzing these interfaces suggests two login schemes.

  1. The first scheme is to request the encryption interface and the login interface directly in the Node.js environment to obtain the user's cookie, which can then be used to log in. The difficulty is that password encryption has to be handled separately, which is not very friendly to beginners.
  2. The second scheme is to use Playwright to simulate the user filling in the account and password, click the verification code at random to trigger the login, intercept the response of the verification code interface, replace it with the successful verification result, and then trigger the login interface.

We take the second solution.

I ran into one difficulty: in the Node.js environment, the verification code image could not be loaded. I then found that the verification code interface https://api.geetest.com/ajax.php is responsible both for pulling the verification code image and for checking the answer. So we intercept the request made when the image is pulled and replace its response with the verification result, which triggers the login without waiting for the image to appear. This detail is critical.
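The interception itself can be sketched with Playwright's page.route and route.fulfill. Everything about the response format here is an assumption and not taken from the article or from Geetest documentation: the sketch supposes that ajax.php is called as JSONP with a callback query parameter and that a success payload carries the validate value under data.validate. The helper names are hypothetical; inspect the real response in the network panel and adjust before relying on it.

```javascript
// ASSUMPTION: the success payload shape below is illustrative only.
function buildSuccessBody(callbackName, validate) {
  const payload = { status: "success", data: { result: "success", validate } };
  return `${callbackName}(${JSON.stringify(payload)})`;
}

// Hypothetical factory: returns a Playwright route handler that answers
// ajax.php requests with the validate value obtained from 2Captcha.
function makeAjaxHandler(validate) {
  return async (route) => {
    const url = new URL(route.request().url());
    // ASSUMPTION: the JSONP callback name travels in a "callback" parameter.
    const callbackName = url.searchParams.get("callback");
    if (!callbackName) {
      await route.continue(); // not the request we expected; let it through
      return;
    }
    await route.fulfill({
      status: 200,
      contentType: "application/javascript",
      body: buildSuccessBody(callbackName, validate),
    });
  };
}

// Usage inside the Playwright script, before triggering the captcha:
//   await page.route(/\/ajax\.php/, makeAjaxHandler(validate));
```

Registering the route before filling in the account and password ensures the handler is in place when the page first calls ajax.php.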

Conclusion

The above is some research into automatic login, a common need in automated testing tasks. By combining Node.js, Playwright, and 2Captcha, the script can recognize the verification code. I have uploaded the complete code to GitHub.

https://github.com/openHacking/bypass-captcha

There is probably plenty of room for improvement, and suggestions are welcome.

Disclaimer: This script is intended only as a test and learning case; use it at your own risk.
