Original: https://lwebapp.com/en/post/regular-expression-to-match-multiple-lines-of-text
Question
In our daily work, we often write scripts to automate repetitive tasks. Because some websites require users to log in, automatic login is an essential part of such scripts.
However, login pages often show a verification code (captcha). Its purpose is to block machine logins and automated scripts. Is there a way for a script to recognize the verification code automatically and complete the login?
Next, I will use bilibili.com as an example to explain how to solve the verification code problem, the most critical part of an automatic login script.
Explore
First, try the website's login flow yourself to find out which type of verification code it uses.
Open https://www.bilibili.com/, open the browser console, and click login. A small login box pops up in the middle of the page. The verification code box usually appears after you enter the account and password, so we guess that the verification code interface has already been requested by this point.
Since the English word for verification code is `captcha`, we search for `captcha` in the `network` panel.
An interface related to the verification code turns up:
https://passport.bilibili.com/x/passport-login/captcha
Click the interface to inspect its response. It contains some useful information: the captcha type is `geetest`.
{
"code": 0,
"message": "0",
"ttl": 1,
"data": {
"type": "geetest",
"token": "b416c387953540608bb5da384b4e372b",
"geetest": {
"challenge": "aeb4653fb336f5dcd63baecb0d51a1f3",
"gt": "ac597a4506fee079629df5d8b66dd4fe"
},
"tencent": {
"appid": ""
}
}
}
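To make the shape of this response concrete, here is a minimal sketch that pulls the captcha type, `gt`, and `challenge` out of such a payload. The helper name `extractGeetestParams` is hypothetical, and the sketch assumes that `code: 0` signals success and that the response keeps the structure shown above.

```javascript
// Hypothetical helper: extract the Geetest parameters from the JSON text
// returned by the /x/passport-login/captcha interface.
// Assumption: code 0 means success, as in the sample response.
function extractGeetestParams(bodyText) {
  const json = JSON.parse(bodyText);
  if (json.code !== 0 || !json.data) {
    throw new Error(`Unexpected captcha response: code=${json.code}`);
  }
  if (json.data.type !== "geetest") {
    throw new Error(`Unsupported captcha type: ${json.data.type}`);
  }
  const { gt, challenge } = json.data.geetest;
  return { token: json.data.token, gt, challenge };
}

// Sample payload copied from the response shown above.
const sampleCaptchaBody = JSON.stringify({
  code: 0,
  message: "0",
  ttl: 1,
  data: {
    type: "geetest",
    token: "b416c387953540608bb5da384b4e372b",
    geetest: {
      challenge: "aeb4653fb336f5dcd63baecb0d51a1f3",
      gt: "ac597a4506fee079629df5d8b66dd4fe",
    },
    tencent: { appid: "" },
  },
});

console.log(extractGeetestParams(sampleCaptchaBody));
```

Failing early on an unexpected `type` is a design choice: if the site switches captcha providers, the script stops with a clear message instead of sending meaningless parameters onward.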
After some searching, I found that the verification code service used by bilibili.com is provided by `geetest`, which many websites use. A `geetest` verification code typically asks the user to slide a puzzle piece into place or to click words or numbers in order.
So next, let's find a way to recognize the `geetest` verification code.
I looked into the verification code solutions on the market, and the most effective ones are basically OCR service providers. After comparing them, I found that 2Captcha offers fast solving, stable server connections, multi-language API support, and reasonable prices, so I decided to try `2Captcha`.
Next, we will use `Nodejs` + `Playwright` + `2Captcha` to solve the login verification code problem at bilibili.com.
If you prefer other languages and frameworks, such as `Python` + `Selenium`, you can still follow this tutorial, because the idea behind the solution is the same.
Solution
- How to identify the verification code
First read the official document 2Captcha API Geetest, which describes the solution in detail. Simply put:
- By intercepting the website interface, get the two verification code parameters `gt` and `challenge`, request `http://2captcha.com/in.php`, and get the verification code `ID`
- After a period of time, request `http://2captcha.com/res.php` and get the `challenge`, `validate`, and `seccode` of the successful verification
- How to apply verification results
After getting the most critical value, `validate`, simulate the user filling in the account and password to log in, intercept the response of the verification code request interface, replace its parameters with those of the successful verification, and then trigger the login interface.
Next, we analyze the detailed steps.
Environment
Let's build the script execution environment first.
We use `Node.js` + `Playwright` for scripting.
- Make sure that Node.js is installed on your computer
- Create a new empty project and install `Playwright`
mkdir bypass-captcha
cd bypass-captcha
npm init
npm i -D playwright
We adopt `Playwright`'s library mode. Detailed documentation: Playwright
- Create a new script file `captcha.js` in the project root directory, fill in the following content, and run `node captcha.js` on the command line to test whether the project starts normally
const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch({
    headless: false,
  });
  const page = await browser.newPage();
  await page.goto("https://www.bilibili.com/");
  await browser.close();
})();
Under normal circumstances, a Chromium browser window pops up, displays the home page of bilibili.com, and then closes automatically.
Request `in.php` interface
- First, sort out the parameters required to request the `http://2captcha.com/in.php` interface. The full list of parameters is shown below; we will focus on the ones that must be passed.
Parameter | Type | Required | Description |
---|---|---|---|
key | String | Yes | your API key |
method | String | Yes | geetest - defines that you're sending a Geetest captcha |
gt | String | Yes | Value of gt parameter you found on target website |
challenge | String | Yes | Value of challenge parameter you found on target website |
api_server | String | No | Value of api_server parameter you found on target website |
pageurl | String | Yes | Full URL of the page where you see Geetest captcha |
header_acao | Integer, default: 0 | No | 0 - disabled, 1 - enabled. If enabled, in.php will include the Access-Control-Allow-Origin:* header in the response. Used for cross-domain AJAX requests in web applications. Also supported by res.php. |
pingback | String | No | URL for pingback (callback) response that will be sent when captcha is solved. URL should be registered on the server. More info here. |
json | Integer, default: 0 | No | 0 - server will send the response as plain text, 1 - tells the server to send the response as JSON |
soft_id | Integer | No | ID of software developer. Developers who integrated their software with 2captcha get reward: 10% of spendings of their software users. |
proxy | String | No | Format: login:password@123.123.123.123:3128 You can find more info about proxies here. |
proxytype | String | No | Type of your proxy: HTTP, HTTPS, SOCKS4, SOCKS5. |
userAgent | String | No | Your userAgent that will be passed to our worker and used to solve the captcha. |
- `key` is obtained by registering on the 2Captcha official website; the `API key` is shown in the account settings of the dashboard. You need to top up the account balance before using it
- `method` is the fixed value `geetest`
- `gt` and `challenge` were seen earlier in the interface of the website login page. Note, however, that `gt` has a single value per website (for bilibili.com it is `ac597a4506fee079629df5d8b66dd4fe`), while `challenge` is dynamic: each API request returns a new `challenge` value, and once the captcha has loaded on the page, that `challenge` value becomes invalid. So you need to listen for the request `https://passport.bilibili.com/x/passport-login/captcha` when the website login page loads and read the new `challenge` value each time. How to listen is explained later in this section.
- `pageurl` is the address of the login page, `https://www.bilibili.com/`
So we get a request URL like this:
http://2captcha.com/in.php?key=1abc234de56fab7c89012d34e56fa7b8&method=geetest&gt=ac597a4506fee079629df5d8b66dd4fe&challenge=12345678abc90123d45678ef90123a456b&pageurl=https://www.bilibili.com/
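Assembling such a URL by hand makes it easy to mistype or mangle a parameter. As a minimal sketch, the query string can be built with Node.js's built-in `URLSearchParams`, which also percent-encodes `pageurl`. The helper name `buildInUrl` is hypothetical; the parameter values are the sample values from this section.

```javascript
// Hypothetical helper: build the in.php request URL from the required
// parameters. URLSearchParams percent-encodes every value.
function buildInUrl({ key, gt, challenge, pageurl }) {
  const params = new URLSearchParams({
    key,
    method: "geetest", // fixed value for Geetest captchas
    gt,
    challenge,
    pageurl,
    json: "1", // ask the server for a JSON response
  });
  return `http://2captcha.com/in.php?${params.toString()}`;
}

// Sample values taken from this section (the key is a placeholder).
const exampleInUrl = buildInUrl({
  key: "1abc234de56fab7c89012d34e56fa7b8",
  gt: "ac597a4506fee079629df5d8b66dd4fe",
  challenge: "12345678abc90123d45678ef90123a456b",
  pageurl: "https://www.bilibili.com/",
});

console.log(exampleInUrl);
```

The printed URL carries the same parameters as the one shown above, except that `pageurl` appears in its encoded form and `json=1` is appended.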
- Next, solve the problem of getting a new `challenge` value every time you enter the home page
The process of simulating a user clicking login:
Start the Chromium browser first and open the home page of bilibili.com.
Click the login button at the top, and a login box pops up.
At this point the verification code request has been sent, and you can capture the values of `gt` and `challenge` by listening to the response returned by the verification code interface.
const { chromium } = require("playwright");

(async () => {
  // Launch Chromium; headless: false makes the browser window visible
  const browser = await chromium.launch({
    headless: false,
  });
  const page = await browser.newPage();
  // Open bilibili.com
  await page.goto("https://www.bilibili.com/");
  const [response] = await Promise.all([
    // Wait for the verification code interface to respond
    page.waitForResponse(
      (response) =>
        response.url().includes("/x/passport-login/captcha") &&
        response.status() === 200
    ),
    // Click the login button at the top
    page.click(".header-login-entry"),
  ]);
  // Parse the response body as JSON
  const json = await response.json();
  // Extract gt and challenge
  const gt = json.data.geetest.gt;
  const challenge = json.data.geetest.challenge;
  console.log("get gt", gt, "challenge", challenge);
  // Pause for 5 seconds so the browser does not close before you can see the effect
  await sleep(5000);
  // Close the browser
  await browser.close();
})();

/**
 * Delay for a number of milliseconds without blocking the event loop
 */
function sleep(delay) {
  return new Promise((resolve) => setTimeout(resolve, delay));
}
- Use the `request` library to call the `in.php` interface
Install `request` first
npm i request
Now it is time to request the `http://2captcha.com/in.php` interface
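const request = require("request");

// Parameters described above; replace the placeholder with your own 2Captcha API key
const API_KEY = "YOUR_API_KEY";
const METHOD = "geetest";
const PAGE_URL = "https://www.bilibili.com/";

// Request the in.php interface; gt and challenge come from the intercepted response
const inData = {
  key: API_KEY,
  method: METHOD,
  gt: gt,
  challenge: challenge,
  pageurl: PAGE_URL,
  json: 1,
};
request.post(
  "http://2captcha.com/in.php",
  { json: inData },
  function (error, response, body) {
    if (!error && response.statusCode === 200) {
      console.log("response", body);
    }
  }
);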
Under normal circumstances, the verification code `ID` is returned at this point, for example `{"status":1,"request":"2122988149"}`; just take the `request` field.
If the interface returns the code `ERROR_ZERO_BALANCE`, your account balance is insufficient and you need to top up. I topped up the minimum amount for this demonstration; you can decide how much to add according to your own needs.
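The two outcomes described here can be handled in one place. The following is a minimal sketch, not the author's code: `parseInResponse` is a hypothetical helper, and it assumes that with `json=1` a failure is reported as `{"status":0,"request":"ERROR_..."}`, mirroring the success format.

```javascript
// Hypothetical helper: turn the body returned by in.php into a captcha ID,
// or throw an error that carries the 2Captcha error code.
// Assumption: with json=1, failures look like {"status":0,"request":"ERROR_..."}.
function parseInResponse(body) {
  // The body may arrive as raw text or as an already parsed object.
  const data = typeof body === "string" ? JSON.parse(body) : body;
  if (data.status === 1) {
    return data.request; // the captcha ID
  }
  const err = new Error(`in.php failed: ${data.request}`);
  err.code = data.request;
  throw err;
}

// Usage with the sample response from this section.
console.log(parseInResponse('{"status":1,"request":"2122988149"}'));

try {
  parseInResponse({ status: 0, request: "ERROR_ZERO_BALANCE" });
} catch (err) {
  console.log("Request rejected:", err.code);
}
```

Attaching the error code to the thrown error lets the caller distinguish a balance problem, which retrying cannot fix, from other failures.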
Extended Learning
To improve security, we read the `API Key` from an environment variable file.
- Create a new environment variable file `.env` in the root directory and write the value of the `API Key` into it
# .env file
API_KEY="d34y92u74en96yu6530t5p2i2oe3oqy9"
- Then install the `dotenv` library to load the environment variables
npm i dotenv
- Use it in js
require("dotenv").config();
In this way, the variables in `.env` can be read through `process.env.API_KEY`. `.env` files are usually not uploaded to the code repository, which keeps personal information safe.
- If you don't want to write the key to a file at all, you can also set the environment variable directly on the command line when starting Node.js, such as
API_KEY=d34y92u74en96yu6530t5p2i2oe3oqy9 node captcha.js
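Whichever way the variable is supplied, a missing key otherwise only shows up later as a confusing API error. As a small sketch, the script can check for it at startup. The helper name `requireEnv` is hypothetical, and the second argument exists only so the example runs without a real environment variable.

```javascript
// Hypothetical helper: read a required environment variable and fail fast
// with a clear message when it is missing or empty.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing environment variable ${name}`);
  }
  return value;
}

// Demo environment using the sample key from this section.
const demoEnv = { API_KEY: "d34y92u74en96yu6530t5p2i2oe3oqy9" };
const demoKey = requireEnv("API_KEY", demoEnv);
console.log("API key loaded, length:", demoKey.length);
```

In the real script the call would simply be `requireEnv("API_KEY")`, placed after `require("dotenv").config()` when the `.env` file is used.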
Request `res.php` interface
- Before requesting the interface, we also sort out the required parameters
GET parameter | Type | Required | Description |
---|---|---|---|
key | String | Yes | your API key |
action | String | Yes | get - get the answer for your captcha |
id | Integer | Yes | ID of captcha returned by in.php. |
json | Integer, default: 1 | No | Server will always return the response as JSON for Geetest captcha. |
- `key` is the `API_KEY`, the same one used in the previous interface
- `action` is the fixed value `get`
- `id` is the captcha `ID` just returned by `in.php`
- About 20 seconds after the previous request, call the `http://2captcha.com/res.php` interface to get the verification result
// ID is the captcha ID taken from the request field of the in.php response
request.get(
  `http://2captcha.com/res.php?key=${API_KEY}&action=get&id=${ID}&json=1`,
  function (error, response, body) {
    if (!error && response.statusCode === 200) {
      const data = JSON.parse(body);
      if (data.status === 1) {
        console.log(data.request);
      }
    }
  }
);
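A single request after a fixed wait can arrive before the solution is ready. One common alternative is to poll until the result appears. The sketch below is an illustration under stated assumptions, not the author's code: `classifyResResponse` and `pollForSolution` are hypothetical names, `fetchResult` stands for any async function that calls `res.php` and resolves to the parsed JSON, and a pending task is assumed to be reported as `{"status":0,"request":"CAPCHA_NOT_READY"}` (2Captcha's own spelling).

```javascript
// Classify a parsed res.php response.
// Assumption: a pending task is reported with request "CAPCHA_NOT_READY".
function classifyResResponse(data) {
  if (data.status === 1) return "ready";
  if (data.request === "CAPCHA_NOT_READY") return "pending";
  return "error";
}

// Promise-based delay that does not block the event loop.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical polling loop: call fetchResult until the solution is ready,
// an error code is returned, or the attempt limit is reached.
async function pollForSolution(
  fetchResult,
  { intervalMs = 5000, maxAttempts = 24 } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const data = await fetchResult();
    const state = classifyResResponse(data);
    if (state === "ready") return data.request;
    if (state === "error") throw new Error(`res.php failed: ${data.request}`);
    await delay(intervalMs);
  }
  throw new Error("Timed out waiting for the captcha solution");
}
```

The interval and attempt limit are arbitrary defaults chosen for illustration; they cap the total wait at about two minutes.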
The interface returns three values, `challenge`, `validate`, and `seccode`; each one is a string
{
"geetest_challenge": "aeb4653fb336f5dcd63baecb0d51a1f3",
"geetest_validate": "9f36e8f3a928a7d382dad8f6c1b10429",
"geetest_seccode": "9f36e8f3a928a7d382dad8f6c1b10429|jordan"
}
Among them, `challenge` is the parameter we intercepted earlier, `validate` is the verification result identifier, and `seccode` has basically the same content as `validate`, with one extra word appended. We need to store `validate` for later use.
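Because the result uses `geetest_`-prefixed field names while the rest of this article refers to `challenge`, `validate`, and `seccode`, a small mapping step keeps the later code readable. This is a minimal sketch; the helper name `toLoginParams` is hypothetical.

```javascript
// Hypothetical helper: map the geetest_-prefixed fields of the solution
// to the shorter names challenge, validate, and seccode.
function toLoginParams(solution) {
  const { geetest_challenge, geetest_validate, geetest_seccode } = solution;
  if (!geetest_challenge || !geetest_validate || !geetest_seccode) {
    throw new Error("Incomplete Geetest solution");
  }
  return {
    challenge: geetest_challenge,
    validate: geetest_validate,
    seccode: geetest_seccode,
  };
}

// Sample result copied from the response shown above.
const sampleSolution = {
  geetest_challenge: "aeb4653fb336f5dcd63baecb0d51a1f3",
  geetest_validate: "9f36e8f3a928a7d382dad8f6c1b10429",
  geetest_seccode: "9f36e8f3a928a7d382dad8f6c1b10429|jordan",
};

const loginParams = toLoginParams(sampleSolution);
console.log(loginParams.validate);
```

In the sample, `seccode` is simply `validate` followed by `|jordan`, which matches the observation above.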
Sometimes the verification fails at this step. You can retry several times, or contact 2Captcha through its official website to troubleshoot the problem.
At this point we have the verification result, and the next step is to log in with it.
Login
- Let's first study the login process after a normal user solves the verification code successfully
We found three interfaces
- `https://api.geetest.com/ajax.php`: verification code interface, used to generate the verification code and to check whether it has been passed. The `validate` field in the data returned by this interface is the `geetest_validate` obtained from the 2Captcha service.
- `https://passport.bilibili.com/x/passport-login/web/key?_=1649087831803`: password encryption interface, used to obtain the hash and public key
- `https://passport.bilibili.com/x/passport-login/web/login`: login interface; input parameters include the account, password, `token`, `challenge`, `validate`, `seccode`, etc.
Analyzing these interfaces, we find two possible login schemes.
- The first is to request the encryption interface and the login interface in the `Node.js` environment to obtain the user's cookie information, after which the user can log in directly with the cookie. The difficulty of this scheme is that password encryption must be handled separately, which is not very friendly to beginners.
- The second is to use `Playwright` to simulate the user filling in the account and password, click the verification code at random to trigger the login, intercept the response of the verification code interface, replace it with the successful verification result, and then trigger the login interface.
We take the second solution.
But I also ran into a difficulty: in the `Node.js` environment, the verification code image could not be loaded. I then found that the verification code interface `https://api.geetest.com/ajax.php` is responsible both for pulling the verification code image and for verifying the code. So we intercept the request made when pulling the image and replace the verification result directly to trigger the login, without waiting for the image to appear. This detail is critical.
Conclusion
The above is some research on the common automatic login function in automated testing tasks. By combining the strengths of `Node.js`, `Playwright`, and `2Captcha`, verification code recognition is realized. I have uploaded the complete code to GitHub.
There is probably a lot left to optimize, and you are welcome to point it out.
Disclaimer: This script is intended only as a test and learning case; use it at your own risk.