1. Background
Automated bots and malicious web crawlers can consume a significant amount of network bandwidth by repeatedly accessing your site over extended periods. When you check your cloud server's management dashboard, you might notice that most of the traffic is concentrated on a few IP addresses. A straightforward solution to this problem is to limit the frequency of requests from these IP addresses.
However, rate-limiting IP addresses is typically not related to the business logic of your application, and developers are often reluctant to maintain an IP request frequency table themselves. Additionally, manually managing visitor information in a distributed or concurrent environment can be quite costly in terms of development effort.
This is where SafeLine WAF by Chaitin comes in. SafeLine offers a suite of features including rate limiting, port forwarding, and manual IP blacklisting/whitelisting, alongside its core functionality of defending against web attacks.
2. Installing SafeLine
bash -c "$(curl -fsSLk https://waf.chaitin.com/release/latest/setup.sh)"
For detailed instructions, refer to: https://docs.waf.chaitin.com/en/tutorials/install
3. Logging into SafeLine
Open the web console at https://<safeline-ip>:9443/ in your browser. The SafeLine login page should appear.
To get the administrator account, run:
docker exec safeline-mgt resetadmin
After the command runs successfully, you will see output like the following:
[SafeLine] Initial username:admin
[SafeLine] Initial password:**********
[SafeLine] Done
Enter the username and password from the previous step to log in to SafeLine.
4. Configuring Your Site and Rate Limiting
4.1 SafeLine Site Configuration
SafeLine provides comprehensive site configuration options, including automatic TLS certificate and private key uploads, and the ability to specify multiple forwarding ports. This eliminates the need for developers to manually configure Nginx.
4.2 Configuring Rate Limiting
You can customize the blocking strategy according to your needs. A common recommendation is to set a limit of 100 requests per 10 seconds, with a block duration of 10 minutes.
Note: If you're testing or encounter false positives, you can manually lift the block.
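To make the semantics of such a policy concrete, here is a minimal sketch of a per-IP sliding-window limiter with a block period and a manual unblock. It only models the behaviour described above and is not SafeLine's implementation; the class name RateLimitPolicy, its methods, and the example IP address are hypothetical.

```python
import time
from collections import defaultdict, deque


class RateLimitPolicy:
    """Illustrative model of "N requests per window, then block"; not SafeLine code."""

    def __init__(self, max_requests=100, window_seconds=10, block_seconds=600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._blocked_until = {}         # ip -> time at which the block expires

    def allow(self, ip, now=None):
        """Return True if a request from `ip` at time `now` should pass."""
        now = time.monotonic() if now is None else now
        # Refuse immediately while a block is still active.
        if self._blocked_until.get(ip, 0) > now:
            return False
        self._blocked_until.pop(ip, None)
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            # Threshold exceeded: start the block period.
            self._blocked_until[ip] = now + self.block_seconds
            hits.clear()
            return False
        return True

    def unblock(self, ip):
        """Manually lift a block, for example after a false positive."""
        self._blocked_until.pop(ip, None)
        self._hits.pop(ip, None)


if __name__ == "__main__":
    policy = RateLimitPolicy()  # 100 requests / 10 s, block for 10 min
    # 101 requests within about one second from the same (example) address.
    results = [policy.allow("203.0.113.7", now=i * 0.01) for i in range(101)]
    print(results.count(True), results[-1])
```

In this model the first 100 requests pass and the 101st triggers the block. Note that the state lives in one process, which is exactly the limitation that makes a shared component such as a WAF attractive in a distributed deployment.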
5. Testing and Additional Considerations
5.1 Testing
For testing, we set up a simple server exposing a /hello endpoint that echoes its "a" query parameter back as JSON. Here's a basic Python script that plays the role of the crawler:
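import random

import requests


def send_request(url, request_method="GET", header=None, data=None):
    """Send one HTTP request; return the response, or None on failure."""
    if header is None:
        header = {"User-Agent": "Mozilla/5.0"}
    try:
        return requests.request(request_method, url, headers=header, data=data)
    except requests.RequestException as err:
        print(err)
        return None


if __name__ == '__main__':
    for i in range(100):
        # Pick a random lowercase letter for the "a" query parameter.
        letter = random.choice('abcdefghijklmnopqrstuvwxyz')
        resp = send_request("http://a.com/hello?a=" + letter)
        # send_request returns None when the request itself failed.
        if resp is not None:
            print(resp.content)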
Output Example:
b'{"a":"u"}'
b'{"a":"m"}'
b'{"a":"y"}'
b'{"a":"o"}'
b'<!DOCTYPE html>\n\n<html lang="zh">\n <head>\n .... # followed by a long HTML text
At this point, if you try to access the page again, you'll find that it has been blocked.
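When running such a test repeatedly, it can help to detect the switch from normal responses to the interception page automatically. The sketch below is a heuristic that relies only on what the output above shows: the demo endpoint answers with a JSON object, while a blocked request gets an HTML page. The helper names looks_blocked and first_blocked_index are hypothetical, and the check is an assumption about this demo endpoint, not a SafeLine API.

```python
import json


def looks_blocked(body: bytes) -> bool:
    """Treat any body that is not a JSON object as an interception page."""
    try:
        parsed = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return True
    return not isinstance(parsed, dict)


def first_blocked_index(bodies):
    """Return the index of the first blocked response, or None if none is blocked."""
    for index, body in enumerate(bodies):
        if looks_blocked(body):
            return index
    return None


if __name__ == "__main__":
    sample = [b'{"a":"u"}', b'{"a":"m"}', b'<!DOCTYPE html>\n\n<html lang="zh">']
    print(first_blocked_index(sample))
```

In the test loop, the same check could be applied to resp.content to stop sending requests once the block takes effect.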
5.2 What if the Crawler Fakes the X-Forwarded-For Header?
Some crawlers are sneaky and might fake the X-Forwarded-For header. To counter this, SafeLine allows you to configure how the source IP is retrieved. Go to 'Proxy Setting' -> 'Get Attack IP From' and select 'Socket Connection'. This ensures that the IP is obtained directly from the TCP connection.
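The difference between the two strategies can be illustrated with a small sketch. It is not SafeLine code: the function client_ip, its parameters, and the example addresses are hypothetical, and it only models "trust the header" versus "use the socket peer address".

```python
def client_ip(headers: dict, peer_ip: str, trust_forwarded_for: bool) -> str:
    """Pick the client IP either from X-Forwarded-For or from the TCP peer address."""
    if trust_forwarded_for:
        # Header names are case-insensitive, so normalise them first.
        lowered = {name.lower(): value for name, value in headers.items()}
        first_hop = lowered.get("x-forwarded-for", "").split(",")[0].strip()
        if first_hop:
            return first_hop
    return peer_ip


if __name__ == "__main__":
    peer = "203.0.113.7"  # address of the crawler's actual TCP connection
    # The crawler sends a different fake X-Forwarded-For value on each request.
    spoofed = [{"X-Forwarded-For": f"198.51.100.{n}"} for n in range(1, 4)]

    by_header = {client_ip(h, peer, trust_forwarded_for=True) for h in spoofed}
    by_socket = {client_ip(h, peer, trust_forwarded_for=False) for h in spoofed}
    print(len(by_header), len(by_socket))
```

When the header is trusted, every request appears to come from a new client, so no per-IP counter ever fills up. With the socket address, all requests are attributed to the same IP and the rate limit applies.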
5.3 What if the Crawler Fakes the TCP Source IP Too?
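If a crawler fakes the source IP in its TCP packets, the server's reply during the TCP three-way handshake goes to the faked address rather than to the crawler, so the handshake cannot complete. HTTP runs on top of an established TCP connection, so no HTTP request is ever delivered: the crawler loses its ability to scrape data, and there is nothing for Nginx to serve.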