We can block scraping bots that exhaust server resources by setting up a honeypot trap in nginx and blocking the offending IPs. This article covers open source nginx on an Alpine Linux base image. Nginx Plus already has a bot trapping feature.
Steps:
1. Creating the trap
We will add some invisible links to the web page. Human users won't click on them; only bots that scrape the page will follow these hidden hrefs.
<a href="/one-trap-here"></a>
Trap ready.
2. Add rule in nginx to handle the traps
location /one-trap-here {
    include honeytrap.conf;
}
Any request to the trap URL is now handled by the directives in honeytrap.conf.
3. Writing honeytrap.conf
This is another nginx configuration file; it hands the request to a script that blocks the IP that accessed the trap.
Nginx cannot execute scripts on its own, so the Common Gateway Interface (CGI) is used for this purpose. FastCGI is a protocol that keeps the script handler running as a long-lived process, so it copes efficiently with a large number of incoming requests.
Since we only need to run a simple script that blocks the IP, we can use fcgiwrap, a lightweight FastCGI wrapper for CGI scripts.
So we can install fcgiwrap, and honeytrap.conf will look something like this:
fastcgi_intercept_errors off;
fastcgi_pass unix:/run/fcgiwrap-unix.sock;
include fastcgi_params;
root /usr/local/libexec;
fastcgi_param SCRIPT_FILENAME $document_root/block-ip.cgi;
The fcgiwrap process communicates with nginx through a Unix socket file under /run/. The socket must be accessible to the user nginx runs as, for example by making that user its owner.
Once fcgiwrap is installed, the following command starts it in the background, listening on that socket:
/usr/bin/fcgiwrap -s unix:/run/fcgiwrap-unix.sock &
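Because fcgiwrap is started in the background, a start-up script may want to wait until the socket actually exists before adjusting its owner or starting nginx. The following is a minimal sketch of that wait in POSIX sh. The helper name wait_for_socket, the default of five attempts, and the user name nginx are assumptions for illustration, not something fcgiwrap provides.

```shell
#!/bin/sh
# Sketch: wait until a Unix socket appears, polling once per second.
# wait_for_socket is a hypothetical helper name, not part of fcgiwrap.
wait_for_socket() {
    sock="$1"
    tries="${2:-5}"          # assumed default: give up after about 5 seconds
    while [ "$tries" -gt 0 ]; do
        # -S is true only when the path exists and is a socket
        [ -S "$sock" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Typical use in a container start-up script (commented out so the
# function can be sourced without side effects):
#   /usr/bin/fcgiwrap -s unix:/run/fcgiwrap-unix.sock &
#   wait_for_socket /run/fcgiwrap-unix.sock || exit 1
#   chown nginx:nginx /run/fcgiwrap-unix.sock   # assumes nginx runs as user "nginx"
```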
4. Add CGI script
The script /usr/local/libexec/block-ip.cgi returns an HTTP status code suited to this scenario and then calls the shell script that does the actual blocking.
#!/bin/sh
echo "Status: 410 Gone"
echo "Content-type: text/plain"
echo
echo "Get lost, $REMOTE_ADDR!"
/usr/local/bin/block-ip.sh
Don't forget to make the script executable.
5. Add shell script
Basic firewalling in Linux is handled by netfilter, and we can add iptables rules to drop any incoming request from a specific IP.
iptables has built-in chains for INPUT, FORWARD and OUTPUT packets, and you can specify what action to take when a specific address makes requests, like below:
/sbin/iptables -A trap1 -s ${REMOTE_ADDR} -j DROP
This appends a rule to the trap1 chain that drops all packets from this remote address. trap1 is a user-defined chain, so it must exist and be referenced from the built-in INPUT chain (and from FORWARD, if the host routes traffic) before it has any effect.
A cleaner approach is to use ipset. With ipset, we can create separate sets for IPv4 addresses, IPv6 addresses, networks and so on, and then reference a whole set from a single iptables rule.
ipset -A trap1ipset ${REMOTE_ADDR}
To disconnect any keepalive connection to nginx, we can also use conntrack-tools.
The first step is to create ipsets for IPv4 and IPv6 addresses and reference them from iptables. Note that IPv6 rules need ip6tables, which may have to be installed as a separate package. This has to be done when the Docker container boots up, for example in the start-up scripts.
Sample:
ipset -N ipv4trap iphash family inet
ipset -N ipv6trap iphash family inet6
iptables -A INPUT -m set --match-set ipv4trap src -j DROP
ip6tables -A INPUT -m set --match-set ipv6trap src -j DROP
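If the start-up script can run more than once against the same network namespace, the sample above would fail on the second run (the sets already exist) and would append duplicate DROP rules. One possible way to make it repeatable is sketched below. The function names ensure_rule and setup_honeytrap are assumptions; hash:ip is the current name of the iphash set type.

```shell
#!/bin/sh
# Sketch: create the sets and DROP rules only if they are not already there.
# ensure_rule and setup_honeytrap are hypothetical names; the set names
# follow the sample above.
ensure_rule() {
    # $1 is iptables or ip6tables, the rest is the rule specification.
    cmd="$1"; shift
    # -C checks whether the rule exists; append it only when the check fails.
    "$cmd" -C "$@" 2>/dev/null || "$cmd" -A "$@"
}

setup_honeytrap() {
    # -exist makes "create" succeed when the same set already exists.
    ipset -exist create ipv4trap hash:ip family inet
    ipset -exist create ipv6trap hash:ip family inet6
    ensure_rule iptables INPUT -m set --match-set ipv4trap src -j DROP
    ensure_rule ip6tables INPUT -m set --match-set ipv6trap src -j DROP
}
```

Calling setup_honeytrap from the container's start-up script, before nginx starts, would then be safe to repeat.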
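The shell script is as follows. It uses bash syntax, so bash has to be installed in the Alpine image:
#!/bin/bash
# Take the address from the CGI environment, or from the first argument.
if [[ -z "${REMOTE_ADDR}" ]]; then
    if [[ -z "$1" ]]; then
        echo "REMOTE_ADDR not set!"
        exit 1
    else
        REMOTE_ADDR=$1
    fi
fi
# Stripping a pattern changes the string only when the pattern matches,
# so "not equal" means the address contains that pattern.
if [[ "$REMOTE_ADDR" != "${REMOTE_ADDR#*[0-9].[0-9]}" ]]; then
    ipset -A ipv4trap "${REMOTE_ADDR}"
    /usr/sbin/conntrack -D -s "${REMOTE_ADDR}"
elif [[ "$REMOTE_ADDR" != "${REMOTE_ADDR#*:[0-9a-fA-F]}" ]]; then
    ipset -A ipv6trap "${REMOTE_ADDR}"
    /usr/sbin/conntrack -D -s "${REMOTE_ADDR}"
else
    echo "Unrecognized IP format '$REMOTE_ADDR'"
fi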
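Where bash is not available, the address family check can also be written for the plain POSIX shell. The sketch below shows one way to do it with case patterns. The helper name ip_family is an assumption, and the patterns are only a rough character-class check, not a full address validator.

```shell
#!/bin/sh
# Sketch: classify an address as ipv4, ipv6 or unknown using only POSIX sh.
# ip_family is a hypothetical helper name.
ip_family() {
    case "$1" in
        "")
            echo unknown ;;
        *[!0-9.]*)
            # Contains something other than digits and dots: maybe IPv6.
            case "$1" in
                *[!0-9a-fA-F:.]*) echo unknown ;;  # character not valid in IPv6
                *:*)              echo ipv6 ;;
                *)                echo unknown ;;
            esac ;;
        *.*)
            # Only digits and dots, with at least one dot.
            echo ipv4 ;;
        *)
            echo unknown ;;
    esac
}
```

The blocking script could then branch on the result of ip_family "$REMOTE_ADDR" and add the address to ipv4trap or ipv6trap. Rejecting anything with unexpected characters also keeps stray input out of the ipset and conntrack command lines.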
Important: To use these network utilities, the container needs added privileges on start-up.
Either the --cap-add NET_ADMIN flag has to be passed to the docker run command,
or
cap_add:
  - NET_ADMIN
has to be added to the service in the docker compose file.
Now, this can be tested by accessing one of the trap URLs. The client address should then appear in the matching ipset, which the DROP rule in the INPUT chain refers to. The next request from that address will be dropped before it reaches nginx.
iptables --list
or
ipset list
can be used to view details.
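For a scripted check, ipset test reports through its exit status whether a single address is in a set. A small sketch follows; the helper name is_trapped, the placeholder host name and the example address are assumptions.

```shell
#!/bin/sh
# Sketch: report whether an address is already in one of the trap sets.
# is_trapped is a hypothetical helper; the set names follow the article.
is_trapped() {
    # "ipset test" exits with 0 when the entry is in the set.
    ipset test ipv4trap "$1" >/dev/null 2>&1 ||
        ipset test ipv6trap "$1" >/dev/null 2>&1
}

# Example (needs NET_ADMIN inside the container, so it is commented out).
# "your-host" is a placeholder. Request the trap from a machine you can
# afford to lock out, then check its address from inside the container:
#   curl -s -o /dev/null http://your-host/one-trap-here
#   is_trapped 203.0.113.7 && echo "blocked"
```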
This solution could be made persistent, so that blocked IPs are saved and loaded back when the container boots up.
For that, ipset save and ipset restore come in handy: they write the in-memory sets to a file and restore them from it. For example, for a set named bad-ips:
ipset save bad-ips -f ipset-bad-ips.backup
and
ipset restore -! < ipset-bad-ips.backup
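Wrapped into a start-up and shutdown routine, this could look like the sketch below. The backup path in TRAP_BACKUP and the function names save_traps and restore_traps are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: save all ipsets on shutdown and restore them on boot.
# TRAP_BACKUP and the function names are assumptions for illustration.
TRAP_BACKUP="${TRAP_BACKUP:-/var/lib/honeytrap/ipset.backup}"

save_traps() {
    mkdir -p "$(dirname "$TRAP_BACKUP")"
    # Without a set name, "ipset save" dumps every set to stdout.
    ipset save > "$TRAP_BACKUP"
}

restore_traps() {
    # Nothing to restore on the very first boot.
    [ -s "$TRAP_BACKUP" ] || return 0
    # -! ignores errors for sets and entries that already exist.
    ipset restore -! < "$TRAP_BACKUP"
}
```

The backup file has to live on a mounted volume if it should survive the container being recreated.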
Adaptations for Docker:
Copying the scripts and making them executable can be done in the Dockerfile as needed.
Starting fcgiwrap, which creates the socket file, can be done in a start-up script or in the CMD of the Dockerfile.