Horacio Degiorgi

Blocking bots in Nginx

At bibliotecas.uncuyo.edu.ar we run multiple services behind an nginx-based reverse proxy.
For days, all of our systems had been slowing down. Analyzing the access logs, we found a massive increase in "visits" from AI bots.
How do we block them?
By adding rules to the proxy_hosts definitions:

if ($http_user_agent ~* "amazonbot|Claudebot|claudebot|DataForSeoBot|dataforseobot|Amazonbot|SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|MojeekBot|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot") { return 403; }
Since the ~* operator makes the match case-insensitive, the upper- and lowercase duplicates in the list (for example amazonbot and Amazonbot) are redundant but harmless.

In our case, since we use Nginx Proxy Manager to manage the different domains, this configuration is entered in the Advanced section of each proxy host.

(Screenshot: advanced configuration in Nginx Proxy Manager)
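If you want to keep the per-host Advanced config short, the same check can also be expressed with an nginx map. This is a minimal sketch rather than our exact setup: it assumes you can add an http-level include (the map directive is not allowed inside a server block), and it uses a shortened bot list purely as an example.

# http-level include (outside the per-host Advanced section)
map $http_user_agent $blocked_bot {
    default 0;
    "~*(amazonbot|claudebot|semrushbot|ahrefsbot|mj12bot)" 1;
}

# per-host server block (for example, the Advanced section)
if ($blocked_bot) {
    return 403;
}

The advantage is that the long user-agent list lives in one place and each proxy host only needs the short if block.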


Top comments (1)

MUHAMMED YAZEEN AN • Edited

Great article! Blocking bots using User-Agent strings is a good starting point, and you've explained it really well.

I just wanted to add that User-Agent blocking can sometimes be bypassed since the User-Agent header can be easily spoofed. To make bot blocking more robust, we could combine it with other techniques like:

  • Rate limiting: Restrict the number of requests a client can make in a short time (a minimal nginx sketch follows below).
  • IP blocking: Block known malicious IPs or ranges.
  • Behavior-based detection: Identify bots by analyzing unusual patterns like high request rates, skipping resources, or accessing non-existent pages.
  • JavaScript challenges: Verify whether the client can execute JavaScript, as most bots cannot.
  • CAPTCHAs: Add a CAPTCHA to sensitive areas like login pages or forms.
  • Advanced protection: Services like Cloudflare or AWS WAF can provide more comprehensive bot protection.

Combining these techniques can help create a stronger defense against bots. Thanks again for sharing this; it's a great resource for anyone looking to get started!
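To illustrate the first point, here is a minimal rate-limiting sketch for nginx; the zone name perip, its size and the rates are assumptions you would tune to your own traffic.

# http context: one shared zone keyed by client IP, allowing 10 requests per second
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

# inside the server or location block of the proxy host
location / {
    limit_req zone=perip burst=20 nodelay;
    # existing proxy_pass and other directives stay here
}

Requests beyond the burst of 20 are rejected with a 503 by default (configurable via limit_req_status), which slows down aggressive crawlers without affecting normal visitors.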

