DEV Community

Bad Bot 101: What is it & How to Detect and Block Bad Bots?

MarianhBurnsk
I am a professional leader, good at network research, market research, data mining, and I am willing to share my experience.
・15 min read
As a website owner or administrator, you need to know what bad bots are, the harm they can do to your site, and how to block them. All of this and more is discussed below.

About half of all Internet traffic is bot traffic, and about half of that originates from bad bots. What does this mean for you as a web service owner or admin? If you are not careful, you will be making decisions based on contaminated data, thanks to bot traffic.

I have worked with new site owners who claimed over 1K pageviews, only for me to dig deeper and break the sad news to them: a sizable share of that traffic was not human. There were computers at the other end interacting with the website in an automated manner, and some of their actions had been mistaken for human actions.

Unfortunately, some of the actions bots carry out have adverse effects, and some bots access content they shouldn’t. For these reasons, you need to be proactive and learn how to detect when a bad bot is messing around with your web property.

Failure to do so means accepting anything thrown at you, even by your competitors. In this article, I will open your eyes to the world of bad bots: how they affect your site, how to detect them, and how to block them.


What is an Internet Bot?

This article is all about Internet bots, with a focus on the bad ones, so let us first take a look at what Internet bots are. Internet bots, also known as web bots or simply bots, are computer programs that carry out automated tasks on the Internet.

Basically, they send web requests as you would with a web browser, but they can do so repetitively and at high speed. While a human can only send a handful of requests in a minute, a web bot can send hundreds or even thousands. Most web bots carry out tasks that are repetitive and simple, but some can carry out complex tasks.

With a bot, you can automate the checkout process for limited-edition items, monitor your site performance, carry out SEO audits, and scrape data from web pages. Bots can also be used to perform denial-of-service attacks, click fraud, view fraud, and spam.

From the two groups, you can see that there are bad bots as well as good ones. While you want to keep the good bots around, you will want to keep the bad bots out of your site as much as you can.


What Makes a Bot a Bad Bot?

From the above, you already have the idea that not all bots are bad. So, what makes a bot a bad bot? Generally, it is subjective.

Take, for instance, the Bing search engine crawler. It might be good to you as a site owner because it helps index your site and could potentially send you traffic.

However, if that same crawler were to crawl Google SERPs and scrape their listings and rankings to improve Bing's own ranking, it would be a bad bot to Google while still being good to you. This does not mean there is no general agreement on what makes a bot a bad bot.

A bot becomes a bad bot when it is used for carrying out malicious acts. But that’s not all. Some bots, such as web scraping bots, might not have malicious intent, but because they can have adverse effects on the sites they extract data from without any benefit to the site or its owner, they can also be regarded as bad bots.

Bad bots are also called malicious bots. Their activities reward their operators at the expense of the sites they operate on. Unfortunately, bad bot traffic is on the increase, and you need to watch out for it.


Good Bots Vs. Bad Bots

From the above, you have an idea of what bad bots are. What about their good counterparts? The intent behind a bot is what generally makes it a good bot, provided it sticks to that and does not adversely affect the sites it operates on. There are two things you need to take note of.

The first is that the intent must be good, and the second – it must not have any negative effect on sites it interacts with. Also, good bots respect the robots.txt directives of a site while bad bots do not. With this, you need to understand that a good bot can quickly turn into a bad bot.

Take, for instance, a crawler designed to crawl your site for indexing: it could adversely affect your site if it sends more requests than your site can handle.

So, unlike bad bots, good bots are out to help you. What are some examples of good bots out there? Perhaps the most notable are search engine crawlers such as Googlebot.

These bots crawl your site in order to index its content so that when a user searches for relevant keywords, the search engine can send that user to your site. These bots respect the robots.txt file and will not crawl your site if you do not want them to.
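
To make "respecting robots.txt" concrete, here is a minimal sketch of the check a well-behaved crawler runs before requesting a page. It uses Python's standard `urllib.robotparser`. The robots.txt content, the crawler names (`ExampleCrawler`, `BadBot`), and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


rules = build_parser(ROBOTS_TXT)
# A polite crawler skips any URL for which this check returns False.
print(may_crawl(rules, "ExampleCrawler", "https://example.com/blog/post"))
print(may_crawl(rules, "ExampleCrawler", "https://example.com/private/report"))
print(may_crawl(rules, "BadBot", "https://example.com/blog/post"))
```

Keep in mind that robots.txt is only a request. A bad bot simply never runs a check like this, which is why the file cannot be relied on as a blocking mechanism.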

Aside from search engine crawlers, there are bots that help with copyright searches and SEO audits, and these are good bots too. Some good bots might not benefit you directly but are not harmful and contribute to the betterment of the Internet, such as the crawlers of Internet archives. Good bots are also known as helpful bots.


Types of Bad Bots

In case you do not know, there are many types of bad bots out there, each built for a particular malicious purpose. In this section of the article, we will discuss the popular types of bad bots interacting with websites and web services on the Internet.

  • Traffic Bot

Traffic bots are Internet bots developed to send fake traffic to a website. These are the bots that inflate your pageview count, giving you the hope of an increase in traffic when, in reality, there was none.

These types of bots are used by web services that sell traffic. They promise to send real users to your site but end up sending bot traffic. There are different variations of traffic bots.

While the ones described above deal with inflating pageview counts, others click advertisements in the case of ad fraud, and some watch videos and click links, among others.

  • Spam Bots

Spam bots are Internet bots that visit web pages and carry out tasks that can be regarded as spam. Notable examples are bots that post automated comments in the comment section of a blog or in a discussion forum.

If you manage a blog or a website that allows user-generated content, there is a high chance you have come across this type of bot, as the comments are generic, spammy, and usually contain URLs the operators have an interest in. Some spam bots are used for political campaigns and changing narratives.

  • Web Scrapers

Web scrapers are the kind of Internet bots you could call content-theft bots. This is because they are designed to visit web pages and extract data from them, even without the permission of the site owner or administrator. They are the tools of web data extraction.

Web scraping is generally considered legal as long as the content is publicly available, does not require authentication to access, and is not copyrighted. Even so, website owners frown at it, and as such, web scrapers can be regarded as bad bots. In some situations, web scrapers can even bring down a low-powered website if they send too many requests.

  • Botnets

A botnet is a collection of zombie computers. Zombie computers are compromised computers that hackers have access to without the knowledge of their owners. Hackers can make use of botnets for coordinated DDoS attacks in order to bring down a target server or website. They can also use them for other malicious tasks.

  • Checkout Bots

Another class of bots regarded as bad bots are checkout bots used during limited-edition releases. Limited-edition releases are competitive, and each buyer is usually entitled to only one unit of an item. Checkout bots are especially popular for sneaker copping.

These bots go through the purchase process at high speed to buy multiple units, depriving other buyers of the opportunity to make a purchase, only for the operators to turn around and sell the items at resale prices. This is common in the apparel, ticketing, and sneaker markets.

Aside from the above types of bad bots, there are many others, but the ones described above are the most common on the Internet.


Effects of Bad Bots

From the above, if you have read between the lines, you already know some of the adverse effects of bad bots. If you do not, do not worry; we will discuss them in detail below.

  • Bots Contaminate your Engagement Data

One thing you need to understand is that while some bot traffic can be detected and separated from real human traffic, some bots are stealthy, and there is no easy way to tell them apart from human traffic. Therein lies the problem. Traffic bots can increase your pageviews and even give you the impression that you have more unique visitors.

If this happens, your engagement data has been contaminated, and any decision you base on it is unreliable unless you can estimate the percentage of bot traffic and subtract it from your overall traffic.
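
The subtraction itself is simple arithmetic. The sketch below assumes you already have an estimate of the bot share from somewhere, such as your analytics provider. The function name and the sample numbers are hypothetical.

```python
def estimated_human_pageviews(total_pageviews: int, bot_share: float) -> int:
    """Remove an estimated bot share (between 0.0 and 1.0) from a raw count."""
    if not 0.0 <= bot_share <= 1.0:
        raise ValueError("bot_share must be between 0.0 and 1.0")
    return round(total_pageviews * (1.0 - bot_share))


# Hypothetical figures: 1,000 raw pageviews with an estimated 25% bot share.
print(estimated_human_pageviews(1000, 0.25))  # 750
```

The result is only as good as the estimate you feed in, so treat it as a rough correction and not as an exact human count.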

  • Slows Down Performance

When a web server gets more requests than it can handle, its performance suffers. Unfortunately, bots are known to send a lot of requests and can overwhelm a web server, especially a low-powered one. This is the reason some bot operators run their bots only at night, when regular traffic is lower.

Aside from bots that slow down a website unintentionally, some are designed to attack a website by sending it more requests than it can handle until it goes down. This type of cyber-attack is known as a DDoS attack and is carried out using botnets.

  • Steal Data

Another side effect of bad bots from a website administrator’s perspective is that data on the website is extracted without consent or permission, which can amount to stealing. In some instances, the data cost the website money to generate and is only available behind a paywall. Using a web scraper, the data can be collected and made public.

  • Increase Server Running Cost

If you can, it is better to discourage bot traffic on your website. Even if you do not see any direct side effect on your site, bots increase your running cost. Web hosts do not care whether requests come from a human or a bot, so when too many requests are received, your infrastructure scales up and your bill goes up with it. Unfortunately, bot traffic, especially from the bad ones, is of no benefit to you.


How to Detect Bad Bots

Bad bots are designed to be undetectable, and as such, it takes careful digging to know whether your site is under a bot traffic attack. Let us take a look at some of the pointers to bot traffic.

  • Unusual Spike in Traffic

As a site owner, you have your average pageviews and daily unique visitors to use as a baseline. If there is an unusual spike in traffic and you cannot pinpoint the reason behind it, then most likely, bots are interacting with your site.

Not only will you get a spike in traffic, but the traffic will mostly be direct traffic with a highly unusual bounce rate. The location the traffic originates from can also give you a clue, especially if it is a location you do not normally get that amount of traffic from.

Another thing to take note of is inconsistency in pageview data between Google Analytics and your other traffic analytics services, such as Crazyegg and Microsoft Clarity, which offer session recordings, replays, and heatmaps you can use to analyze whether a visitor is a bot or not.

Many analytics services support filtering out bot traffic, but the algorithms they use to detect bot traffic differ. As such, you can use an inconsistency that did not exist before as a sign that your site is being accessed by bots.
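
One simple way to put a number on "unusual" is to compare today's pageviews with the average of recent days and see how many standard deviations away it is. The sketch below is just one possible heuristic. The function name, the threshold of 3, and the sample figures are assumptions for illustration, and a spike flagged this way still needs a human to rule out legitimate causes such as a campaign or a viral post.

```python
from statistics import mean, stdev


def is_traffic_spike(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag today's count if it sits far above the recent daily baseline.

    history   -- daily pageview counts for recent normal days
    threshold -- standard deviations above the mean that count as a spike
    """
    if len(history) < 2:
        raise ValueError("need at least two days of history")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any increase is out of the ordinary.
        return today > baseline
    return (today - baseline) / spread > threshold


# Hypothetical week of daily pageviews that averages 1,000.
last_week = [980, 1010, 1005, 995, 1020, 990, 1000]
print(is_traffic_spike(last_week, 1015))  # False
print(is_traffic_spike(last_week, 2500))  # True
```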

  • Server Performance is Affected Unexpectedly

This point is related to the number of requests being sent in a short while. If too many requests are sent in a short period of time, the performance of your site will suffer if the infrastructure is low-powered. As a web administrator, you should also keep a record of your site's performance in terms of response time and speed.

When your site becomes slow and there is a spike in traffic, then most likely, your site is being tampered with by bad bots. You cannot always rely on this as bots can mimic humans and slow down the rate at which they send requests, thereby keeping performance the same while carrying out their tasks unnoticed.

  • Junk Activities and Content Posting

One of the easiest ways to detect bots is in the content they post. Usually, bot operators do not have the time to craft good content, and as such, you can use that as a pointer.

When you start getting comments or posts that are generic, nonsensical, and have URLs embedded, you need to know that those are not from real users but from bots. An unusual number of account creations with strange emails and other personal information is a pointer too.
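
The pointers above (generic wording, embedded URLs) can be turned into a rough score. The sketch below is a toy heuristic, not a real spam filter. The phrase list, the weights, and the threshold are all made-up values you would need to tune against your own comments.

```python
import re

# Matches http:// and https:// links embedded in a comment.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)

# Hypothetical list of generic phrases that often show up in bot comments.
GENERIC_PHRASES = ("great post", "nice article", "thanks for sharing", "click here")


def spam_score(comment: str) -> int:
    """Score a comment: 2 per URL, 1 per generic phrase, 1 if very short."""
    text = comment.lower()
    score = 2 * len(URL_PATTERN.findall(comment))
    score += sum(1 for phrase in GENERIC_PHRASES if phrase in text)
    if len(text.split()) < 5:
        score += 1
    return score


def looks_like_spam(comment: str, threshold: int = 3) -> bool:
    """Treat a comment as likely spam once its score reaches the threshold."""
    return spam_score(comment) >= threshold


print(looks_like_spam("Great post! Click here http://spam.example/buy"))  # True
print(looks_like_spam("I tried the rate limit idea on my server and it cut junk requests a lot."))  # False
```

A score like this is best used to hold comments for moderation, since real users also write short or generic comments from time to time.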

  • Request Header Inspection

Most basic and unsophisticated bots do not send all of the headers that browsers send. In most cases, they even forget to set the user-agent header.

Those that do send headers hardly send more than the user-agent string. In contrast, most browsers send a good number of headers, which are used for content negotiation. If you get requests with few or no headers, the request most likely originates from a bot.
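
Here is a minimal sketch of that check, assuming your web framework hands you the request headers as a dictionary. The list of expected headers and the tolerance of one missing header are my own assumptions, so adjust them to what your real visitors actually send.

```python
# Headers that mainstream browsers normally send with every page request.
EXPECTED_HEADERS = ("user-agent", "accept", "accept-language", "accept-encoding")


def missing_browser_headers(headers: dict[str, str]) -> list[str]:
    """List the expected headers that are absent or empty in a request."""
    present = {name.lower() for name, value in headers.items() if value.strip()}
    return [name for name in EXPECTED_HEADERS if name not in present]


def looks_like_basic_bot(headers: dict[str, str], max_missing: int = 1) -> bool:
    """Flag a request that lacks more than max_missing expected headers."""
    return len(missing_browser_headers(headers)) > max_missing


browser_request = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
}
script_request = {"User-Agent": "python-requests/2.31.0"}

print(looks_like_basic_bot(browser_request))  # False
print(looks_like_basic_bot(script_request))   # True
```

This only catches unsophisticated bots. Anyone who copies a full set of browser headers into their script will pass it.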


How to Block Bad Bots

When it comes to blocking bots from accessing your site, you need to know that you cannot succeed 100 percent. Facebook has not been able to do so, and neither have Google, Amazon, and other major web services. All you can do is make your site an unattractive target because of the extra lengths bot operators have to go to.

  • Set IP Address Request Limits

IP addresses are some of the unique identifiers a web admin has access to and can use to pin down users. You can use this to your advantage by capping the number of requests allowed from an IP address in a given period of time.

If you get more requests than that from a single IP address, the traffic is unnatural, and you can block subsequent requests from that IP address. It is also important to block the IP networks of hosting providers used by datacenter proxy providers.
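
A per-IP request limit can be sketched as a sliding window: remember the timestamps of recent requests for each IP and refuse new ones once the window is full. The class below is an in-memory illustration only. The class name, the limit of 3 requests per 60 seconds, and the sample IP addresses are hypothetical, and a production setup would more likely rely on the rate limiting built into your web server, CDN, or a shared store.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP address to the timestamps of its accepted requests.
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` and say whether to serve it."""
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
for second in (0.0, 1.0, 2.0, 3.0):
    # The fourth request inside the same minute is refused.
    print(second, limiter.allow("203.0.113.7", now=second))
```

Timestamps are passed in explicitly to keep the example deterministic. In a live application you would pass something like `time.monotonic()`.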

  • Make Use of Captcha Service

I bet you have had to deal with Captcha at some point on the Internet. Captcha is the acronym for Completely Automated Public Turing test to tell Computers and Humans Apart. When a Captcha service detects unusual or bot-like activity, it forces you to solve a problem in order to gain access.

While humans find these problems easy to solve, computer programs such as bots find them difficult. In fact, there are advanced Captcha systems that are practically unsolvable by bots. However, you need to know that Captcha can stand in the way of legitimate users and disrupt the user experience.

  • Block Unknown Browsers

Another way to block bots is to record the fingerprints of major browsers and then block requests from sources that do not match any of the browsers you have fingerprinted. This works against inexperienced bot developers, as they do not make use of headless browsers and their bots do not render JavaScript. You can even bring JavaScript triggers into the mix and block any request that does not cause the JavaScript to execute.
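
One way a JavaScript trigger could work is for the server to embed a signed token in the page and have a small script send it back, for example as a cookie. Clients that never run the script never return a valid token. The sketch below covers only the server side and is an assumption about how you might build it, not a standard scheme. The function names, the secret key, and the one-hour lifetime are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice, load a random value from configuration.
SECRET_KEY = b"replace-with-a-random-secret"


def issue_token(client_ip: str, issued_at: int) -> str:
    """Create a token for the page's JavaScript to send back to the server."""
    message = f"{client_ip}|{issued_at}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return f"{issued_at}.{signature}"


def verify_token(client_ip: str, token: str, now: int, max_age: int = 3600) -> bool:
    """Accept only a fresh, untampered token issued to this IP address."""
    try:
        issued_text, signature = token.split(".", 1)
        issued_at = int(issued_text)
    except ValueError:
        # Missing or malformed token: the script probably never ran.
        return False
    if not 0 <= now - issued_at <= max_age:
        return False
    expected = issue_token(client_ip, issued_at).split(".", 1)[1]
    return hmac.compare_digest(signature.encode(), expected.encode())


token = issue_token("198.51.100.4", issued_at=1000)
print(verify_token("198.51.100.4", token, now=1500))  # True
print(verify_token("198.51.100.4", "", now=1500))     # False
```

As with the other techniques, a bot driving a headless browser will run the script and pass this check, so it only filters out bots that do not execute JavaScript.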


Conclusion

Looking at the above, you can tell that bad bots are difficult to deal with. Most of the techniques you can use to block them can be circumvented. Take, for instance, IP tracking: proxies will make it useless.

There are anti-Captcha services out there that solve Captchas, while headless browsers take care of all JavaScript-related anti-bot techniques.

Instead of trying to detect and block bad bots yourself, I would advise you to make use of third-party services that can detect and block them. This will be a better option than using crude methods that can easily be bypassed.
