Bruno Corrêa

How to Use Shadow DOM and Honeypots to Deter Crawlers

This is a short introduction and demonstration of how we can increase form security, especially against crawling. It's most likely overkill for most applications, but it's very interesting for anyone who wants to understand a bit more about these practices and how they can be undermined.

Disclaimer: This post was inspired by Felippe Regazio during a late-night coding stream; make sure to give him a follow. Huge thanks. Also, I'm not a senior dev, so please feel free to double-check any information you believe I may have misstated so we can correct and improve it together.

Next.js + TypeScript implementation app:

You can refer to this repo to check out the application I built to learn and demonstrate the following concepts. You could just as easily build this with plain HTML, CSS, and JavaScript, but since there were fewer examples in React, I decided to build one approach here.

Also, feel free to try to hack it too; I've made a deployment available here.


Understanding Web Crawlers

Web crawlers, also known as bots or spiders, are automated programs designed to browse the internet and gather data. They're essential for search engines like Google, but they can also be used for more malicious purposes, like scraping sensitive information or submitting spam through forms. Crawlers often use JavaScript APIs like document.querySelectorAll() or the document.forms collection to identify and interact with elements, such as form fields, by their name attributes; a rough sketch of such a scan follows the pattern list below.

Most crawlers look for predictable patterns, such as:

  • input[name="name"]
  • input[name="email"]
  • input[type="submit"]
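A rough sketch of such a scan (not any particular bot's actual code, just an illustration of how these predictable selectors get exploited):

// Hypothetical sketch of a naive crawler's field scan, matching the patterns above
function findFormTargets(doc: Document): HTMLInputElement[] {
  const targets: HTMLInputElement[] = [];

  for (const form of Array.from(doc.forms)) {
    for (const input of Array.from(form.querySelectorAll<HTMLInputElement>("input"))) {
      const looksLikeName = input.name === "name";
      const looksLikeEmail = input.name === "email" || input.type === "email";
      const looksLikeSubmit = input.type === "submit";

      if (looksLikeName || looksLikeEmail || looksLikeSubmit) {
        targets.push(input);
      }
    }
  }

  return targets;
}

A bot would then auto-fill and submit whatever findFormTargets(document) returns.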

The Risks of Crawling

While crawling itself isn't inherently bad, when malicious bots target your site, it can result in spam form submissions, scraping sensitive data, or even overloading your server with requests. That's where Shadow DOM and honeypots come into play as security measures.

What is Shadow DOM?

The Shadow DOM is a part of the Web Components standard that allows for the encapsulation of DOM trees. Essentially, elements in a Shadow DOM tree are isolated from the rest of the document, preventing external scripts (like those from crawlers) from easily accessing them.

Another common use for Shadow DOM is in embeddable widgets: when a website embeds your service into its platform, you want to make sure important information can't be changed, such as what an input's label represents, or a copyright statement like "Powered by my-app-name", which is common in e-commerce apps for Shopify, WordPress, and others.

Pros of Using Shadow DOM for Forms:

  • Encapsulation: Elements within the Shadow DOM are not reachable by global selectors such as document.querySelectorAll, so crawlers won't be able to easily find or interact with them (see the sketch after this list).
  • Style isolation: CSS and JavaScript scoped to a Shadow DOM will not leak into the main document, and vice versa. This protects the UI from unintended changes or external interference, though the same isolation can be a downside from other perspectives.
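
To make the encapsulation point concrete, here is a minimal vanilla sketch (separate from the React example later in this post) showing that a closed shadow root is invisible to global selectors:

// Minimal sketch: contents of a closed shadow root are unreachable from outside
const host = document.createElement("div");
document.body.appendChild(host);

const shadow = host.attachShadow({ mode: "closed" });
const input = document.createElement("input");
input.name = "email";
shadow.appendChild(input);

console.log(document.querySelector('input[name="email"]')); // null
console.log(host.shadowRoot); // null, because the root was created in "closed" mode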

Downsides:

  • Limited accessibility: Shadow DOM can make elements harder to access for legitimate purposes, such as custom analytics or third-party tools.
  • Potential developer complexity: You need to carefully manage interactions between Shadow DOM and the rest of your site to avoid conflicts, especially when debugging.

Using Shadow DOM to Hide Forms

By placing a form inside the Shadow DOM, bots attempting to use document.forms or document.querySelectorAll will not be able to access it. Here’s an example of encapsulating a form using Shadow DOM:

import React, { useEffect, useRef } from "react";

const ShadowForm: React.FC = () => {
  const formRef = useRef<HTMLDivElement>(null);
  const shadowRootRef = useRef<ShadowRoot | null>(null);

  useEffect(() => {
    // Attach only once: attachShadow throws if called twice on the same host
    if (formRef.current && !shadowRootRef.current) {
      // "closed" mode keeps the root from being exposed via element.shadowRoot
      shadowRootRef.current = formRef.current.attachShadow({ mode: "closed" });

      const form = document.createElement("form");
      form.noValidate = true;
      form.innerHTML = `
        <!-- Honeypot: hidden from assistive tech and moved off-screen so humans never fill it -->
        <input type="text" name="name" aria-hidden="true" tabindex="-1" autocomplete="off"
               style="position: absolute; left: -9999px;">
        <!-- Regular form fields here -->
      `;

      shadowRootRef.current.appendChild(form);
    }
  }, []);

  return <div ref={formRef}></div>;
};

In this example, the form fields, including an anti-crawler honeypot field (name="name"), are encapsulated in the Shadow DOM, preventing external scripts from accessing or interacting with the form.
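
Since external scripts can't reach the form, submission has to be wired up from inside the component, where we still hold the shadow root reference. A minimal sketch of that wiring, to be placed in the same useEffect right after the form is appended (the actual handling logic comes in the next section):

// Inside the same useEffect, after shadowRootRef.current.appendChild(form):
form.addEventListener("submit", (event) => {
  event.preventDefault(); // handle submission ourselves instead of navigating away

  const formData = new FormData(form);
  // Hand the data off to validation/submission logic (see the honeypot section below)
  console.log(Object.fromEntries(formData.entries()));
});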

The Honeypot Technique

A honeypot is a hidden field added to a form, which humans can’t see but bots might fill out. If the field is filled, it indicates a likely bot interaction, and you can prevent the form submission.

In this case, the field is hidden from assistive technologies (via aria-hidden), moved off-screen with CSS so humans never see it, and kept out of crawlers' reach through its encapsulation in the Shadow DOM:

const FIELD = {
  HONEYPOT: "name", // deliberately uses the "attractive" conventional name bots look for
  NAME: "%",
  EMAIL: "@",
  PASSWORD: "$",
  CONFIRM_PASSWORD: "C",
};

export function handleFormSubmission(formData: FormData, showError: (id: string, message: string) => void) {
  // A filled honeypot is a strong bot signal: silently drop the submission
  if (formData.get(FIELD.HONEYPOT)) {
    return;
  }

  // Continue with validation, calling showError for any invalid fields...
}

Here, FIELD.HONEYPOT represents a field that bots may fill in. If the field is filled, the form submission is blocked, preventing spam submissions.
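
A quick usage sketch of that check (the sample values and the no-op error handler are purely illustrative):

// Simulating both cases with in-memory FormData
const human = new FormData();
human.set("%", "Ada Lovelace");
human.set("@", "ada@example.com");

const bot = new FormData();
bot.set("%", "Bot");
bot.set("@", "bot@example.com");
bot.set("name", "filled by a bot"); // bots tend to fill every field they find

const noop = (_id: string, _message: string) => {};
handleFormSubmission(human, noop); // passes the honeypot check, continues to validation
handleFormSubmission(bot, noop);   // returns early: the honeypot field was filled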

Using Special Characters in Input Names

Many crawlers rely on identifying common field names such as name, email, and password. By using unconventional names like @, %, or $, you make it harder for crawlers to identify key fields.

const inputConfigs = [
  { name: "%", type: "text", placeholder: "Name" },
  { name: "@", type: "email", placeholder: "Email" },
  { name: "$", type: "password", placeholder: "Password" },
  { name: "C", type: "password", placeholder: "Confirm Password" },
];

Using non-standard input names reduces the likelihood of bots correctly filling in form fields, while still being fully functional for human users.
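
The rest of your code still needs to know what %, @, $, and C stand for, of course. One way to handle that (a hypothetical helper, not necessarily how the example repo does it) is to translate the obfuscated keys back to semantic names at the submission boundary:

// Hypothetical helper: map obfuscated field names back to semantic ones
const NAME_MAP: Record<string, string> = {
  "%": "name",
  "@": "email",
  "$": "password",
  "C": "confirmPassword",
};

function decodeFormData(formData: FormData): Record<string, string> {
  const decoded: Record<string, string> = {};
  for (const [key, value] of formData.entries()) {
    const semanticKey = NAME_MAP[key];
    if (semanticKey && typeof value === "string") {
      decoded[semanticKey] = value;
    }
  }
  return decoded;
}
// decodeFormData(formData) then yields { name, email, password, confirmPassword }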

Why This Might Be Overkill for Most Sites

While combining Shadow DOM with honeypots and unconventional input names can improve security against automated form submissions, this level of defense is generally more suitable for high-risk environments. Many websites don’t need to go to these lengths because simpler anti-bot solutions, like CAPTCHAs or rate-limiting, are often effective.

Potential Side Effects

  • SEO Implications: Because elements inside Shadow DOM are hidden from the rest of the document, some search engines might not be able to index your content properly.
  • User Experience: Over-complicating the form submission process might frustrate legitimate users or make the form inaccessible to certain browsers or assistive technologies.

Conclusion

Using the Shadow DOM, honeypots, and unconventional input names offers a robust, albeit niche, approach to fending off form-submitting bots. These methods hide form fields from common crawling techniques and add extra layers of protection that help ensure your forms are only submitted by humans. However, it's important to weigh the benefits against the potential downsides and determine whether this approach is truly necessary for your site.


Let's get our hands dirty

Next.js + TypeScript example repository: https://github.com/brinobruno/shadow-dom-react-next
Next.js + TypeScript example deployment: https://shadow-dom-react-next.vercel.app/

References

Here are some authoritative sources and references you could explore for deeper insights:

Shadow DOM Documentation:

  • MDN Web Docs: Shadow DOM
  • web.dev: Shadow DOM

FormData and XMLHttpRequest:

  • MDN Web Docs: FormData
  • MDN Web Docs: XMLHttpRequest

Honeypot Technique:

  • Kaspersky: Honeypot Techniques

Cross-Site Request Forgery (CSRF):

  • OWASP: Cross-Site Request Forgery (CSRF)


Let's connect

I'll share my relevant socials in case you want to connect:
GitHub
LinkedIn
Portfolio
