/llms.txt: A Simple Way to Control How AI Bots See Your Site 🤖

With the rise of AI web crawlers, many sites are looking for ways to control how their content is used for AI training. While robots.txt has been the standard for traditional crawlers, llms.txt is gaining adoption for AI-specific directives.

What is llms.txt?

https://llmstxt.org/

llms.txt is a proposed standard (similar to robots.txt) that lets website owners specify:

  • Whether AI models can train on their content
  • Which parts of the site are allowed/disallowed for training
  • Attribution requirements
  • Rate limiting for AI crawlers

Quick Implementation Guide

Add an llms.txt file to your site's root directory:

# Allow training but require attribution
Allow: /blog/*
Attribution: Required
Company: YourCompany

# Disallow training on specific sections
Disallow: /private/*
Disallow: /premium/*

# Rate limiting (10 requests per minute)
Crawling-Rate: 10r/m
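
The directive semantics aren't finalized, so exactly how crawlers will interpret a file like this is still an open question. Here's a minimal sketch, assuming robots.txt-style matching (wildcard patterns, with Disallow winning over Allow), of how a crawler might decide whether a path is open for training. The parsing rules and the POLICY text are my own assumptions, not part of any spec:

import fnmatch

# Assumed semantics: '*' wildcards, Disallow takes precedence over Allow.
POLICY = """\
Allow: /blog/*
Disallow: /private/*
Disallow: /premium/*
"""

def parse_policy(text):
    """Collect Allow/Disallow patterns from an llms.txt-style policy."""
    allow, disallow = [], []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("allow:"):
            allow.append(line.split(":", 1)[1].strip())
        elif line.lower().startswith("disallow:"):
            disallow.append(line.split(":", 1)[1].strip())
    return allow, disallow

def training_allowed(path, allow, disallow):
    """Disallow rules win; otherwise the path must match an Allow rule."""
    if any(fnmatch.fnmatch(path, pattern) for pattern in disallow):
        return False
    return any(fnmatch.fnmatch(path, pattern) for pattern in allow)

allow, disallow = parse_policy(POLICY)
print(training_allowed("/blog/my-post", allow, disallow))   # True
print(training_allowed("/premium/guide", allow, disallow))  # False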

Real-World Examples

I looked at how major tech companies implement llms.txt.
You can check it here: https://llmstxt.site/

Here are some interesting patterns I found:

  • Most companies allow training on public blog content
  • Documentation is commonly restricted
  • Premium content is usually disallowed
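
If you want to run this kind of survey yourself, a few lines of Python are enough to pull the files and eyeball the directives. The domains below are placeholders; swap in sites from the directory linked above:

import urllib.request

# Placeholder domains; replace with sites listed at https://llmstxt.site/
SITES = ["example.com", "example.org"]

for site in SITES:
    url = f"https://{site}/llms.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except Exception as exc:
        print(f"{site}: no llms.txt ({exc})")
        continue
    # Count non-comment, non-empty lines as directives.
    directives = [line for line in body.splitlines() if line.strip() and not line.startswith("#")]
    print(f"{site}: {len(directives)} directive lines")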

Best Practices

  1. Start with a sitewide default policy, then add exceptions
  2. Be explicit about attribution requirements
  3. Consider rate limits for AI crawlers (see the sketch after this list)
  4. Review your policy periodically as your content and the crawler landscape change
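
Declaring a Crawling-Rate only helps if you can also enforce it on your side. As a rough sketch, here's one way to throttle requests from known AI crawler user agents to the 10-requests-per-minute rate declared earlier. The user-agent list and the rolling-window interpretation of "10r/m" are assumptions on my part:

import time
from collections import defaultdict, deque

# Example AI-crawler user agents; adjust to whichever bots show up in your logs.
AI_CRAWLER_AGENTS = {"GPTBot", "ClaudeBot", "CCBot"}
REQUESTS_PER_MINUTE = 10

_recent_requests = defaultdict(deque)  # user agent -> timestamps of recent requests

def allow_request(user_agent, now=None):
    """Return True if this crawler request fits inside the declared rate limit."""
    if user_agent not in AI_CRAWLER_AGENTS:
        return True  # the limit only applies to AI crawlers
    now = time.time() if now is None else now
    window = _recent_requests[user_agent]
    while window and now - window[0] > 60:  # drop entries older than a minute
        window.popleft()
    if len(window) >= REQUESTS_PER_MINUTE:
        return False
    window.append(now)
    return True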

Getting Started

Just create an llms.txt file in your site's root directory.
Here is my llms.txt: https://gleam.so/llms.txt
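
For a static site, that's literally all there is to it. If your site is rendered by an application server instead, you can serve the file from a route. A minimal sketch with Flask (the policy text here is just a placeholder):

from flask import Flask, Response

app = Flask(__name__)

# Placeholder policy; in practice, load this from a file or template.
LLMS_TXT = """\
Allow: /blog/*
Disallow: /private/*
Attribution: Required
"""

@app.route("/llms.txt")
def llms_txt():
    # Serve as plain text so crawlers don't get an HTML-wrapped response.
    return Response(LLMS_TXT, mimetype="text/plain")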

What are your thoughts on llms.txt? Are you planning to implement it on your sites?
