robots.txt
Plain text file at /robots.txt that tells crawlers which paths they may or may not fetch.
Definition
robots.txt is a plain text file served at the root of a site (/robots.txt) that tells web crawlers which paths they may visit. The format is line-based: a User-agent line names the bot that the following rules apply to, and Allow and Disallow lines list the path prefixes that bot may or may not fetch. The file is advisory. Well-behaved crawlers (Google, OpenAI, Anthropic, Perplexity) honor it, while hostile scrapers ignore it. It is the main way to control AI crawler access on pages that do not sit behind authentication.
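The line-based format can be illustrated with a minimal sketch that parses a small robots.txt using Python's standard-library urllib.robotparser. The bot name ExampleBot and the paths are hypothetical, chosen only for illustration. Note that Python's parser applies the first matching rule in a group, which may differ from how an individual crawler resolves overlapping Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for a named bot, one fallback group.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

# The named group applies to ExampleBot; /private/ is disallowed for it.
example_private = parser.can_fetch("ExampleBot", "/private/report.html")
# Paths with no matching rule in the group are allowed by default.
example_blog = parser.can_fetch("ExampleBot", "/blog/")
# Bots without their own group fall back to the "*" group.
other_private = parser.can_fetch("OtherBot", "/private/report.html")

print(example_private, example_blog, other_private)
```

Each User-agent line starts a group, and the rules below it apply only to bots matching that group, which is why OtherBot can fetch a path that ExampleBot cannot.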
When to use
Edit robots.txt when you want to allow or block specific bots. For AI search visibility, allow OAI-SearchBot, PerplexityBot, and ClaudeBot; for training opt-out, block GPTBot and Google-Extended.
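As a rough sketch of the policy described above, the following generates a robots.txt that allows the search crawlers and blocks the training tokens, then checks the result with Python's standard-library urllib.robotparser. The helper build_robots_txt and the test path /articles/robots-txt are hypothetical names introduced for this example. The bot lists are taken from the text above, and the check reflects Python's parser only, not how each crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Bot names as listed in this entry.
SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot", "ClaudeBot"]
TRAINING_BOTS = ["GPTBot", "Google-Extended"]


def build_robots_txt(allow, block):
    """Return robots.txt text with one group per bot (hypothetical helper)."""
    groups = []
    for bot in block:
        # "Disallow: /" blocks every path for this bot.
        groups.append(f"User-agent: {bot}\nDisallow: /\n")
    for bot in allow:
        # "Allow: /" states explicitly that every path is permitted.
        groups.append(f"User-agent: {bot}\nAllow: /\n")
    # Groups are separated by a blank line.
    return "\n".join(groups)


robots_txt = build_robots_txt(allow=SEARCH_BOTS, block=TRAINING_BOTS)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Check one sample path for every bot named in the policy.
decisions = {
    bot: parser.can_fetch(bot, "/articles/robots-txt")
    for bot in SEARCH_BOTS + TRAINING_BOTS
}

print(robots_txt)
for bot, allowed in decisions.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Generating the file from two lists keeps the allow and block decisions in one place, and the parser check catches typos in directive names before the file is deployed.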
See also
- GPTBot — OpenAI's web crawler that collects data for future model training; blocking it does not affect ChatGPT live retrieval.
- OAI-SearchBot — OpenAI's retrieval crawler that fetches pages for ChatGPT search; if you block it, your pages will not appear in ChatGPT search results.
- ClaudeBot — Anthropic's crawler for both Claude training data and Claude's live web-fetch tool.
- AI crawler — Web crawler operated by an AI company, for example GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, or Google-Extended.