AI crawler
Web crawler operated by an AI company — GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended.
Definition
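An AI crawler is a web crawler operated by an AI company to collect content for model training, live retrieval, or both. The major ones in 2026: GPTBot and OAI-SearchBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and Bytespider (ByteDance, for TikTok and Doubao). Google-Extended (Gemini training) is usually listed with them, although it is a robots.txt control token rather than a separate crawler: Google fetches pages with its existing user agents and checks that token to decide whether the content may be used for Gemini. These crawlers identify themselves via user-agent strings and (mostly) honor robots.txt.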
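As a rough illustration of user-agent identification, the sketch below checks a request's User-Agent header for the crawler tokens named above. The function name `identify_ai_crawler`, the token table, and the sample header strings are assumptions made for this example, not official values; real headers vary by vendor and version, and a user-agent string can be spoofed, so treat a match as a hint rather than proof.

```python
from typing import Optional

# Tokens taken from the list above. Google-Extended is omitted because it is
# a robots.txt token and does not appear in request headers.
AI_CRAWLER_TOKENS = {
    "GPTBot": "OpenAI",
    "OAI-SearchBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Bytespider": "ByteDance",
}


def identify_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the first known AI crawler token found in a User-Agent header."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


# Illustrative header strings only (hypothetical, not copied from any vendor).
bot_ua = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot)"
browser_ua = "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"

bot_match = identify_ai_crawler(bot_ua)
browser_match = identify_ai_crawler(browser_ua)
```

Substring matching is a deliberate simplification here; a stricter check would also verify the requesting IP against the ranges an operator publishes, where available.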
When to use
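Audit your robots.txt against the current list of AI crawlers when AI search becomes a meaningful traffic source. Many sites accidentally over-block (or over-allow) because they last touched robots.txt before these bots existed: a blanket `User-agent: *` rule written for older crawlers applies to every AI crawler that has no group of its own.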
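One way to run such an audit is with Python's standard-library `urllib.robotparser`. The sketch below is a minimal example under stated assumptions: the crawler list is the one from the definition above, `audit_robots` is a hypothetical helper name, and `SAMPLE` is an invented robots.txt used only for illustration. The standard-library parser applies its own matching rules, which may not agree with every crawler's interpretation in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens from the definition above; update as the list changes.
AI_CRAWLERS = [
    "GPTBot",
    "OAI-SearchBot",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
    "Bytespider",
]


def audit_robots(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each AI crawler token to whether robots_txt lets it fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_CRAWLERS}


# Invented robots.txt: blocks GPTBot everywhere, allows everyone else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

report = audit_robots(SAMPLE)
for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To audit a live site, the same function can be fed the text of its /robots.txt, and called with a few representative paths rather than only "/".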
See also
- robots.txt — Plain text file at /robots.txt that tells crawlers which paths they may or may not fetch.
- GPTBot — OpenAI's web crawler for collecting model training data; blocking it does not affect ChatGPT live retrieval.
- ClaudeBot — Anthropic's crawler for Claude training data; Anthropic documents separate agents, Claude-User and Claude-SearchBot, for user-initiated fetches and search.