GPTBot Access Check
Is your site visible to ChatGPT, Perplexity, and Claude? The answer lives in your
robots.txt.
Most sites either block all AI crawlers by accident or allow training crawlers while
blocking the retrieval bots that actually feed live answers - so nothing shows up in
AI search results. This page shows the correct user-agent strings and snippets for
the six crawlers that matter.
Use this inside Claude or Cursor: the @automatelab/ai-seo-mcp server includes a check_robots tool that fetches your robots.txt and reports which AI crawlers are blocked, which are allowed, and which are missing entirely - no copy-paste required.
What GPTBot and OAI-SearchBot actually do
OpenAI runs two separate crawlers and most site owners only know about one of them. GPTBot is the training crawler - it fetches pages that may feed future model training runs. OAI-SearchBot is the retrieval crawler - it powers the live web search that ChatGPT uses when answering questions in real time. If you block GPTBot but not OAI-SearchBot, your content will not end up in training data but may still appear in live ChatGPT answers. If you allow GPTBot but block OAI-SearchBot, the reverse is true. Most sites that want AI visibility need both allowed.
The same pattern holds across providers. Anthropic runs ClaudeBot for training and
a separate agent for Claude.ai’s web-fetch tool. Perplexity runs PerplexityBot
for its index. Google runs Google-Extended specifically for Gemini and Vertex AI
training - it is separate from Googlebot and does not affect your regular search ranking.
Each crawler reads only its own User-agent
block, so a rule for one has no effect on the others.
The most common trap: a site adds

User-agent: *
Disallow: /

(intended to block one unwanted bot) and accidentally blocks every crawler that lacks its own User-agent block - including all six AI crawlers listed below. Check your robots.txt for a catch-all Disallow before assuming any individual bot is allowed.
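You can reproduce the trap - and the fix - with Python's standard-library robots.txt parser. A minimal sketch; the example.com URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# A catch-all Disallow plus one named group for GPTBot.
robots_txt = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# GPTBot has its own User-agent group, so the catch-all does not apply to it.
print(rp.can_fetch("GPTBot", "https://example.com/page"))        # True
# A crawler without a named group falls through to the catch-all Disallow.
print(rp.can_fetch("PerplexityBot", "https://example.com/page")) # False
```

Running this against your own robots.txt (via `RobotFileParser("https://yoursite.com/robots.txt")` and `.read()`) is a quick way to verify what each crawler actually sees.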
The six crawlers to allow
Each card shows the user-agent string and the minimal robots.txt block to allow that crawler on your entire site.
GPTBot
Fetches content for OpenAI model training. Blocking this keeps your content out of future ChatGPT training datasets. Allow it if you want your site’s knowledge in the model.
User-agent: GPTBot
Allow: /
OAI-SearchBot
Powers real-time web search in ChatGPT. This is the bot that determines whether your pages appear when someone asks ChatGPT a question and it searches the web for an answer.
User-agent: OAI-SearchBot
Allow: /
ChatGPT-User
Sent when a ChatGPT user pastes a URL and asks ChatGPT to read or summarize it. Distinct from the background crawlers - this one fires on demand, triggered by a user action inside ChatGPT.
User-agent: ChatGPT-User
Allow: /
ClaudeBot
Anthropic’s web crawler, used for both training data collection and powering Claude’s web-fetch tool. Allowing it makes your content available when Claude searches the web during a conversation.
User-agent: ClaudeBot
Allow: /
PerplexityBot
Perplexity AI’s crawler, indexing content for perplexity.ai answers and citations. Perplexity is citation-heavy - if your content is blocked here, it will not appear as a source in Perplexity responses.
User-agent: PerplexityBot
Allow: /
Google-Extended
Google’s dedicated AI training crawler for Gemini and Vertex AI. Separate from Googlebot - blocking or allowing Google-Extended has no effect on your standard Google Search ranking.
User-agent: Google-Extended
Allow: /
Allow all six at once
If you want to allow every AI crawler in a single block, add the following to your
robots.txt. This
is the minimum viable configuration for full AI visibility.
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
Place each block in robots.txt at your domain root. Order does not matter: compliant parsers apply the group with the most specific matching User-agent, so a named block like User-agent: GPTBot overrides a catch-all User-agent: * block wherever it appears in the file.
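A quick way to sanity-check the combined configuration is to ask a parser for each crawler's access. The helper below is a rough sketch of the kind of check a tool like check_robots performs - the function name and example.com URL are illustrative, not the actual MCP tool:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
               "ClaudeBot", "PerplexityBot", "Google-Extended"]

def check_ai_crawlers(robots_lines, url="https://example.com/"):
    """Return {crawler: allowed?} for each AI crawler, given robots.txt lines."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return {bot: rp.can_fetch(bot, url) for bot in AI_CRAWLERS}

# The six allow blocks, plus a broad catch-all Disallow for everyone else.
robots_txt = "\n\n".join(
    f"User-agent: {bot}\nAllow: /" for bot in AI_CRAWLERS
) + "\n\nUser-agent: *\nDisallow: /\n"

report = check_ai_crawlers(robots_txt.splitlines())
print(report)  # each named AI crawler is allowed despite the catch-all
```

Because every AI crawler here has its own named group, all six come back allowed even though the file ends with a catch-all Disallow.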
FAQ
Should I allow GPTBot on my site?
Yes, in most cases. Allowing GPTBot lets OpenAI use your content when training future models and when generating responses through ChatGPT’s browsing and search features. Blocking it removes your site from that citation surface. The exception is content you want to keep proprietary - paywalled research, private documentation, or competitive data you don’t want scraped.
What is the difference between GPTBot and OAI-SearchBot?
GPTBot is OpenAI’s training crawler - it fetches content that may feed future model training. OAI-SearchBot is the retrieval crawler - it powers the real-time web search ChatGPT uses when answering questions. You need both allowed if you want your site to appear in ChatGPT’s live answers. Blocking one while allowing the other is the most common AI-visibility misconfiguration.
Will allowing AI crawlers slow my site down?
Generally no. AI crawlers are designed to be polite, and GPTBot's crawl rate is far lower than Googlebot's. If you do see unusual traffic from an AI crawler, add a Crawl-delay: 10 line to that crawler's block in robots.txt - most AI crawlers honor the directive, though support is not universal.
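For example, to throttle PerplexityBot without revoking its access (treat Crawl-delay as a hint rather than a guarantee, since directive support varies by crawler):

```
User-agent: PerplexityBot
Crawl-delay: 10
Allow: /
```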
Do I need a separate robots.txt rule for each AI crawler?
Yes. Each crawler reads only its own User-agent block. A rule for GPTBot does not
apply to ClaudeBot or PerplexityBot. You need a separate User-agent entry for
each crawler you want to configure - or a catch-all User-agent: *
block that applies to all crawlers not explicitly named.
What is ChatGPT-User and how is it different from GPTBot?
ChatGPT-User fires when a user inside ChatGPT triggers a live web fetch - for example, pasting a URL and asking ChatGPT to summarize it. GPTBot is the background training crawler that runs continuously. Both are from OpenAI but they serve different purposes. You likely want both allowed.
Audit your full AI-search footprint
robots.txt is one signal. AI visibility also depends on your llms.txt file, FAQ schema markup, and how your content is structured for retrieval. The @automatelab/ai-seo-mcp runs a full audit inside Claude or Cursor - robots.txt, schema, llms.txt, and citation potential in one pass.