GPTBot Access Check

Is your site visible to ChatGPT, Perplexity, and Claude? The answer lives in your robots.txt. Most sites either block all AI crawlers by accident or allow training crawlers while blocking the retrieval bots that actually feed live answers - so nothing shows up in AI search results. This page shows the correct user-agent strings and snippets for the six crawlers that matter.

Use this inside Claude or Cursor: the @automatelab/ai-seo-mcp includes a check_robots tool that fetches your robots.txt and reports which AI crawlers are blocked, which are allowed, and which are missing entirely - no copy-paste required.

What GPTBot and OAI-SearchBot actually do

OpenAI runs two separate crawlers and most site owners only know about one of them. GPTBot is the training crawler - it fetches pages that may feed future model training runs. OAI-SearchBot is the retrieval crawler - it powers the live web search that ChatGPT uses when answering questions in real time. If you block GPTBot but not OAI-SearchBot, your content will not end up in training data but may still appear in live ChatGPT answers. If you allow GPTBot but block OAI-SearchBot, the reverse is true. Most sites that want AI visibility need both allowed.

The same pattern holds across providers. Anthropic runs ClaudeBot for training and a separate agent for Claude.ai’s web-fetch tool. Perplexity runs PerplexityBot for its index. Google runs Google-Extended specifically for Gemini and Vertex AI training - it is separate from Googlebot and does not affect your regular search ranking. Each crawler reads only its own User-agent block, so a rule for one has no effect on the others.

The most common trap: a site adds a catch-all block - User-agent: * followed by Disallow: / - intended to shut out one unwanted bot, and accidentally blocks every crawler that lacks a named User-agent block of its own, including all six AI crawlers listed below. Check your robots.txt for a catch-all Disallow before assuming any individual bot is allowed.
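This failure mode is easy to reproduce with Python's standard-library robots.txt parser. The robots.txt content below is a hypothetical example, not taken from any real site:

```python
from urllib import robotparser

# A hypothetical robots.txt containing only a catch-all block.
robots_txt = """\
User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# GPTBot has no named block of its own, so the * rule applies to it too.
print(rp.can_fetch("GPTBot", "https://example.com/page"))  # False
```

Every other AI crawler gets the same answer here, since none of them is named in the file.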

The six crawlers to allow

Each card shows the user-agent string and the minimal robots.txt block to allow that crawler on your entire site.

OpenAI

GPTBot

Training crawler

Fetches content for OpenAI model training. Blocking this keeps your content out of future ChatGPT training datasets. Allow it if you want your site’s knowledge in the model.

robots.txt snippet
User-agent: GPTBot
Allow: /
OpenAI

OAI-SearchBot

Retrieval crawler (live answers)

Powers real-time web search in ChatGPT. This is the bot that determines whether your pages appear when someone asks ChatGPT a question and it searches the web for an answer.

robots.txt snippet
User-agent: OAI-SearchBot
Allow: /
OpenAI

ChatGPT-User

Live URL fetch

Sent when a ChatGPT user pastes a URL and asks ChatGPT to read or summarize it. Distinct from the background crawlers - this one fires on demand, triggered by a user action inside ChatGPT.

robots.txt snippet
User-agent: ChatGPT-User
Allow: /
Anthropic

ClaudeBot

Training crawler

Anthropic’s training crawler. Claude’s live web-fetch tool uses a separate Anthropic user agent, so allow ClaudeBot if you want your site’s content available for training Anthropic’s models - it is the baseline for Anthropic visibility.

robots.txt snippet
User-agent: ClaudeBot
Allow: /
Perplexity

PerplexityBot

Index crawler

Perplexity AI’s crawler, indexing content for perplexity.ai answers and citations. Perplexity is citation-heavy - if your content is blocked here, it will not appear as a source in Perplexity responses.

robots.txt snippet
User-agent: PerplexityBot
Allow: /
Google

Google-Extended

Gemini and Vertex AI training

Google’s dedicated AI training crawler for Gemini and Vertex AI. Separate from Googlebot - blocking or allowing Google-Extended has no effect on your standard Google Search ranking.

robots.txt snippet
User-agent: Google-Extended
Allow: /

Allow all six at once

If you want to allow every AI crawler in a single block, add the following to your robots.txt. This is the minimum viable configuration for full AI visibility.

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Place each block in robots.txt at your domain root. If you already have a User-agent: * block with a broad Disallow, these named blocks still take effect regardless of where they appear in the file: a crawler obeys only the group that most specifically matches its user agent, not the first group it encounters.
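As a rough sketch of the kind of check the check_robots tool performs, the standard library's urllib.robotparser can evaluate a robots.txt against each of the six user agents. The function name and structure here are illustrative, not the tool's actual implementation:

```python
from urllib import robotparser

AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "PerplexityBot", "Google-Extended",
]

def check_ai_crawlers(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {crawler: allowed?} for the six AI crawlers above."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {agent: rp.can_fetch(agent, url) for agent in AI_CRAWLERS}

# Evaluate the allow-all configuration shown above.
allow_all = "\n\n".join(f"User-agent: {a}\nAllow: /" for a in AI_CRAWLERS)
print(check_ai_crawlers(allow_all))  # every crawler maps to True
```

In practice you would fetch the live file first (for example with urllib.request) and pass its text in; parsing a string keeps the sketch self-contained.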

FAQ

Should I allow GPTBot on my site?

Yes, in most cases. Allowing GPTBot lets OpenAI use your content when training future models and when generating responses through ChatGPT’s browsing and search features. Blocking it removes your site from that citation surface. The exception is content you want to keep proprietary - paywalled research, private documentation, or competitive data you don’t want scraped.

What is the difference between GPTBot and OAI-SearchBot?

GPTBot is OpenAI’s training crawler - it fetches content that may feed future model training. OAI-SearchBot is the retrieval crawler - it powers the real-time web search ChatGPT uses when answering questions. You need both allowed if you want your site to appear in ChatGPT’s live answers. Blocking one while allowing the other is the most common AI-visibility misconfiguration.

Will allowing AI crawlers slow my site down?

Generally no. The major AI crawlers throttle their own request rates, and GPTBot’s crawl volume is typically far lower than Googlebot’s. If you do see unusual traffic from an AI crawler, add a Crawl-delay: 10 line to that crawler’s block in robots.txt - though Crawl-delay is an informal extension that not every crawler honors, so server-side rate limiting is the reliable fallback.
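For example, to throttle GPTBot without blocking it entirely, put the delay line inside that crawler’s own block (10 seconds between requests is an arbitrary value, not a recommendation from OpenAI):

robots.txt snippet
User-agent: GPTBot
Crawl-delay: 10
Allow: /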

Do I need a separate robots.txt rule for each AI crawler?

Yes. Each crawler reads only its own User-agent block. A rule for GPTBot does not apply to ClaudeBot or PerplexityBot. You need a separate User-agent entry for each crawler you want to configure - or a catch-all User-agent: * block that applies to all crawlers not explicitly named.
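This per-crawler scoping can be verified with Python's urllib.robotparser. In the hypothetical robots.txt below, the named GPTBot block applies to GPTBot only, while ClaudeBot - having no block of its own - falls through to the catch-all:

```python
from urllib import robotparser

robots_txt = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/"))     # True: named block wins
print(rp.can_fetch("ClaudeBot", "https://example.com/"))  # False: falls to *
```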

What is ChatGPT-User and how is it different from GPTBot?

ChatGPT-User fires when a user inside ChatGPT triggers a live web fetch - for example, pasting a URL and asking ChatGPT to summarize it. GPTBot is the background training crawler that runs continuously. Both are from OpenAI but they serve different purposes. You likely want both allowed.

Audit your full AI-search footprint

robots.txt is one signal. AI visibility also depends on your llms.txt file, FAQ schema markup, and how your content is structured for retrieval. The @automatelab/ai-seo-mcp runs a full audit inside Claude or Cursor - robots.txt, schema, llms.txt, and citation potential in one pass.