Technical Guide — April 2026

AI Crawler robots.txt Guide
Which Bots to Allow

Your robots.txt file is no longer just about Googlebot. AI crawlers from OpenAI, Anthropic, Perplexity, and others are visiting your site daily. This guide covers every major AI bot and how to make the right allow/block decisions.

The New Crawler Landscape

A few years ago, most websites had two or three search engine crawlers to think about: Googlebot, Bingbot, and maybe Yandex. By 2026, over a dozen AI-specific crawlers visit websites regularly, each with different purposes and behaviors.

Some crawlers collect content for model training. Others fetch content in real time to answer user queries. Some do both. The distinction matters because blocking a training crawler is very different from blocking a query-time crawler. Block the wrong one and you can drop out of AI-powered search results.

Your robots.txt is the first file every well-behaved crawler reads. If you have not updated it since the AI era began, you are either blocking traffic you want or allowing access you did not intend.

Major AI Crawlers Reference

Bot Name	Operator	Purpose	Recommendation
GPTBot	OpenAI	Model training data	Allow (lets your content inform future OpenAI models)
ChatGPT-User	OpenAI	User-initiated fetches from ChatGPT	Allow (live user queries)
ClaudeBot	Anthropic	Training data	Allow (lets your content inform future Claude models)
anthropic-ai	Anthropic	Older Anthropic user-agent token	Allow (harmless to keep alongside ClaudeBot)
PerplexityBot	Perplexity	Indexing for Perplexity search answers	Allow (cited source traffic)
Google-Extended	Google	Control token for Gemini model training and grounding (not a separate crawler)	Optional (does not affect Google Search or AI Overviews)
Amazonbot	Amazon	Alexa/product answers	Optional (depends on audience)
Meta-ExternalAgent	Meta	AI training	Optional (Meta AI features)
CCBot	Common Crawl	Open training datasets	Optional (feeds many AI models)
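To see which of these bots already visit your site, you can match the User-Agent header in your access logs against the tokens in the table. The following is a minimal sketch. It assumes each bot's User-Agent string contains its token as a substring, which is worth checking against each operator's documentation. The `identify_ai_crawler` helper and `AI_CRAWLER_TOKENS` table are hypothetical. Google-Extended is left out because it is a robots.txt control token rather than a user agent that shows up in logs.

```python
from typing import Optional, Tuple

# Hypothetical lookup table built from the reference table above.
# Google-Extended is omitted: it is a robots.txt token, not a user agent
# that appears in access logs.
AI_CRAWLER_TOKENS = {
    "GPTBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "ClaudeBot": "Anthropic",
    "anthropic-ai": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Amazonbot": "Amazon",
    "Meta-ExternalAgent": "Meta",
    "CCBot": "Common Crawl",
}


def identify_ai_crawler(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (token, operator) for the first known token found in a
    User-Agent header, or None if no token matches.

    The match is a case-insensitive substring test. User-Agent headers can
    be spoofed, so a match is a claim to verify, not proof of identity.
    """
    ua = user_agent.lower()
    for token, operator in AI_CRAWLER_TOKENS.items():
        if token.lower() in ua:
            return token, operator
    return None


if __name__ == "__main__":
    print(identify_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.1)"))  # ('GPTBot', 'OpenAI')
    print(identify_ai_crawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # None
```

The example headers are simplified and are not the exact strings these bots send. Because anyone can send a matching header, treat log matches as an estimate unless you also verify the source, for example against IP ranges that an operator publishes.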

Recommended robots.txt for AI-Friendly Sites

Here is a robots.txt configuration that maximizes AI agent discoverability while protecting sensitive paths:

# Search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /admin/
Disallow: /api/internal/
Disallow: /dashboard/
Allow: /

# AI crawlers: allow for maximum agent visibility.
# A bot that matches a named group ignores the User-agent: * group,
# so the sensitive paths are repeated here.
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: PerplexityBot
User-agent: Google-Extended
Disallow: /admin/
Disallow: /api/internal/
Disallow: /dashboard/
Allow: /

# Protect sensitive paths from all other crawlers
User-agent: *
Disallow: /admin/
Disallow: /api/internal/
Disallow: /dashboard/

# Point crawlers to the sitemap
Sitemap: https://example.com/sitemap.xml

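Key principle: Be explicit. Do not rely on the wildcard User-agent: * rule for AI bots. Declare each bot you want to allow by name. This gives you granular control and makes your intent clear. Keep in mind that a crawler follows only the most specific group that matches it. Once a bot has its own group, it ignores the User-agent: * group entirely, so you must repeat any Disallow rules you want it to respect inside that group.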
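A named group replaces the wildcard group rather than adding to it, so it is worth testing a file before you deploy it. The following is a minimal sketch that uses Python's standard `urllib.robotparser` module on two small illustrative robots.txt variants, not on the full configuration above. The `can_fetch` wrapper is a hypothetical helper. One caveat: `urllib.robotparser` applies the first matching rule within a group instead of RFC 9309's longest match, so the sketch lists Disallow before Allow, which gives the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Variant A: GPTBot has its own "Allow: /" group, and the private path
# appears only in the wildcard group.
SEPARATE_GROUPS = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
"""

# Variant B: the private path is repeated inside the named group.
REPEATED_RULES = """\
User-agent: GPTBot
Disallow: /admin/
Allow: /

User-agent: *
Disallow: /admin/
"""


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots_txt from a string and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    admin = "https://example.com/admin/"
    print(can_fetch(SEPARATE_GROUPS, "GPTBot", admin))        # True: wildcard rule ignored
    print(can_fetch(REPEATED_RULES, "GPTBot", admin))         # False: rule repeated in group
    print(can_fetch(SEPARATE_GROUPS, "SomeOtherBot", admin))  # False: falls back to *
```

The first result is the trap: GPTBot matches its own group, so the wildcard Disallow never applies to it.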

The Allow vs Block Decision Framework

The decision to allow or block an AI crawler depends on your business model and content strategy. Here is a framework:

Allow if: You want your content cited in AI-powered answers. You want agents to be able to complete tasks on your site (sign up, purchase, compare). You want to appear in AI search results from Perplexity, ChatGPT, and Google AI Overviews.

Consider blocking if: Your revenue depends entirely on page views (blocking training crawlers but allowing query-time crawlers can be a middle ground). You have premium content behind a paywall that you do not want summarized for free.
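You can generate the middle ground of blocking training crawlers while allowing query-time crawlers instead of editing it by hand. The following is a minimal sketch with a hypothetical `build_robots_txt` helper. It puts all allowed bots in one group and all blocked bots in another; RFC 9309 permits several User-agent lines per group. The split of bots in the example call is an illustrative policy based on the reference table, not a recommendation, so check it against each operator's current documentation.

```python
from typing import Iterable


def build_robots_txt(
    allow: Iterable[str],
    block: Iterable[str],
    private_paths: Iterable[str],
    sitemap: str,
) -> str:
    """Hypothetical generator: one group for allowed bots, one for blocked
    bots, and a wildcard group for every other crawler."""
    private = [f"Disallow: {path}" for path in private_paths]
    groups = []

    allowed = [f"User-agent: {bot}" for bot in allow]
    if allowed:
        # Named groups override User-agent: *, so repeat the private paths.
        # Disallow lines come before Allow: / so first-match parsers agree
        # with longest-match (RFC 9309) parsers.
        groups.append("\n".join(allowed + private + ["Allow: /"]))

    blocked = [f"User-agent: {bot}" for bot in block]
    if blocked:
        groups.append("\n".join(blocked + ["Disallow: /"]))

    groups.append("\n".join(["User-agent: *"] + private))
    return "\n\n".join(groups) + f"\n\nSitemap: {sitemap}\n"


if __name__ == "__main__":
    # Example policy: query-time and search crawlers in, training crawlers out.
    print(build_robots_txt(
        allow=["ChatGPT-User", "PerplexityBot"],
        block=["GPTBot", "ClaudeBot", "CCBot"],
        private_paths=["/admin/", "/api/internal/", "/dashboard/"],
        sitemap="https://example.com/sitemap.xml",
    ))
```

Keeping the policy in code makes it easy to move a bot from one list to the other as operators change what their crawlers do.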

The reality: For most businesses, blocking AI crawlers is like blocking Googlebot in 2005. You are opting out of the primary way people will discover products and services going forward. The sites that allow AI crawlers now are building the same first-mover advantage that early SEO adopters captured.

Common robots.txt Mistakes

Blanket Disallow: /

A wildcard User-agent: * group with Disallow: / blocks every compliant crawler that does not have its own named group, including AI bots. This is the nuclear option and almost never what you want.

Blocking GPTBot but allowing ChatGPT-User

GPTBot collects training data for OpenAI's models. Blocking it means future models may have outdated or no information about you, even though ChatGPT-User can still fetch your pages when a user asks.

No AI-specific rules at all

Relying on the wildcard rule means you cannot differentiate between AI bots. Add explicit rules for each major AI crawler.

Forgetting query-time crawlers

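ChatGPT-User fetches pages when a user's request in ChatGPT calls for them, and PerplexityBot maintains the index behind Perplexity's cited answers. Blocking them removes your pages from those answers and from the referral traffic they send.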

robots.txt and Your AX Score

robots.txt is the first check in the Discoverability dimension (20% of the AX score). If your robots.txt blocks major AI crawlers, your Discoverability score drops significantly, often by 8-10 points. The audit checks each major AI bot individually and reports which are allowed and which are blocked.
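A per-bot check like this can be approximated with Python's standard library. The following is a minimal sketch, not the AX Audit's actual implementation. The `MAJOR_AI_BOTS` list is an assumption based on the reference table, and `audit_ai_access` is a hypothetical helper. Because `urllib.robotparser` applies the first matching rule within a group rather than RFC 9309's longest match, its answers can differ from Google's for files that order Allow before a more specific Disallow. Google-Extended is included because it is a robots.txt token, even though Google fetches pages with its regular crawlers.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser

# Assumed list of tokens to check, taken from the reference table; the
# AX Audit's own list and logic may differ.
MAJOR_AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai", "PerplexityBot",
    "Google-Extended", "Amazonbot", "Meta-ExternalAgent", "CCBot",
]


def audit_ai_access(robots_txt: str, site: str = "https://example.com") -> Dict[str, bool]:
    """Report, per bot token, whether robots_txt lets it fetch the home page."""
    parser = RobotFileParser()
    # To check a live site instead, call parser.set_url(f"{site}/robots.txt")
    # and parser.read() in place of parse().
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, f"{site}/") for bot in MAJOR_AI_BOTS}


if __name__ == "__main__":
    # A blanket wildcard block with a single named exception.
    sample = "User-agent: GPTBot\nAllow: /\n\nUser-agent: *\nDisallow: /\n"
    for bot, allowed in audit_ai_access(sample).items():
        print(f"{bot:<20} {'allowed' if allowed else 'blocked'}")
```

With the sample file, only GPTBot is reported as allowed. Every other bot falls back to the blanket Disallow: / in the wildcard group.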

Check Your robots.txt With AX Audit

The AX Audit fetches your robots.txt, checks permissions for every major AI crawler, and tells you exactly which bots are allowed and which are blocked.


Free audit. No signup required.
