{
  "@context": "https://agentflare.org/schema",
  "type": "Article",
  "tier": "L2-full",
  "title": "Pay-Per-Crawl and the New Bot Web",
  "description": "AI web crawlers now fall into three operational buckets: training crawlers that feed model development, search/indexing crawlers that power AI answers, and user-triggered…",
  "canonical": "https://agentflare.org/research/pay-per-crawl-and-the-new-bot-web.html",
  "category": "research",
  "updated": "2026-06-15",
  "generated_at": "2026-06-15T01:19:16.018Z",
  "facts": [
    {
      "label": "Topic",
      "value": "discovery"
    },
    {
      "label": "Sources",
      "value": "10"
    },
    {
      "label": "Updated",
      "value": "2026-06-15"
    }
  ],
  "data": {
    "topic": "AI web crawlers GPTBot ClaudeBot PerplexityBot and pay per crawl",
    "cluster": "discovery",
    "summary": "AI web crawlers now fall into three operational buckets: training crawlers that feed model development, search/indexing crawlers that power AI answers, and user-triggered…"
  },
  "analysis_md": "AI web crawlers now fall into three operational buckets: **training crawlers** that feed model development, **search/indexing crawlers** that power AI answers, and **user-triggered fetchers** that retrieve pages on demand. For developers, the practical shift in 2026 is that the major vendors split these roles into separate user agents, so you can no longer treat “AI bots” as one blockable class.[2][4][7]\n\n## GPTBot, ClaudeBot, and PerplexityBot: what each does\n\n**GPTBot** is OpenAI’s training crawler; disallowing it signals that your content should not be used to train generative models.[6][7] **ClaudeBot** is Anthropic’s analogous crawler for model training and related retrieval workflows.[2][7] **PerplexityBot** is used by Perplexity for indexing and answer generation, and sources describe it as part of the retrieval layer that supports cited answers rather than pure training.[2][4][6]\n\nA key implication: blocking GPTBot does **not** block OpenAI search indexing, and blocking PerplexityBot can remove your pages from Perplexity answers even if you still allow other bots.[6][7]\n\n## How to control access in practice\n\nThe default pattern in 2026 is selective robots.txt policy: **block training crawlers** if you do not want model reuse, but **allow search/retrieval crawlers** if you want visibility in AI answers.[2][7] Several guides also distinguish *user-facing* fetchers such as ChatGPT-User and Perplexity-User, which are triggered by an end user’s prompt and usually have direct traffic value.[3][4]\n\nThat distinction matters operationally: a site can be invisible to AI search even while still being crawled for training, or vice versa.[6][7]\n\n## Where AI agents fit\n\nAI agents increasingly act like browsers with policy, identity, and billing attached. In practice, the agent may fetch a page itself, route through a vendor crawler, or rely on an indexed copy; developers should therefore think in terms of **who is fetching**, **for what purpose**, and **whether the fetch is human-initiated**.[4][5]\n\nThis is why logs now matter: user-agent strings, request cadence, and referer data are the main signals for deciding whether traffic is training, retrieval, or interactive agent access.[2][4]\n\n## HTTP 402 and pay-per-crawl\n\n**HTTP 402 Payment Required** is emerging as a signaling mechanism for **pay-per-crawl** licensing, especially for training use. Cloudflare’s pay-per-crawl model frames 402 as an automated way for publishers to require payment or terms before AI systems can access content.[5][6]\n\nThe important nuance is that 402 is not a universal web standard for AI billing; it is currently an industry policy layer adopted by some infrastructure providers, not a guarantee that every crawler will honor it.[5][6]\n\n## Key takeaways\n\n- **GPTBot = OpenAI’s training crawler**, **ClaudeBot = Anthropic’s training crawler (with related retrieval workflows)**, and **PerplexityBot = Perplexity’s answer-engine crawling layer**.[2][6][7]\n- Treat **training access** and **answer visibility** as separate decisions; blocking one does not automatically block the other.[6][7]\n- **AI agents** may fetch pages directly on behalf of users, so user-triggered fetchers are often worth allowing even when training crawlers are blocked.[3][4]\n- **HTTP 402 / pay-per-crawl** is an emerging licensing mechanism, but it is still an infrastructure convention rather than a universal protocol.[5][6]",
  "sources": [
    {
      "url": "https://www.tencentcloud.com/techpedia/143900"
    },
    {
      "url": "https://www.anagram.ai/blog/ai-crawlers-explained-gptbot-claudebot-perplexitybot-and-how-to-let-them-in-2026"
    },
    {
      "url": "https://evolveamz.com/ai-crawler-list-2026-ecommerce/"
    },
    {
      "url": "https://nohacks.co/blog/ai-user-agents-landscape-2026"
    },
    {
      "url": "https://www.tryaivo.com/blog/ai-crawler-cheat-sheet-2025-which-bots-should-you-allow"
    },
    {
      "url": "https://www.oncrawl.com/ai/what-ai-bots-really-doing-your-site/"
    },
    {
      "url": "https://www.digitalapplied.com/blog/ai-crawler-access-control-2026-robots-llms-txt-decision-matrix"
    },
    {
      "url": "https://www.humansecurity.com/learn/blog/crawlers-list-known-bots-guide/"
    },
    {
      "title": "AI Crawlers Explained: GPTBot, ClaudeBot, and PerplexityBot - Contently",
      "url": "https://contently.com/2026/05/06/ai-crawlers-explained-gptbot-claudebot-perplexitybot"
    },
    {
      "title": "Understanding AI Crawlers: Complete Guide 2025 | Qwairy",
      "url": "https://www.qwairy.co/blog/understanding-ai-crawlers-complete-guide"
    }
  ],
  "related": [
    {
      "name": "Generative Engine Optimization (GEO): A Primer",
      "url": "https://agentflare.org/research/generative-engine-optimization-geo-a-primer.html"
    },
    {
      "name": "llms.txt: The Standard for AI-Readable Sites",
      "url": "https://agentflare.org/research/llmstxt-the-standard-for-ai-readable-sites.html"
    },
    {
      "name": "HTTP 402 & x402: How AI Agents Pay for Content",
      "url": "https://agentflare.org/research/http-402-x402-how-ai-agents-pay-for-content.html"
    },
    {
      "name": "The AI Agent Economy in 2026",
      "url": "https://agentflare.org/research/the-ai-agent-economy-in-2026.html"
    },
    {
      "name": "Model Context Protocol (MCP) Explained",
      "url": "https://agentflare.org/research/model-context-protocol-mcp-explained.html"
    },
    {
      "name": "Stablecoins as Rails for Autonomous Agents",
      "url": "https://agentflare.org/research/stablecoins-as-rails-for-autonomous-agents.html"
    }
  ],
  "pricing": {
    "price_usd": 0.02,
    "method": "402",
    "endpoint": "https://cdn.aisa.one/api/v1/access/verify",
    "autopay_hint": "set crawlerAutoPrice=true with X-AISA-Crawler-Token",
    "onboarding": "https://cdn.aisa.one/cdn/guide.html"
  },
  "powered_by": "AISA — agent-native search, settlement & delivery (https://aisa.one)"
}