AgentFlare

Pay-Per-Crawl and the New Bot Web


AI web crawlers now fall into three operational buckets: training crawlers that feed model development, search/indexing crawlers that power AI answers, and user-triggered fetchers that retrieve pages on demand. For developers, the practical shift in 2026 is that the major vendors split these roles into separate user agents, so you can no longer treat “AI bots” as one blockable class.[2][4][7]

GPTBot, ClaudeBot, and PerplexityBot: what each does

GPTBot is OpenAI’s training crawler; disallowing it signals that your content should not be used to train generative models.[6][7] ClaudeBot is Anthropic’s analogous crawler for model training and related retrieval workflows.[2][7] PerplexityBot is used by Perplexity for indexing and answer generation, and sources describe it as part of the retrieval layer that supports cited answers rather than pure training.[2][4][6]

A key implication: blocking GPTBot does not block OpenAI search indexing, which runs under a separate user agent, and blocking PerplexityBot can remove your pages from Perplexity answers even if you still allow other bots.[6][7]

How to control access in practice

The default pattern in 2026 is a selective robots.txt policy: block training crawlers if you do not want model reuse, but allow search/retrieval crawlers if you want visibility in AI answers.[2][7] Several guides also distinguish user-facing fetchers such as ChatGPT-User and Perplexity-User, which are triggered by an end user’s prompt and usually carry direct traffic value.[3][4]

That distinction matters operationally: a site can be invisible to AI search even while still being crawled for training, or vice versa.[6][7]
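The selective policy described above can be sketched and checked with Python's standard-library robots.txt parser. This is a minimal illustration, not a recommended production file: the rules simply mirror the roles named in this article, and `example.com` and the helper names are placeholders. Confirm the current user-agent tokens with each vendor before deploying anything like it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: block training crawlers, allow retrieval crawlers
# and user-triggered fetchers. Agent names are the ones used in this article.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, agent_token: str,
            url: str = "https://example.com/article") -> bool:
    """Return True if the given product token may fetch the URL.

    Pass the bare product token (e.g. "GPTBot"), not a full
    "Mozilla/5.0 ..." string: the parser only inspects the text
    before the first "/".
    """
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for token in ("GPTBot", "ClaudeBot", "PerplexityBot",
                  "ChatGPT-User", "Perplexity-User"):
        print(token, "allowed" if allowed(parser, token) else "blocked")
```

Keep in mind that robots.txt is advisory: it expresses the policy, and compliance depends on the crawler.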

Where AI agents fit

AI agents increasingly act like browsers with policy, identity, and billing attached. In practice, the agent may fetch a page itself, route through a vendor crawler, or rely on an indexed copy; developers should therefore think in terms of who is fetching, for what purpose, and whether the fetch is human-initiated.[4][5]

This is why logs now matter: user-agent strings, request cadence, and referer data are the main signals for deciding whether traffic is training, retrieval, or interactive agent access.[2][4]
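As a rough sketch of the first of those signals, the snippet below buckets user-agent strings into the three roles discussed in this article. The token-to-role mapping and the sample strings are illustrative assumptions, not vendor-published data, and the sketch ignores request cadence and referer entirely. User-agent strings can also be spoofed, so treat the output as a first-pass label rather than verified identity.

```python
from collections import Counter
from typing import Iterable

# Assumed mapping of user-agent tokens to roles, following this article's
# descriptions; verify against each vendor's current documentation.
AGENT_ROLES = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "PerplexityBot": "retrieval",
    "ChatGPT-User": "user-triggered",
    "Perplexity-User": "user-triggered",
}


def classify_user_agent(user_agent: str) -> str:
    """Return the role for the first known token found in the string."""
    ua = user_agent.lower()
    for token, role in AGENT_ROLES.items():
        if token.lower() in ua:
            return role
    return "other"


def summarize(user_agents: Iterable[str]) -> Counter:
    """Count requests per role across an iterable of user-agent strings."""
    return Counter(classify_user_agent(ua) for ua in user_agents)


if __name__ == "__main__":
    # Made-up strings for illustration, not real log lines.
    sample = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
        "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
        "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0",
    ]
    print(summarize(sample))
```

A real pipeline would likely combine this label with the other signals the article mentions before acting on it.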

HTTP 402 and pay-per-crawl

HTTP 402 Payment Required is emerging as a signaling mechanism for pay-per-crawl licensing, especially for training use. Cloudflare’s pay-per-crawl model frames 402 as an automated way for publishers to require payment or terms before AI systems can access content.[5][6]

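The important nuance is that 402 is not a universal web standard for AI billing. The status code itself is defined in the HTTP specification only as reserved for future use, with no standardized payment semantics, so pay-per-crawl is currently an industry policy layer adopted by some infrastructure providers, not a guarantee that every crawler will honor it.[5][6]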
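To make the idea concrete, here is a minimal sketch of the server-side decision a pay-per-crawl gate might make. Everything specific in it is hypothetical: the header names `X-Crawl-Price` and `X-Crawl-Payment-Token`, the price, and the token set are invented for illustration and are not Cloudflare's or any other provider's actual protocol. Only the use of status 402 itself comes from the text above.

```python
from http import HTTPStatus
from typing import Dict, Mapping, Tuple

# All names and values below are hypothetical placeholders.
TRAINING_TOKENS = ("gptbot", "claudebot")   # agents treated as training crawlers
PRICE_HEADER = "X-Crawl-Price"              # invented response header
PAYMENT_HEADER = "X-Crawl-Payment-Token"    # invented request header
PRICE_USD = "0.01"                          # invented price per request
VALID_TOKENS = {"demo-token-123"}           # stand-in for real payment verification


def decide(user_agent: str,
           headers: Mapping[str, str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, response_headers) for an incoming request.

    Non-training agents pass through. Training agents get 200 only when
    they present a token this sketch accepts; otherwise they get
    402 Payment Required along with the advertised price.
    """
    ua = user_agent.lower()
    if not any(token in ua for token in TRAINING_TOKENS):
        return HTTPStatus.OK, {}

    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    if normalized.get(PAYMENT_HEADER.lower()) in VALID_TOKENS:
        return HTTPStatus.OK, {}

    return HTTPStatus.PAYMENT_REQUIRED, {PRICE_HEADER: PRICE_USD}


if __name__ == "__main__":
    status, extra = decide("Mozilla/5.0 (compatible; GPTBot/1.0)", {})
    print(int(status), extra)
```

As the paragraph above notes, a crawler that does not implement the provider's scheme will simply see a 402 error, so a gate like this restricts access rather than guaranteeing payment.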

Key takeaways

  • GPTBot = OpenAI training, ClaudeBot = Anthropic training (with related retrieval workflows), and PerplexityBot = Perplexity’s answer-engine crawling layer.[2][6][7]
  • Treat training access and answer visibility as separate decisions; blocking one does not automatically block the other.[6][7]
  • AI agents may fetch pages directly on behalf of users, so user-triggered bots are often worth allowing even when training bots are blocked.[3][4]
  • HTTP 402 / pay-per-crawl is an emerging licensing mechanism, but it is still an infrastructure convention rather than a universal protocol.[5][6]

Synthesized by the AISA LLM layer with live web sources (AISA Perplexity + Tavily APIs). 2026-06-15.

Sources & citations

  1. https://www.tencentcloud.com/techpedia/143900
  2. https://www.anagram.ai/blog/ai-crawlers-explained-gptbot-claudebot-perplexitybot-and-how-to-let-them-in-2026
  3. https://evolveamz.com/ai-crawler-list-2026-ecommerce/
  4. https://nohacks.co/blog/ai-user-agents-landscape-2026
  5. https://www.tryaivo.com/blog/ai-crawler-cheat-sheet-2025-which-bots-should-you-allow
  6. https://www.oncrawl.com/ai/what-ai-bots-really-doing-your-site/
  7. https://www.digitalapplied.com/blog/ai-crawler-access-control-2026-robots-llms-txt-decision-matrix
  8. https://www.humansecurity.com/learn/blog/crawlers-list-known-bots-guide/
  9. AI Crawlers Explained: GPTBot, ClaudeBot, and PerplexityBot - Contently
  10. Understanding AI Crawlers: Complete Guide 2025 | Qwairy