{
  "@context": "https://agentflare.org/schema",
  "type": "Article",
  "tier": "L2-full",
  "title": "robots.txt, llms.txt & Sitemaps for AI Crawlers",
  "description": "robots.txt, llms.txt, and sitemap.xml best practices for AI crawlers and pay‑per‑crawl. Use robots.txt to control access, sitemap.xml to advertise indexable URLs, and…",
  "canonical": "https://agentflare.org/guides/robotstxt-llmstxt-sitemaps-for-ai-crawlers.html",
  "category": "guides",
  "updated": "2026-06-23",
  "generated_at": "2026-06-23T15:42:04.744Z",
  "facts": [
    {
      "label": "Access",
      "value": "Free / open"
    },
    {
      "label": "Updated",
      "value": "2026-06-23"
    }
  ],
  "data": {
    "topic": "robots.txt, llms.txt and sitemap.xml best practices for AI crawlers and pay-per-crawl",
    "access": "free",
    "summary": "robots.txt, llms.txt, and sitemap.xml best practices for AI crawlers and pay‑per‑crawl. Use robots.txt to control access, sitemap.xml to advertise indexable URLs, and…"
  },
  "analysis_md": "# robots.txt, llms.txt, and sitemap.xml best practices for AI crawlers and pay‑per‑crawl\n\nUse `robots.txt` to declare which crawlers may access which paths, `sitemap.xml` to advertise indexable URLs, and `llms.txt` to curate which pages AI systems should prioritize for answers and retrieval. Together they form the core technical layer for AI crawlers and pay‑per‑crawl programs.\n\n## 1. Configure robots.txt for AI and pay‑per‑crawl\n\nPlace `robots.txt` at the root of each host (`/robots.txt`; every subdomain needs its own file) and explicitly allow or disallow each AI user‑agent. For pay‑per‑crawl crawlers, allow the exact user‑agent token the vendor documents (e.g., `AISA-Bot`) and point to your sitemap:\n\n```txt\n# Pay-per-crawl crawler: allow everything except private paths\nUser-agent: AISA-Bot\nAllow: /\nDisallow: /admin/\n\nSitemap: https://yoursite.com/sitemap.xml\n```\n\nIf you participate in pay‑per‑crawl, do **not** block the vendor’s crawler; instead, use `Disallow` only for admin paths, internal search, cart/checkout, and staging. For non‑paying or aggressive scrapers, block them explicitly with their own `User-agent` group and `Disallow: /`. Keep in mind that `robots.txt` is advisory: well‑behaved crawlers honor it, but scrapers that ignore it must be blocked at the server or CDN. If a crawler supports it, you may also return HTTP 402 (Payment Required) for certain paths, but treat `robots.txt` directives as the primary signal.\n\n## 2. Maintain a clean, AI‑ready sitemap.xml\n\nInclude only canonical, indexable URLs with accurate `lastmod` timestamps. Exclude `noindex`, 404, redirected, and thin pages. A single sitemap file is limited to 50,000 URLs and 50 MB uncompressed, so large sites should split URLs into content‑type sitemaps (articles, products, docs) and reference them from a sitemap index. Make the sitemap discoverable by adding a `Sitemap:` line with its absolute URL to `robots.txt`. This helps AI crawlers and pay‑per‑crawl systems discover and prioritize fresh, high‑value content efficiently.\n\n## 3. Create and curate llms.txt\n\nPlace `llms.txt` at your root (`/llms.txt`) as a plain‑text Markdown “table of contents” for AI systems. The llms.txt proposal typically structures it as an H1 with the site name, a short blockquote summary, and H2 sections of annotated links; it is a convention rather than a ratified standard, so support varies across AI systems. List only your strongest evergreen pages (services, docs, pricing, policies, support, case studies) and add a short description explaining why each page matters. Keep the file concise and aligned with canonical URLs. Treat `llms.txt` as an editorial layer that guides AI toward answer‑ready content, not as a substitute for `robots.txt` or `sitemap.xml`.\n\n## Key takeaways\n\n- Use `robots.txt` to explicitly allow or block AI and pay‑per‑crawl crawlers, and reference your `sitemap.xml` to improve discovery.\n- Keep `sitemap.xml` clean, canonical, and up‑to‑date so AI systems and pay‑per‑crawl vendors can crawl high‑value pages efficiently.\n- Use `llms.txt` to curate a small set of authoritative, answer‑ready pages, and avoid mixed signals: do not list pages there that `robots.txt` disallows or that are not canonical.",
  "sources": [
    {
      "url": "https://www.qwairy.co/guides/complete-guide-to-robots-txt-and-llms-txt-for-ai-crawlers"
    },
    {
      "url": "https://lseo.com/generative-engine-optimization/llms-txt-vs-robots-txt-vs-sitemap-xml-what-each-file-does-for-ai-discovery/"
    },
    {
      "url": "https://omnirank.net/blog/robots-txt-sitemap-llms-txt-complete-guide"
    },
    {
      "url": "https://www.vanitech.com.au/blog/llms-txt-vs-robots-txt-vs-sitemap-what-businesses-should-allow/"
    },
    {
      "url": "https://www.webless.ai/blog/from-search-engines-to-ai-bots-the-role-of-robots-txt-sitemap-xml-and-the-rise-of-llm-txt"
    },
    {
      "url": "https://aicrawlercheck.com/blog/robots-txt-best-practices-ai-seo"
    },
    {
      "url": "https://dageno.ai/academy/llms-txt-vs-robots-txt"
    },
    {
      "url": "https://www.sitemap.ai/blog/best-practices-ai-crawlers-website"
    }
  ],
  "related": [
    {
      "name": "Getting Started with HTTP 402 Pay-Per-Crawl",
      "url": "https://agentflare.org/guides/getting-started-with-http-402-pay-per-crawl.html"
    },
    {
      "name": "Crawler Tokens & Auto-Pay: A How-To",
      "url": "https://agentflare.org/guides/crawler-tokens-auto-pay-a-how-to.html"
    },
    {
      "name": "Reading .data.json: A Field Guide for Agents",
      "url": "https://agentflare.org/guides/reading-datajson-a-field-guide-for-agents.html"
    },
    {
      "name": "Pricing Content by Directory: Best Practices",
      "url": "https://agentflare.org/guides/pricing-content-by-directory-best-practices.html"
    }
  ],
  "pricing": {
    "price_usd": 0,
    "method": "402",
    "endpoint": "https://esa.aisa.one/api/v1/access/verify",
    "autopay_hint": "set crawlerAutoPrice=true with X-AISA-Crawler-Token",
    "onboarding": "https://esa.aisa.one/cdn/guide.html"
  },
  "powered_by": "AISA — agent-native search, settlement & delivery (https://aisa.one)"
}