AI crawler access · robots.txt · llms.txt
Can AI actually read your site?
Find out in 2 seconds.
Answer engines like ChatGPT, Perplexity and Google AI Overviews fetch your pages to cite them, unless your
robots.txt quietly blocks them. aicrawlcheck tests every major AI crawler, validates
your llms.txt and structured data, and shows exactly what to fix. Open methodology, no black-box score.
What it checks
The signals that decide if AI can cite you
AI crawler access
Allow/deny for GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, CCBot, Bytespider and more — with a correct robots.txt matcher, not a naive substring check.
llms.txt
Checks you publish a valid /llms.txt (title, summary, link sections) — the emerging map that helps AI engines find your key pages.
On-page AI readiness
JSON-LD structured data, FAQ schema, a single clear H1, meta description, server-rendered content depth, and a freshness signal.
Open methodology
Every rule, in the open
No mystery score. Here is exactly how each check is decided — so you can verify and trust the result.
- AI answer engines can access the page: PASS if none of OAI-SearchBot, ChatGPT-User, PerplexityBot, Perplexity-User, Claude-User, Google-Extended is Disallowed for "/" in robots.txt; FAIL if any is blocked.
- AI training-scraper policy: INFO only. Lists which training scrapers (CCBot, Bytespider, Applebot-Extended, …) are blocked. Blocking is a valid choice, never counted against you.
- llms.txt present and valid: PASS if /llms.txt exists with a "# Title", a "> summary" blockquote and ≥1 Markdown link; WARN if present-but-incomplete or missing.
- Structured data (JSON-LD): PASS if ≥1 application/ld+json block is found; WARN if none.
- FAQ / Q&A schema: PASS if FAQPage/QAPage/Question markup is present; INFO (suggestion) if absent.
- Exactly one H1: PASS if the HTML has exactly one <h1>; WARN for zero or multiple.
- Meta description: PASS if a non-empty <meta name="description"> exists; WARN if absent.
- Substantive content: PASS if ≥250 words of server-rendered text; WARN below that (likely thin or JS-only).
- Freshness signal: PASS if schema dateModified or a Last-Modified header is present; INFO if neither.
robots.txt matching: robots.txt is parsed per the common conventions: the most specific User-agent group wins (an exact, case-insensitive UA match overrides the "*" group), the longest matching path rule wins, Allow beats Disallow on ties, and "*"/"$" wildcards are supported. Access is evaluated for the path "/".
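The matching rules above can be sketched in a few lines of Python. This is a minimal illustration, not aicrawlcheck's actual implementation: the function names (`parse_groups`, `pattern_to_regex`, `is_allowed`) and the sample file are assumptions made for the example, and edge cases such as partial user-agent product tokens are ignored.

```python
import re

def parse_groups(robots_txt):
    """Return {lowercased user-agent: [(directive, path), ...]}."""
    groups = {}
    current_agents = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            if value:  # an empty Disallow means "no restriction"
                for agent in current_agents:
                    groups[agent].append((field, value))
    return groups

def pattern_to_regex(path):
    """Translate a robots.txt path pattern ('*' and a trailing '$') to a regex."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(piece) for piece in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(robots_txt, user_agent, path="/"):
    groups = parse_groups(robots_txt)
    # An exact (case-insensitive) group overrides the "*" group.
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best = None  # (pattern length, is_allow): longest wins, Allow wins ties
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

SAMPLE = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
Disallow: /private/
"""

print(is_allowed(SAMPLE, "gptbot"))         # exact group applies
print(is_allowed(SAMPLE, "PerplexityBot"))  # falls back to the "*" group
```

Comparing `(length, is_allow)` tuples encodes both tie-breaking rules at once: a longer pattern always wins, and at equal length `True` (Allow) sorts above `False` (Disallow).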
Frequently asked questions
Is aicrawlcheck free?
Yes — free, no account, no sign-up. Enter a URL and get an instant report. We never store the URLs you check.
What exactly does it check?
Whether the major AI crawlers (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, CCBot, Bytespider and more) are allowed or blocked in your robots.txt; whether you publish a valid llms.txt; and on-page AI-readiness signals — JSON-LD structured data, FAQ schema, a single H1, meta description, content depth and freshness.
Why does AI-crawler access matter?
Answer engines like ChatGPT Search, Perplexity and Google AI Overviews fetch your pages to cite them. If your robots.txt blocks their user-agents (often by accident, via a blanket Disallow or an over-eager “block AI bots” snippet), you silently disappear from AI answers. This tool catches that.
Should I block AI crawlers or allow them?
It depends on your goal. To be cited in AI answers, allow the answer-engine fetchers (OAI-SearchBot, PerplexityBot, ChatGPT-User, Google-Extended). Training-only scrapers (CCBot, Bytespider, Applebot-Extended) are a separate policy choice — blocking them is legitimate and the tool treats it as a choice, not an error.
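As an illustration of that split policy, here is a sample robots.txt that allows the answer-engine fetchers named above and blocks the training-only scrapers, checked with Python's standard-library `urllib.robotparser`. The file contents, the `access_report` helper and the `example.com` URL are assumptions for the example. The standard-library parser uses simpler first-match semantics than a longest-match matcher, which makes no difference for a file this simple.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Answer-engine fetchers: allowed
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

# Training-only scrapers: blocked (a policy choice, not an error)
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""

def access_report(robots_txt, agents, url="https://example.com/"):
    """Map each user-agent to True (allowed) or False (blocked) for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

for agent, allowed in access_report(
    ROBOTS_TXT, ["OAI-SearchBot", "PerplexityBot", "CCBot", "Bytespider"]
).items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```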
What is llms.txt?
An emerging convention (llmstxt.org): a Markdown file at /llms.txt that gives AI engines a curated, clean map of your most important pages. aicrawlcheck checks it exists and follows the format (a title, a summary blockquote, and sections of links).
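A rough sketch of that format check, following the published rule (a "# Title", a "> summary" blockquote and at least one Markdown link). The function name `check_llms_txt`, the link regex and the sample file are illustrative assumptions, not the tool's actual code.

```python
import re

# Matches a Markdown inline link such as [Docs](https://example.com/docs)
LINK_RE = re.compile(r"\[[^\]]+\]\([^)\s]+\)")

def check_llms_txt(text):
    """Return "PASS" or "WARN"; pass None when /llms.txt is missing."""
    if text is None:
        return "WARN"
    lines = [line.strip() for line in text.splitlines()]
    has_title = any(line.startswith("# ") and line[2:].strip() for line in lines)
    has_summary = any(line.startswith(">") and line[1:].strip() for line in lines)
    has_link = any(LINK_RE.search(line) for line in lines)
    return "PASS" if (has_title and has_summary and has_link) else "WARN"

SAMPLE = """\
# Example Site

> A short summary of what this site offers.

## Docs

- [Getting started](https://example.com/start): first steps
"""

print(check_llms_txt(SAMPLE))
```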
Is the score a real measurement?
There is no black-box “0–100 AI visibility” score here. We report concrete pass/warning/issue facts for each check, every rule is published in the Methodology section, and you get a downloadable result. We do not claim to predict whether a specific engine will cite you; that requires live monitoring, which is out of scope.
Does it run JavaScript-rendered pages?
No. It reads the server-rendered HTML, as most AI crawlers do (they often do not execute JavaScript). If your key content only appears after client-side rendering, the tool will warn you, because that content is likely invisible to AI crawlers too.
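To make "server-rendered" concrete, the sketch below runs four of the on-page checks from the methodology (JSON-LD, single H1, meta description, word count) on raw HTML using only Python's standard-library `html.parser`, with no JavaScript execution. The class and function names (`PageSignals`, `evaluate`) are assumptions for illustration, and the way the tool actually counts words may differ.

```python
from html.parser import HTMLParser

class PageSignals(HTMLParser):
    """Collects on-page signals from static HTML (scripts are never run)."""

    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.has_meta_description = False
        self.json_ld_blocks = 0
        self.word_count = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self.h1_count += 1
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = (attrs.get("content") or "").strip()
            if name == "description" and content:
                self.has_meta_description = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
                self.json_ld_blocks += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:  # count only text outside script/style
            self.word_count += len(data.split())

def evaluate(html, min_words=250):
    """Apply the published thresholds and return a status per check."""
    signals = PageSignals()
    signals.feed(html)
    signals.close()
    return {
        "json_ld": "PASS" if signals.json_ld_blocks >= 1 else "WARN",
        "single_h1": "PASS" if signals.h1_count == 1 else "WARN",
        "meta_description": "PASS" if signals.has_meta_description else "WARN",
        "substantive_content": "PASS" if signals.word_count >= min_words else "WARN",
    }

# A JS-only shell: almost no text until client-side code runs.
SHELL = '<html><body><div id="root"></div><script>render()</script></body></html>'
print(evaluate(SHELL))
```

A page like `SHELL` gets a WARN on every check here, which is the "thin or JS-only" case the methodology describes.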
Is my data safe? Any SSRF concerns?
The audit runs on Cloudflare and only fetches public http(s) URLs; requests to private, loopback, link-local and cloud-metadata addresses are blocked, redirects are re-validated, and responses are size- and time-capped. We keep no logs of what you check.
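The address filtering described above might look roughly like the following sketch, which uses Python's standard-library `ipaddress` module. The helper names (`is_public_ip`, `is_safe_target`) are hypothetical, and the list of resolved IPs is passed in by the caller so the example stays offline. A real guard would also apply the same check to every redirect hop and pin the connection to the validated address.

```python
import ipaddress
from urllib.parse import urlsplit

def is_public_ip(address):
    """True only for addresses outside private, loopback and similar ranges."""
    ip = ipaddress.ip_address(address)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before checking.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local   # includes 169.254.169.254 (cloud metadata)
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )

def is_safe_target(url, resolved_ips):
    """Accept only http(s) URLs whose every resolved address is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    return bool(resolved_ips) and all(is_public_ip(ip) for ip in resolved_ips)

print(is_safe_target("http://169.254.169.254/latest/meta-data/", ["169.254.169.254"]))
print(is_safe_target("http://localhost/", ["127.0.0.1"]))
```

Requiring every resolved address to be public, rather than any one of them, avoids the case where a hostname resolves to both a public and an internal address.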