AI crawler access · robots.txt · llms.txt

Can AI actually read your site?
Find out in 2 seconds.

Answer engines like ChatGPT, Perplexity and Google AI fetch your pages to cite them — unless your robots.txt quietly blocks them. aicrawlcheck tests every major AI crawler, validates your llms.txt and structured data, and shows exactly what to fix. Open methodology, no black-box score.

What it checks

The signals that decide if AI can cite you

AI crawler access

Allow/deny for GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, CCBot, Bytespider and more — with a correct robots.txt matcher, not a naive substring check.

llms.txt

Checks that you publish a valid /llms.txt (title, summary, link sections), the emerging map that helps AI engines find your key pages.

On-page AI readiness

JSON-LD structured data, FAQ schema, a single clear H1, meta description, server-rendered content depth, and a freshness signal.

Open methodology

Every rule, in the open

No mystery score. Here is exactly how each check is decided — so you can verify and trust the result.

robots.txt matching: the file is parsed per the common conventions. The most specific User-agent group wins (an exact, case-insensitive UA match overrides the "*" group), the longest matching path rule wins, Allow beats Disallow on ties, and the "*" and "$" wildcards are supported. Access is evaluated for the path "/".
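
The rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the function names (parse_groups, pattern_to_regex, is_allowed) and the SAMPLE file are hypothetical, and edge cases such as very large files or unusual encodings are ignored.

```python
import re


def parse_groups(robots_txt):
    """Split robots.txt into (user_agents, rules) groups."""
    groups, agents, rules, seen_rule = [], [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules, seen_rule = [], [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            seen_rule = True
            if value:  # an empty Disallow restricts nothing
                rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def pattern_to_regex(pattern):
    """Translate a path pattern with '*' and a trailing '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(robots_txt, user_agent, path="/"):
    """Apply the rules: specific UA group, longest match, Allow wins ties."""
    groups = parse_groups(robots_txt)
    ua = user_agent.lower()
    chosen = [rules for agents, rules in groups if ua in agents]
    if not chosen:  # fall back to the "*" group only when no exact match
        chosen = [rules for agents, rules in groups if "*" in agents]
    best = None  # (pattern length, is_allow); tuple order makes Allow win ties
    for rules in chosen:
        for kind, pattern in rules:
            if pattern_to_regex(pattern).match(path):
                candidate = (len(pattern), kind == "allow")
                if best is None or candidate > best:
                    best = candidate
    return True if best is None else best[1]


# Hypothetical robots.txt: block everyone by default, but let GPTBot in.
SAMPLE = """
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

print(is_allowed(SAMPLE, "GPTBot"))     # True
print(is_allowed(SAMPLE, "ClaudeBot"))  # False
```

A naive substring check would see "Disallow: /" and report every crawler as blocked; group selection is what lets GPTBot through in this sample.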

Frequently asked questions

Is aicrawlcheck free?

Yes — free, no account, no sign-up. Enter a URL and get an instant report. We never store the URLs you check.

What exactly does it check?

Whether the major AI crawlers (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, CCBot, Bytespider and more) are allowed or blocked in your robots.txt; whether you publish a valid llms.txt; and on-page AI-readiness signals — JSON-LD structured data, FAQ schema, a single H1, meta description, content depth and freshness.

Why does AI-crawler access matter?

Answer engines like ChatGPT Search, Perplexity and Google AI Overviews fetch your pages to cite them. If your robots.txt blocks their user-agents (often by accident, via a blanket Disallow or an over-eager “block AI bots” snippet), you silently disappear from AI answers. This tool catches that.

Should I block AI crawlers or allow them?

It depends on your goal. To be cited in AI answers, allow the answer-engine agents (OAI-SearchBot, PerplexityBot, ChatGPT-User, Google-Extended). Note that Google-Extended is a robots.txt control token that Google's existing crawlers honor, not a separate fetcher. Training-oriented agents (CCBot, Bytespider, Applebot-Extended) are a separate policy choice: blocking them is legitimate, and the tool treats it as a choice, not an error.

What is llms.txt?

An emerging convention (llmstxt.org): a Markdown file at /llms.txt that gives AI engines a curated, clean map of your most important pages. aicrawlcheck checks it exists and follows the format (a title, a summary blockquote, and sections of links).
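
As a rough illustration of the three format elements named above, the following Python sketch tests a file for a title, a summary blockquote, and at least one section of links. The function name check_llms_txt, the regular expression, and the sample file are assumptions made for this example; the tool's real validation may be stricter.

```python
import re

# A Markdown list item that starts with a link: "- [name](url)"
LINK_ITEM = re.compile(r"^\s*[-*]\s*\[[^\]]+\]\([^)\s]+\)")


def check_llms_txt(text):
    """Return pass/fail facts for the title, summary and link sections."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    has_title = bool(lines) and lines[0].startswith("# ")
    has_summary = any(line.startswith("> ") for line in lines)
    has_link_section = False
    in_section = False
    for line in lines:
        if line.startswith("## "):
            in_section = True  # links only count once a section has started
        elif in_section and LINK_ITEM.match(line):
            has_link_section = True
    return {
        "title": has_title,
        "summary": has_summary,
        "link_sections": has_link_section,
    }


# Hypothetical llms.txt for an imaginary site.
SAMPLE_LLMS = """# Example Co

> Example Co sells widgets and documents them well.

## Docs

- [Getting started](https://example.com/start): setup guide
"""

print(check_llms_txt(SAMPLE_LLMS))
```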

Is the score a real measurement?

There is no black-box “0–100 AI visibility” score here. We report concrete pass/warning/issue facts for each check, every rule is published in the Methodology section, and you get a downloadable result. We do not claim to predict whether a specific engine will cite you; that requires live monitoring, which is out of scope.

Does it execute JavaScript on the pages it checks?

No. It reads the server-rendered HTML, as most AI crawlers do, since they typically do not execute JavaScript. If your key content only appears after client-side rendering, the tool will warn you, because that content is likely invisible to AI crawlers too.
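
To show what reading only the server-rendered HTML can look like, here is a small Python sketch that collects a few on-page signals without running any scripts. The class name OnPageSignals, its fields, and the sample page are hypothetical, and the character count is only a crude stand-in for content depth.

```python
from html.parser import HTMLParser


class OnPageSignals(HTMLParser):
    """Collect a few signals from raw HTML; no JavaScript is executed."""

    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.has_meta_description = False
        self.json_ld_blocks = 0
        self.text_chars = 0   # text outside <script>/<style>
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self.h1_count += 1
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name == "description" and attrs.get("content"):
                self.has_meta_description = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
            script_type = (attrs.get("type") or "").lower()
            if tag == "script" and script_type == "application/ld+json":
                self.json_ld_blocks += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


# Hypothetical page: some server-rendered text plus an empty client-side root.
SAMPLE_HTML = """<html><head>
<meta name="description" content="A page.">
<script type="application/ld+json">{"@type": "FAQPage"}</script>
</head><body><h1>Hello</h1><p>World</p><div id="app"></div>
<script>render()</script></body></html>"""

signals = OnPageSignals()
signals.feed(SAMPLE_HTML)
print(signals.h1_count, signals.has_meta_description,
      signals.json_ld_blocks, signals.text_chars)
```

A page whose body is only an empty root element and a script bundle would report a text_chars value near zero, which is the situation the warning describes.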

Is my data safe? Any SSRF concerns?

The audit runs on Cloudflare and only fetches public http(s) URLs; requests to private, loopback, link-local and cloud-metadata addresses are blocked, redirects are re-validated, and responses are size- and time-capped. We keep no logs of what you check.
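
The address checks described above can be illustrated with Python's standard ipaddress module. This is a sketch under assumptions, not the service's code: is_public_ip and precheck_url are hypothetical names, and only literal IP hosts are classified here. A real fetcher would also resolve DNS names, test every resolved address, and repeat the check on each redirect hop.

```python
import ipaddress
from urllib.parse import urlsplit


def is_public_ip(address):
    """True if the address is not private, loopback, link-local, etc."""
    ip = ipaddress.ip_address(address)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local   # includes 169.254.169.254 (cloud metadata)
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def precheck_url(url):
    """Reject non-http(s) URLs and literal IPs that are not public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        return is_public_ip(parts.hostname)
    except ValueError:
        # Not a literal IP. Assumption: the caller resolves the name and
        # runs is_public_ip on each resolved address before connecting.
        return True


print(precheck_url("https://example.com/"))
print(precheck_url("http://169.254.169.254/latest/meta-data/"))
```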