llms.txt Validator

Lint your llms.txt for spec compliance. The llmstxt.org spec is simple on the surface, but a misplaced heading level, a broken link, or a missing blockquote summary is enough to confuse AI crawlers. This page covers what the spec requires, the seven most common mistakes, and a copy-paste sample you can use as a baseline.

Automated validation

Run `validate_llms_txt` inside Claude or Cursor

The `@automatelab/ai-seo-mcp` package exposes a `validate_llms_txt` tool that checks your file against the spec and returns structured violations with line numbers. It works in Claude Code, Cursor, and any other MCP-compatible host.


What the spec requires

llms.txt is a Markdown file at https://yourdomain.com/llms.txt. The spec defines a small set of structural rules that allow LLMs to parse the file reliably:

H1 - your site or project name. Exactly one, at the top. No subtitle in the H1.
Blockquote summary. Optional but recommended. A one-to-three sentence description of what the site does, placed directly under the H1 as a > blockquote.
H2 sections grouping links. Each section is a category (e.g. "Docs", "API reference", "Blog"). H3 and deeper are not part of the spec.
Links on individual lines. Each link is a standard Markdown link in a list item. An optional plain-text description follows a colon and a space: `- [Page title](https://...) : What this page covers.`
Optional llms-full.txt. A second file with the full expanded text of all linked pages concatenated. Reference it in llms.txt with a link; the sample on this page lists it under an Optional section.

No front matter, no YAML, no custom tags. Plain Markdown. The minimalism is intentional - it keeps the format parseable by any LLM without a dedicated parser.
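
The structural rules above are small enough to check by hand or with a few lines of code. The following is a minimal sketch in Python, not the implementation behind `validate_llms_txt`; the function name `check_structure` and the wording of its messages are assumptions made for illustration. It only covers the heading rules: exactly one H1, H1 first, nothing deeper than H2.

```python
import re

# Matches an ATX heading: one to six '#' characters, whitespace, then text.
HEADING = re.compile(r"^(#{1,6})\s+\S")


def check_structure(text: str) -> list[str]:
    """Return readable violations of the heading rules (hypothetical helper)."""
    problems = []
    lines = text.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    h1_count = 0
    for number, line in enumerate(lines, start=1):
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if level == 1:
            h1_count += 1
        elif level > 2:
            problems.append(f"line {number}: H{level} heading is deeper than H2")
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")
    if non_blank and not non_blank[0].startswith("# "):
        problems.append("first non-blank line is not an H1")
    return problems
```

Calling `check_structure(open("llms.txt", encoding="utf-8").read())` on a compliant file should return an empty list; each string in a non-empty result describes one problem.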

Common llms.txt mistakes

These seven issues account for the majority of spec violations we see in the wild.

Mistake 1 - Wrong heading level

Using H2 or H3 for the site name, or nesting H3 sections inside H2s.

## My Company   <!-- H2 is wrong -->
### Docs         <!-- H3 section not in spec -->

Correct

H1 for the site name, H2 for every section. No deeper nesting.

# My Company
## Docs
## API Reference

Mistake 2 - Summary in prose, not blockquote

Writing a paragraph under the H1 instead of a blockquote. Parsers look for > to extract the summary.

# My Company
This is what we do.   <!-- plain paragraph, not a blockquote -->

Correct

Wrap the summary in a > blockquote immediately after the H1.

# My Company
> We help teams automate repetitive work
> using AI and workflow tools.
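
To see why the `>` prefix matters, here is a rough sketch of how a parser might pull the summary out. It assumes the parser only accepts a blockquote that directly follows the H1 (blank lines allowed in between); `extract_summary` is a hypothetical name, and real tools may behave differently.

```python
from typing import Optional


def extract_summary(text: str) -> Optional[str]:
    """Return the blockquote that directly follows the H1, or None if absent."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Locate the H1; without one there is nothing to anchor the summary to.
    try:
        index = next(i for i, ln in enumerate(lines) if ln.startswith("# "))
    except StopIteration:
        return None
    index += 1
    # Skip blank lines between the H1 and the summary.
    while index < len(lines) and not lines[index]:
        index += 1
    # Join consecutive '>' lines into one summary string.
    quoted = []
    while index < len(lines) and lines[index].startswith(">"):
        quoted.append(lines[index].lstrip(">").strip())
        index += 1
    return " ".join(quoted) if quoted else None
```

With the plain-paragraph version of the file this returns `None`, which is how the missing summary would surface in a check.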

Mistake 3 - Broken or relative links

Relative paths fail when the file is fetched from a different origin. Relative paths also break llms-full.txt generation.

- [Getting started](/docs/start/)   <!-- relative -->
- [API](../api/reference/)          <!-- relative -->

Correct

Always use absolute URLs. Validate that each URL returns 200 before publishing.

- [Getting started](https://example.com/docs/start/)
- [API reference](https://example.com/api/reference/)
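
Relative links can be spotted without any network access. The sketch below uses Python's standard `urllib.parse` and treats a link as absolute only when it has an `http` or `https` scheme and a host; the helper name `find_relative_links` and the simplified link regex are assumptions for illustration.

```python
import re
from urllib.parse import urlparse

# Simplified Markdown link pattern: [title](url) with no spaces in the URL.
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def find_relative_links(text: str) -> list[tuple[int, str]]:
    """Return (line number, url) pairs for links that are not absolute http(s) URLs."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for _title, url in LINK.findall(line):
            parsed = urlparse(url)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                found.append((number, url))
    return found
```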

Mistake 4 - Missing colon before description

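The optional link description must be introduced by a colon after the link. This page writes it with a space on each side ( : ); the examples on llmstxt.org place the colon directly after the closing parenthesis. A bare hyphen or a parenthetical in place of the colon breaks parsers.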

- [Pricing](https://example.com/pricing) - our plans
- [Pricing](https://example.com/pricing) (see plans here)

Correct

Use : to separate the link from its description, or omit the description entirely.

- [Pricing](https://example.com/pricing) : Plan overview and limits.
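
A per-line check for the separator might look like the sketch below. It is tolerant about whitespace around the colon and only insists that whatever follows the link starts with one; `check_description` and its messages are hypothetical, and a stricter parser could reject more.

```python
import re
from typing import Optional

# A list item holding a Markdown link, followed by arbitrary trailing text.
LINK_LINE = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?P<rest>.*)$")


def check_description(line: str) -> Optional[str]:
    """Return an error message for a malformed link line, or None if it looks fine."""
    match = LINK_LINE.match(line.strip())
    if match is None:
        return "not a link list item"
    rest = match.group("rest").strip()
    # Either no description at all, or one introduced by a colon.
    if not rest or rest.startswith(":"):
        return None
    return f"description must start with a colon, got {rest!r}"
```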

Mistake 5 - Non-Markdown content in the file

Adding HTML tags, front matter, or custom directives. The file must be plain Markdown.

---
title: My site
---
<div class="llms">...</div>

Correct

Pure Markdown only. No front matter, no HTML, no custom syntax.

# My Company
> One-liner summary.

## Docs
- [Overview](https://example.com/docs/)
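
Front matter and HTML tags are easy to flag with a line scan. The following sketch assumes that a leading `---` line means front matter and that anything shaped like an opening or closing tag is HTML; it deliberately ignores HTML comments and `<https://...>` autolinks. The name `find_non_markdown` is made up for this example.

```python
import re

# Opening or closing HTML tag such as <div class="x"> or </div>.
HTML_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(\s[^<>]*)?/?>")


def find_non_markdown(text: str) -> list[str]:
    """Flag a front matter delimiter on line 1 and any line containing an HTML tag."""
    problems = []
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        problems.append("line 1: YAML front matter delimiter")
    for number, line in enumerate(lines, start=1):
        if HTML_TAG.search(line):
            problems.append(f"line {number}: HTML tag")
    return problems
```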

Mistake 6 - No llms.txt at the root

Placing the file at a sub-path like /docs/llms.txt or /en/llms.txt instead of the domain root. Crawlers check the root path first.

https://example.com/docs/llms.txt 

Correct

Serve from the exact path /llms.txt at your canonical domain root.

https://example.com/llms.txt
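
If you generate or deploy the file from a script, a one-line guard on the published URL catches this mistake early. The sketch below only inspects the URL path; `is_root_llms_txt` is a hypothetical helper and does not confirm that the file is actually served.

```python
from urllib.parse import urlparse


def is_root_llms_txt(url: str) -> bool:
    """True only when the URL path is exactly /llms.txt at the domain root."""
    return urlparse(url).path == "/llms.txt"
```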

Mistake 7 - Stale links to deleted pages

A 404 on any listed URL signals low quality to crawlers and breaks llms-full.txt generation.

- [Old feature](https://example.com/old/) 

Correct

Audit links whenever you delete or redirect pages. Automate this with validate_llms_txt in CI.

- [Current feature](https://example.com/features/)
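
For a quick audit outside the MCP tool, a standard-library sketch along these lines can report which listed URLs no longer return 200. It is not how `validate_llms_txt` works internally; `http_status` and `find_stale` are hypothetical names, and the code assumes absolute URLs and servers that answer `HEAD` requests (some do not, in which case a `GET` would be needed).

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the status of a HEAD request, or 0 if the host cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so this is the status of the final page.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except urllib.error.URLError:
        return 0


def find_stale(urls: list[str], status_of: Callable[[str], int] = http_status) -> list[str]:
    """Return the URLs whose status is anything other than 200."""
    return [url for url in urls if status_of(url) != 200]
```

The `status_of` parameter exists so the check can be exercised with a stub instead of real network calls, which keeps it usable in offline tests.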

Sample valid llms.txt

Copy this, replace the content, and you have a spec-compliant starting point. All links must be absolute and return 200.

# Example Company

> We build AI-powered automation tools for operations and engineering teams.
> Our products connect n8n, Make, and custom Python pipelines to LLMs and external APIs.

## Docs

- [Getting started](https://example.com/docs/) : Initial setup guide - installs, auth, first workflow.
- [Configuration reference](https://example.com/docs/config/) : All environment variables and runtime options.
- [Changelog](https://example.com/docs/changelog/) : Version history and breaking changes.

## API Reference

- [REST API](https://example.com/api/) : Full endpoint reference with request and response examples.
- [Webhooks](https://example.com/api/webhooks/) : Incoming webhook payload schemas and retry behaviour.

## Blog

- [How to connect n8n to Claude](https://example.com/blog/n8n-claude/) : Step-by-step walkthrough using the n8n MCP node.

## Optional

- [llms-full.txt](https://example.com/llms-full.txt) : Full text of all pages above in a single file for long-context LLMs.
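
To illustrate how little machinery the format needs, here is a rough sketch of a parser that turns a file like the sample above into a dictionary. The function name `parse_llms_txt` and the shape of the returned dictionary are assumptions for this example, not part of the spec or of any particular tool.

```python
import re

# List item with a Markdown link and an optional ": description" tail.
LINK_ITEM = re.compile(r"^-\s*\[([^\]]+)\]\(([^)]+)\)\s*(?::\s*(.*))?$")


def parse_llms_txt(text: str) -> dict:
    """Parse title, summary, and per-section links from llms.txt content."""
    result = {"title": None, "summary": [], "sections": {}}
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif line.startswith(">") and current is None:
            result["summary"].append(line.lstrip(">").strip())
        elif current is not None:
            match = LINK_ITEM.match(line)
            if match:
                title, url, description = match.groups()
                result["sections"][current].append(
                    {"title": title, "url": url, "description": description or ""}
                )
    result["summary"] = " ".join(result["summary"])
    return result
```

Run against the sample, `parse_llms_txt(text)["sections"]` would contain the keys `Docs`, `API Reference`, `Blog`, and `Optional`, each mapping to a list of link entries.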

FAQ

What is llms.txt?

llms.txt is a Markdown file placed at the root of your website (e.g. https://example.com/llms.txt). It tells AI crawlers and LLM tools which pages are most important, what your site does, and how to interpret your content. The format is defined at llmstxt.org.

What does the llms.txt spec require?

Strictly, the only required element is the H1 with your site or project name. A useful file also has H2 sections grouping links, with each link on its own line as a Markdown link. A blockquote summary under the H1 is optional but strongly recommended. Use absolute URLs throughout.

How do I validate my llms.txt automatically?

Install the @automatelab/ai-seo-mcp package and call the validate_llms_txt tool from Claude Code or Cursor. It checks your file against the spec and returns a structured list of violations with line numbers. You can also run it in CI against a URL or a local file path.

Do AI crawlers actually read llms.txt?

Yes - ChatGPT, Perplexity, Anthropic’s crawler, and several other LLM systems fetch llms.txt when indexing a site. A well-formed file helps them surface the right pages and improves the chance your content is cited in AI-generated answers.

What is the difference between llms.txt and llms-full.txt?

llms.txt is the index - a compact file with links and section headings. llms-full.txt is the expanded version - the full plain-text content of every page listed in llms.txt concatenated into one file. LLMs that can handle long context use llms-full.txt to read your entire site in one pass.

Related tools


GPTBot access checker

Test whether GPTBot, ClaudeBot, and other AI crawlers can reach your pages.

Need your llms.txt built or audited?

AutomateLab can audit your existing llms.txt, build one from scratch, and set up automated validation in CI so it never goes stale. Fixed-scope, one week.

