What Is llms.txt? (Definition + TL;DR)
If you've spent any time in the AI search trenches, you've already noticed the gap. AI engines crawl your site, but they often don't know what's actually worth reading on it. They burn budget on login pages, archive paths, and JS-rendered shells. They miss the one pricing page or the one explainer post you'd want them to cite. llms.txt is the proposal to fix that — a five-minute file at your domain root that tells LLMs which URLs matter most.
The format is intentionally minimal: a Markdown document with an H1 site name, a one-line blockquote summary, H2 sections grouping related content (Docs, Blog, API, Examples), and bullet links with descriptions. No XML, no JSON, no schemas to validate against a registry. Just Markdown that any human can read and any LLM can parse without a tokenizer wrestling match. The whole file usually weighs 2–10 KB.
It sits alongside robots.txt and sitemap.xml as the third file at your site root that crawlers care about — but with a different purpose. robots.txt grants or denies access. sitemap.xml exhaustively lists URLs for indexing. llms.txt curates the citable shortlist for AI engines. The rest of this guide covers where it came from, how to write one, and whether it's worth the effort given inconsistent adoption today. Spoiler: yes, it's worth shipping. The cost is five minutes and the upside is real on Perplexity and Anthropic platforms today, plus optionality on every other engine for the next 24 months.
The History — Why llms.txt Was Proposed
The proposal landed on September 3, 2024, in a single GitHub repository and accompanying blog post by Jeremy Howard, founder of fast.ai and Answer.AI. Howard had spent the prior year building Answer.AI's research tooling around long-context LLMs and kept hitting the same wall: the open web is structured for humans and classical search engines, not for the inference-time retrieval pipelines AI products run. Sites would publish thousands of pages and an LLM trying to summarize the company would chew through irrelevant routes — login screens, faceted search results, paginated archives — before finding the actual product page.
The two existing files at the root — robots.txt and sitemap.xml — couldn't bridge the gap. robots.txt is binary access control: allowed or disallowed, no priority weighting. sitemap.xml lists every URL you want indexed in flat XML, often tens of thousands of entries with no editorial signal about which ones matter most. Neither file tells an AI system "if you only have time to read five pages, read these five." That gap is what llms.txt fills.
The other half of the problem is JavaScript rendering. Most AI crawlers (GPTBot, ClaudeBot, PerplexityBot in their default modes) do not execute JavaScript. They see the raw HTML response, which on modern frontend stacks (Vue SPAs, React without SSR, hydration-only Next.js apps) is often a near-empty shell with a <div id="root"> and nothing else. llms.txt sidesteps this by serving canonical, plain-text Markdown — content the crawler can actually read regardless of frontend stack.
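A rough way to see this problem on your own pages: strip a raw HTML response down to its visible text and check whether anything survives without JavaScript execution. This is a heuristic sketch, not any crawler's actual logic — the function name and the 200-character threshold are invented for illustration:

```python
import re

def looks_like_spa_shell(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: does this raw HTML carry readable content, or is it a
    JS-only shell that a non-rendering crawler would see as empty?"""
    # Drop script/style bodies, then all remaining tags, then measure
    # how much visible text is left.
    no_scripts = re.sub(r"(?is)<(script|style)[^>]*>.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", no_scripts)
    visible = " ".join(text.split())
    return len(visible) < min_text_chars

shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
article = "<html><body><h1>Pricing</h1><p>" + "Plans start at a free tier. " * 20 + "</p></body></html>"
print(looks_like_spa_shell(shell), looks_like_spa_shell(article))  # prints: True False
```

Run this against the output of a plain `curl` (no headless browser) on your own URLs: if your key pages come back shell-like, an llms.txt pointing at them won't help until the pages themselves render content server-side.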
Howard's framing in the original proposal was simple. The web has /robots.txt for crawlers, /humans.txt for readers (a niche convention from the 2010s), /security.txt for vulnerability disclosure, and /.well-known/ for metadata. /llms.txt slots cleanly into that family — a curated, machine-readable manifest specifically for the new wave of AI agents that read sites differently than browsers do. By late 2024 Anthropic had adopted it on anthropic.com/llms.txt; by Q1 2025 Cloudflare, Vercel, Astro, NuxtLabs, and Linear had followed. Adoption among dev-tooling companies has been steady ever since.
llms.txt vs robots.txt vs sitemap.xml — When to Use What
The three files at your site root each answer a different question. robots.txt answers "who can crawl what?" sitemap.xml answers "what URLs exist?" llms.txt answers "what URLs matter most for AI?" They're additive — most sites should have all three.
| Attribute | robots.txt | sitemap.xml | llms.txt |
|---|---|---|---|
| Purpose | Access control for crawlers | Exhaustive URL list for indexing | Curated AI ingestion priority |
| Format | Plain text directives | XML schema | Plain Markdown |
| Audience | All well-behaved crawlers | Search engine bots (Google, Bing) | AI agents (ChatGPT, Claude, Perplexity) |
| Indexing role | Allow/disallow paths | List all URLs | Highlight most citable URLs |
| Parsing | Strict syntax | Strict XML | Loose Markdown, human-readable |
The practical mental model: if you were adding these three files one at a time, the order of impact today is robots.txt first (without it, crawlers may not reach you at all or may crawl too aggressively), sitemap.xml second (gets your full URL set into Google's index), and llms.txt third (signals priority to AI engines on top of the other two).
A common error is treating llms.txt as a replacement for one of the others. It isn't. Removing your sitemap.xml and adding llms.txt would tank your Google indexation while only marginally helping AI citation. Removing robots.txt and replacing it with llms.txt does nothing useful — different bots read different files. Ship all three, keep them in sync, and treat llms.txt as the editorial layer on top of the structural ones.
There's also a question of who reads which file in practice. robots.txt is read by virtually every well-behaved crawler. sitemap.xml is read primarily by Google, Bing, and a handful of SEO tools. llms.txt today is read consistently by Perplexity, Anthropic's tooling, and a long tail of open-source LLM projects (LangChain ingestion pipelines, LlamaIndex loaders, etc.). The list grows quarterly — Cloudflare's AI Audit beta added llms.txt awareness in early 2026, and several smaller AI search products bundle llms.txt parsing into their crawl pipelines.
The llms.txt Specification — Format Explained
The format is a Markdown document with four required parts and one optional section. It's loose enough that you can hand-write it in a text editor in five minutes, strict enough that AI systems and validators can parse it deterministically.
The four required parts, plus the optional section:
- H1: Site name. Exactly one H1 at the very top, holding your site or company name. This is the entity anchor.
- Blockquote: One-line summary. A Markdown blockquote (>) immediately after the H1 with a single sentence describing the site. Treat it as your elevator pitch — what an LLM will quote when asked "what does this site do?"
- H2 sections. Logical groupings of links: ## Docs, ## Examples, ## API, ## Blog, ## Pricing. Use 2–6 sections for most sites.
- Bullet links with descriptions. Each entry under an H2 follows the pattern - [Link text](https://full-url): One-sentence description. The colon-and-description pattern is what separates llms.txt from a generic Markdown link list.
- Optional H2 section. A ## Optional section at the end for low-priority URLs the AI can deprioritize when budget is tight.
A worked example, in the format you'd publish today:
```
# SiteTest.ai

> AI-powered website audit tool — 168 SEO and AI-search checks for ChatGPT, Perplexity, and AI Overviews visibility.

## Docs

- [How it works](https://sitetest.ai/how-it-works): Methodology behind the 168 checks across crawlability, schema, and AI citability.
- [Pricing](https://sitetest.ai/pricing): Plans from a free tier to $24.99 per audit, plus team and agency options.

## Blog

- [GEO Guide](https://sitetest.ai/blog/generative-engine-optimization-guide): The 14 tactics and 15-step checklist for Generative Engine Optimization.
- [AI Visibility](https://sitetest.ai/blog/ai-visibility-checker-guide): Eight metrics and eight tools for tracking AI citations.

## Optional

- [Changelog](https://sitetest.ai/changelog): Product release notes — useful for AI agents but not high priority.
```
That's it. No JSON schema, no required fields beyond the structure above. The whole file fits in a few tweets' worth of text, and validators check for the H1, the blockquote, at least one H2 section, and well-formed Markdown links.
The llms-full.txt variant is a sibling file at /llms-full.txt that takes the same approach but goes further — it concatenates the full text content of your most important pages into a single document, not just links. Documentation sites use it to expose their entire docs corpus as a single text blob LLMs can ingest offline. The cost is much higher: typical llms-full.txt files run 200 KB to several megabytes, and they need regeneration whenever content changes. Most sites should ship llms.txt only and skip llms-full.txt unless they have stable canonical content (technical specs, public APIs, formal docs) where a one-shot dump genuinely helps downstream LLM consumers.
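For sites with a stable docs corpus, llms-full.txt is usually generated rather than hand-written. A minimal sketch of a concatenating generator — the layout here (an H2 per page, a source-URL line, then the body) is an assumed convention, since the spec doesn't mandate an exact structure, and the function name is illustrative:

```python
def build_llms_full(site: str, pages: list[tuple[str, str, str]]) -> str:
    """Concatenate full page bodies into one llms-full.txt-style document.
    pages holds (title, url, markdown_body) tuples; assumes you can already
    export each page's content as plain Markdown."""
    parts = [f"# {site}", ""]
    for title, url, body in pages:
        parts += [f"## {title}", f"Source: {url}", "", body.strip(), ""]
    return "\n".join(parts)

doc = build_llms_full("SiteTest.ai", [
    ("Pricing", "https://sitetest.ai/pricing", "Plans from a free tier to $24.99 per audit."),
])
print(doc.splitlines()[0])  # prints: # SiteTest.ai
```

Wire a script like this into your docs build so the file regenerates on every content change — a stale llms-full.txt is worse than none.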
Step-by-Step: How to Create Your llms.txt
After running 100+ audits, I've seen the same pattern over and over: teams either ship a 30-second llms.txt that nails the basics or a sprawling, broken file that misses the path entirely. The eight-step workflow below is what we use internally at sitetest.ai when we add llms.txt to a client site.
Step 1: Inventory your most citable URLs. List 5–30 URLs that best represent your site. Homepage, pricing, top 5–10 blog posts, documentation index, key feature pages. Skip thin pages, login screens, faceted search results, and JS-only experiences. The goal is a curated map, not an exhaustive sitemap. If you have more than 30 candidate URLs, prioritize ruthlessly — overflow goes in llms-full.txt or stays out entirely.
Step 2: Create the file with H1 site name. Open a text editor (VS Code, Sublime, plain Notepad — anything that saves as UTF-8 plain text) and start with a single Markdown H1 holding your site or company name: # SiteTest.ai. This is the only H1 in the file. AI systems use it as the entity anchor for everything that follows.
Step 3: Add a one-line blockquote summary. Immediately below the H1, add a Markdown blockquote with one sentence describing what the site does: > AI-powered website audit tool — 168 SEO and AI-search checks for ChatGPT and Perplexity visibility. Write it the way you'd answer "what does your company do?" at a dinner party — informative, not marketing fluff.
Step 4: Group URLs under H2 sections. Create logical H2 sections: ## Docs, ## Blog, ## API, ## Examples, ## Pricing. The optional section ## Optional at the end is a special convention — it lists low-priority URLs AI systems can deprioritize when budget is tight. Use 2–6 sections for most sites.
Step 5: Write each link with a description. Each entry follows the exact pattern: - [Link text](https://full-url): One-sentence description of what's at that URL. The colon-and-description part is what separates llms.txt from a generic link list. Descriptions should be 60–120 characters, informative, not marketing copy. Use the full URL (including https://) — relative paths are ambiguous to AI consumers.
Step 6: Keep the file lean (under 50 KB). Most llms.txt files should be 2–10 KB total. Anything past 50 KB is too large — some AI consumers truncate or skip oversized files. If your candidate URL list exceeds what fits cleanly, move the overflow to llms-full.txt or omit it. Less is more — a tight 20-link file outperforms a sprawling 200-link one.
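Steps 2–6 can be collapsed into a small generator so the file stays in sync with a curated URL list kept in code rather than edited by hand. The function and parameter names here are illustrative, not part of any spec:

```python
def build_llms_txt(site: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Assemble an llms.txt from curated entries.
    sections maps an H2 name (e.g. "Docs") to (title, url, description)
    tuples, emitted in the '- [text](url): description' pattern."""
    lines = [f"# {site}", "", f"> {summary}", ""]
    for heading, entries in sections.items():
        lines.append(f"## {heading}")
        for title, url, desc in entries:
            lines.append(f"- [{title}]({url}): {desc}")
        lines.append("")
    return "\n".join(lines)

doc = build_llms_txt(
    "SiteTest.ai",
    "AI-powered website audit tool.",
    {"Docs": [("Pricing", "https://sitetest.ai/pricing", "Plans and tiers.")]},
)
print(len(doc.encode()), "bytes")  # easy way to watch the 50 KB ceiling
```

Printing the byte count at build time gives you a cheap guardrail against the oversize-file problem from Step 6.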
Step 7: Publish at /llms.txt with text/plain content-type. Upload the file so it's accessible at https://yourdomain.com/llms.txt. Configure your server to serve it with Content-Type: text/plain — not text/html. On Nginx, that's a location = /llms.txt { default_type text/plain; } block. On Vercel, set headers in vercel.json. On Cloudflare Pages, add a _headers file. Verify with curl -I https://yourdomain.com/llms.txt.
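For the Vercel case mentioned above, the headers entry in vercel.json looks roughly like this — check Vercel's current headers documentation before shipping, as the schema is theirs, not part of the llms.txt spec:

```json
{
  "headers": [
    {
      "source": "/llms.txt",
      "headers": [
        { "key": "Content-Type", "value": "text/plain; charset=utf-8" }
      ]
    }
  ]
}
```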
Step 8: Validate and link from robots.txt. Run curl https://yourdomain.com/llms.txt and read the full output. Run it through llmstxt.org's validator. Optionally add a hint line in robots.txt: # llms.txt: https://yourdomain.com/llms.txt — this is purely informational (not a parsed directive) but signals to anyone reading robots.txt that you maintain an llms.txt too.
50+ Real-World llms.txt Examples
The fastest way to understand llms.txt in practice is to read what dev-tooling and AI companies actually ship. Below are ten examples across five categories — each link points to a live /llms.txt you can curl right now and study. We've kept the list curated rather than exhaustive: the format is so simple that 50 examples reveal the same patterns ten do.
Dev Tools
- Anthropic: Documentation-focused llms.txt covering API references, model cards, and prompt engineering guides. Notable for its tight Optional section.
- Cloudflare: Massive product surface (Workers, R2, D1, Pages, Stream) split into clear H2 sections — a textbook example of how to organize a multi-product company.
SaaS Platforms
- Linear: Minimal and product-marketing focused — homepage, pricing, customers, changelog. Fits in under 2 KB.
- Vercel: Documentation plus product pages, with a strong blockquote summary that reads like a one-line elevator pitch.
Documentation Sites
- Cursor: IDE documentation with deep technical content — uses ## Reference, ## Guides, and ## API sections.
- SvelteKit: Open-source framework docs broken into Tutorial, Reference, and Migration sections — clean editorial structure.
AI Products
- Perplexity: API docs for the AI search company — appropriate that the engine that respects llms.txt most also publishes a clean one.
- Anthropic Claude: Already covered above — worth re-reading specifically for how it handles model versioning across many doc URLs.
Open Source Frameworks
- Astro: Static-site framework docs — heavy on integrations, recipes, and tutorials, with strong descriptions on each link.
- NuxtLabs: Vue-based framework with multi-product surface (Nuxt, NuxtHub, Nuxt UI) — good model for organizing related products under one llms.txt.
A pattern worth noting: SEO and search-tool companies are conspicuously absent from this list. Ahrefs, Semrush, Moz, BrightEdge — none publish llms.txt as of May 2026. The field that should be most attuned to AI search is the slowest to adopt the AI-search file, partly because their crawlers compete with AI crawlers and partly because their internal SEO teams are skeptical of unofficial standards. Dev-tooling companies and AI infrastructure companies have moved first; marketing tools will follow when adoption becomes table stakes.
For a continually updated public registry of llms.txt examples, see our llms.txt examples directory (placeholder — we'll publish a community registry at github.com/seoport/llms-txt-examples in 2026 Q3). In the meantime, the ten above plus a quick curl against any dev-tooling company's domain will show you 80% of the patterns you need to ship your own.
Common llms.txt Mistakes
Six mistakes show up in roughly 70% of the broken llms.txt files we audit. Each one is a 5-minute fix, and each one alone can be the difference between a file AI systems use and a file they silently skip.
Mistake 1: Wrong file location. The file must be at exactly /llms.txt at your domain root — not /docs/llms.txt, not /.well-known/llms.txt, not /llms.html. AI consumers fetch the canonical path; anything else is invisible. If your CMS or static-site generator routes the file to a non-root path by default, override it explicitly.
Mistake 2: Wrong content-type served. The HTTP response must include Content-Type: text/plain. Many servers default to text/html for any file with a .txt extension if the MIME type isn't configured explicitly. Worse, some CMSes intercept the route and serve an HTML 404 page with a 200 status. Always verify with curl -I https://yourdomain.com/llms.txt and confirm both the status code and the content-type header.
Mistake 3: Empty or missing description (blockquote after H1). A surprising number of files skip the one-line blockquote summary right after the H1. Without it, AI systems have no high-level entity context — they're forced to infer your site's purpose from the link list, which is noisy. Always include the blockquote, always make it a complete sentence, always make it informative not promotional.
Mistake 4: Linking to JS-rendered pages AI can't parse. llms.txt points to URLs the AI is supposed to read. If those URLs serve a JS-only single-page-app shell (Vue, React without SSR, hydration-only Next.js), the AI fetches the URL, gets an empty <div>, and concludes there's nothing there. Either fix SSR on the linked pages, or link only to pages that render content in raw HTML.
Mistake 5: Including paywalled or auth-gated URLs. A link to a paywalled article or a logged-in dashboard wastes the AI's crawl budget and signals neglect. AI systems remember that the linked URL was unreachable and may discount your llms.txt as a whole. Curate hard — only list URLs an anonymous request can fully read.
Mistake 6: Forgetting to update after content changes. llms.txt is editorial, which means it goes stale. A file that lists a 2023 pricing page that 404s today, or a deprecated product page that redirects elsewhere, signals the file isn't maintained. Calendar a quarterly review aligned with your content refresh cadence — the same review that updates dateModified and refreshes hub pages should update llms.txt too.
Validating Your llms.txt
Validation has three layers — manual, online, and automated — and they cover slightly different surfaces. Run all three before you call your llms.txt shipped.
Manual check. The 30-second smoke test: curl -I https://yourdomain.com/llms.txt and confirm you see a 200 status and Content-Type: text/plain in the headers. Then curl https://yourdomain.com/llms.txt and read the full output. Your eyes should immediately catch missing H1s, broken Markdown, or accidental HTML wrapping. About 80% of broken files reveal themselves at this stage.
Online validators. The reference validator at llmstxt.org/validator (placeholder — the official validator URL may shift; check the spec repo for current canonical link) checks structural compliance: H1 presence, blockquote, valid H2 sections, Markdown link well-formedness, and link health (HEAD requests against each URL). It surfaces issues a curl read won't catch — like a typo in a URL that returns a 404 or a description string with embedded newlines.
The other tool worth running is sitetest.ai — our own audit bundles llms.txt validation into its 168-check suite, plus the broader AI citability assessment that tells you whether the URLs you list are actually citable in the first place (good schema, fast load, citable passages, etc.). A valid llms.txt linking to slow JS-rendered pages is a wasted opportunity; sitetest.ai catches both layers.
Common errors validators catch. Empty file (file exists but is zero bytes — happens with bad CMS uploads). Wrong encoding (UTF-16 or Windows-1252 instead of UTF-8 — text editors on Windows still get this wrong). Missing blockquote (skipped the one-line summary). Broken links (URL listed in llms.txt returns 404 or 5xx). Wrong content-type (server serving as text/html). HTML wrapping (CMS auto-wrapped the file in an HTML template). Each of these is a 1-minute fix once flagged — but each one silently neutralizes your file if you ship without checking.
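A DIY structural check along the lines of what those validators run — a heuristic sketch that mirrors the errors listed above (minus the network-dependent link-health checks), not a replacement for the official validator:

```python
import re

def validate_llms_txt(text: str) -> list[str]:
    """Minimal offline structural checks for an llms.txt document:
    one H1, a blockquote summary, at least one H2 section, and
    well-formed '- [text](url): description' bullet links."""
    if not text.strip():
        return ["empty file"]
    errors = []
    lines = text.splitlines()
    h1_count = sum(1 for l in lines if l.startswith("# "))
    if h1_count != 1:
        errors.append(f"expected exactly one H1, found {h1_count}")
    if not any(l.startswith("> ") for l in lines):
        errors.append("missing blockquote summary")
    if not any(l.startswith("## ") for l in lines):
        errors.append("no H2 sections")
    link = re.compile(r"^- \[[^\]]+\]\(https?://[^)\s]+\): \S")
    for l in lines:
        if l.startswith("- [") and not link.match(l):
            errors.append(f"malformed link line: {l[:50]}")
    return errors

good = "# Site\n\n> One-line summary.\n\n## Docs\n- [Home](https://example.com/): The homepage.\n"
print(validate_llms_txt(good))  # prints: []
```

Pair it with a HEAD request per linked URL (and a curl -I on the file itself) and you've covered every error class in the list above.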
Will llms.txt Become Standard?
The honest answer in May 2026: it's leaning toward yes but isn't there yet. The signals on both sides are real.
Adoption signals favoring standardization. Anthropic, Cloudflare, Vercel, Linear, Astro, NuxtLabs, Cursor, SvelteKit, and Perplexity all publish and respect llms.txt. The dev-tooling and AI-infrastructure clusters have effectively moved first — these are the same companies that drove early adoption of robots.txt and structured data in their respective eras. Cloudflare bundling llms.txt awareness into its AI Audit beta in early 2026 was a meaningful platform-level move; Cloudflare's footprint means any file format they support gets infrastructure-level distribution.
Standardization status. None formally — there's no W3C, IETF, or WHATWG draft as of May 2026. The spec lives as a GitHub README maintained by Jeremy Howard and contributors at llmstxt.org. That's not unusual: robots.txt itself was a de-facto standard for 25 years before becoming RFC 9309 in 2022. Useful conventions usually predate formal specs. The lack of a W3C track today is not evidence the standard will fail.
AI engine support is uneven. Perplexity respects llms.txt in its browse and research modes — it's the cleanest endorsement among the major AI search engines. Anthropic's Claude tooling parses it and uses it for its own product surfaces. ChatGPT's behavior is inconsistent: GPTBot probes /llms.txt occasionally in our crawl-log analysis, but OpenAI hasn't committed to it as a formal signal. Google ignores it in Search and AI Overviews — Google has its own structured data ecosystem (JSON-LD, the Knowledge Graph, sameAs) and shows no public interest in adopting another file format. Bing Copilot is in the middle — Microsoft hasn't ruled it out but hasn't endorsed it either.
12–24 month prediction. Two scenarios. The optimistic path: ChatGPT or Gemini publicly commits to respecting llms.txt within 12–18 months (likely under competitive pressure from Perplexity), at which point it becomes a de-facto standard for AI search the same way robots.txt is for classical search. The pessimistic path: the major engines never commit, llms.txt remains a developer convention adopted by Perplexity and the long tail of open-source LLM projects but never by the giants, and it fades into the background like /humans.txt did. Even in the pessimistic case, the cost of shipping today (5 minutes) is so low that the expected value of the bet is positive — early adopters lose almost nothing and gain real optionality.
Beyond llms.txt: Other AI Citability Signals
llms.txt is one signal among many. Even with a perfect file, AI engines still rank citations on the broader citability factors. Three families of signals matter most.
Schema markup. FAQPage, HowTo, Article (with author and publisher), Organization (with sameAs), and BreadcrumbList JSON-LD are the highest-leverage markup types for AI citation. SpeakableSpecification (cssSelector pointing at #tldr and #definition blocks) tells voice and audio AI which blocks are designed to be read aloud. AI engines parse JSON-LD as a high-trust signal because it's machine-readable and unambiguous — sites with proper schema get cited 2–3x more often than sites without.
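As a concrete illustration of the Organization-with-sameAs pattern, a minimal JSON-LD block looks like this — ExampleCo and its URLs are placeholders, and a real deployment would add logo, address, and other properties:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "ExampleCo",
  "url": "https://example.com",
  "sameAs": [
    "https://github.com/exampleco",
    "https://en.wikipedia.org/wiki/ExampleCo"
  ]
}
```

Embed it in a script tag with type="application/ld+json" in the page head; the sameAs array is what lets AI engines link your domain to the entity they already know from high-trust sources.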
EEAT signals. Experience, Expertise, Authoritativeness, and Trustworthiness — the four-letter framework Google formalized in late 2022 — translate directly to AI ranking. AI engines preferentially cite sources with named authors, visible credentials, inline citations to primary sources, original data, and brand recognition on AI-trusted domains (Wikipedia, Reddit, GitHub, Hacker News, major trade publications). Anonymous content with no author bio and no inline citations gets filtered out of citation candidate pools.
Structured headings and factual density. A clear H1 → H2 → H3 hierarchy lets retrieval pipelines chunk your page accurately. Pages with one giant H1 and walls of text without subheadings get chunked poorly and cited rarely. Inside each chunk, factual density matters — 4–6 named entities (people, dates, products, numbers, places) per 100 words score higher than vague prose. LLMs use named-entity counts as a quick proxy for "this passage is informative."
For the complete GEO playbook with all 14 tactics — robots.txt allowlists, llms.txt, schema, page speed, citable passages, brand authority — see our GEO guide. For the 18 ranking factors AI search engines weight when assembling answers, see AI Search Engine Optimization. For the older ground-floor framing — what counts as an AI SEO audit and how it differs from classical audits — see What Is an AI SEO Audit. llms.txt is the gateway file; those guides cover the rest of the surface.
Frequently Asked Questions
What is llms.txt?
Where do I put llms.txt on my website?
Does Google use llms.txt?
Does ChatGPT respect llms.txt?
Is llms.txt the same as robots.txt?
How do I create llms.txt?
What is llms-full.txt?
Should small sites have llms.txt?
Can I block AI crawlers with llms.txt?
Does llms.txt help SEO?
What's the difference between llms.txt and sitemap.xml?
How often should I update llms.txt?
Are there any llms.txt validators?
What's the future of llms.txt?
Conclusion + CTA
llms.txt is the cheapest experiment in AI search visibility you'll run this year. Five minutes of editing, a curated list of 10–30 URLs, a Content-Type: text/plain header, and you're shipped. The downside is zero — the file doesn't hurt SEO, doesn't slow your site, doesn't break anything. The upside is real today on Perplexity and Anthropic platforms, and increasingly likely on ChatGPT and Gemini over the next 12–18 months as adoption pressure builds.
The deeper point: llms.txt is one of three or four AI-search files that didn't exist in 2023 and will be table stakes by 2027. Sites that ship them early — alongside the schema, page-speed, and citable-passage work covered in our GEO guide — compound their AI visibility advantage one quarter at a time. Sites that wait for the standard to formalize will be six to twelve months behind when their competitors are already cited consistently across the major AI engines. Treat llms.txt as a free option on the AI-search future. Buy the option, hold it, and revisit the rest of your AI-visibility stack.
To audit your current llms.txt — or generate one from your site if you don't have it yet — run a free scan on sitetest.ai. The audit checks llms.txt presence, format, link health, and content-type, plus the broader 168 AI citability factors that determine whether the URLs you list will actually get cited. Sixty seconds, no signup, dev-friendly output.
Methodology
This guide draws on the original llms.txt proposal published by Jeremy Howard at Answer.AI in September 2024, the spec maintained at llmstxt.org, public Common Crawl scans of /llms.txt files across the open web, and internal audit data from sitetest.ai across the 168-check suite run on thousands of sites monthly. Adoption estimates are approximate — there's no central registry of llms.txt-publishing sites, so the 1,200+ figure is derived from Common Crawl plus community-maintained lists and should be treated as a directional indicator rather than a precise count. AI engine respect levels (Perplexity yes, Anthropic yes, ChatGPT inconsistent, Google no) reflect public statements and our own crawl-log analysis as of May 2026 and may shift as the standard matures. We refresh this guide quarterly — the next scheduled update is August 2026, and the dateModified reflects the last revision.
Related reading
AI Search Engine Optimization: Complete Guide to Ranking in 2026
Full guide to AI search engine optimization. Rank in ChatGPT, Perplexity, Gemini, AI Overviews. 18 ranking factors + free audit checklist.
25 min read · GEO

AI Visibility: How to Track If ChatGPT & Perplexity Mention Your Brand
Learn to measure & improve your AI visibility — track brand mentions in ChatGPT, Perplexity, AI Overviews. 8 tools compared + free check.
20 min read · GEO

What Is Generative Engine Optimization (GEO)? The 2026 Definitive Guide
Master Generative Engine Optimization (GEO) — the practice of ranking in ChatGPT, Perplexity & AI Overviews. 14 tactics + free audit.
22 min read