What is llms.txt and why it matters for your business

llms.txt helps AI systems understand your website. Learn what it is, how it works, whether it impacts AI visibility, and how to create one for your business.

Michal Forystek
Co-founder, Growth & Partnerships


When someone asks ChatGPT, Claude, or Gemini a question about your industry, does your business appear in the answer? For most companies, the honest answer is: they have no idea. And the reason is that most websites are built for humans and search engines, not for the AI systems that are increasingly shaping how people discover businesses and make decisions.

llms.txt is an emerging convention designed to bridge that gap. This article explains what llms.txt is, how it works, whether it actually impacts AI visibility today, and why it is worth implementing even before the standard is universally adopted.

What is llms.txt?

llms.txt is a plain text file, written in Markdown, placed at the root of a website (e.g., yourdomain.com/llms.txt). Its purpose is to help large language models understand your website — what your business does, what content matters most, and where to find it.

Think of it as a companion to robots.txt, but designed for a different audience. robots.txt tells search engine crawlers what they can and cannot access. llms.txt tells AI systems what your site is about and which pages are most important for understanding your business.

The specification was proposed by Jeremy Howard in September 2024, and the format is deliberately simple. A file following the spec contains a heading with the name of the business, a short summary, and structured sections linking to key pages with brief descriptions of what each page covers.

Why do AI systems need a separate file?

Traditional websites are designed for human readers — navigation menus, sidebars, JavaScript-rendered content, cookie banners, marketing copy. Search engines have learned to parse this complexity over decades of crawling. But AI systems process information differently.

When an LLM retrieves information from a website — whether during training, through a RAG pipeline, or via real-time web search — it needs to convert complex HTML into clean, usable text. This conversion is imprecise. Important content gets mixed with navigation elements, ads, and boilerplate. The AI system cannot easily distinguish between your most authoritative product page and a generic footer link.

llms.txt solves this by providing a clean, structured entry point. Instead of forcing the AI to guess which pages matter, you tell it directly. The file acts as a curated directory — here is what we do, here are the pages that explain it best, and here is a description of each one.
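To make the "curated directory" idea concrete, here is a minimal sketch of how a retrieval pipeline might parse an llms.txt file into structured data. It assumes the link convention `- [Page title](url): description` described in the spec; the parser and the sample company are illustrative, not part of any vendor's actual pipeline.

```python
import re


def parse_llms_txt(text):
    """Parse an llms.txt-style Markdown file into its main parts:
    the H1 title, the blockquote summary, and H2 sections with links."""
    title = None
    summary_lines = []
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("> "):
            summary_lines.append(line[2:].strip())
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            # Links are assumed to follow: - [Page title](url): description
            m = re.match(r"- \[(.+?)\]\((.+?)\)(?::\s*(.*))?", line)
            if m:
                sections[current].append({
                    "title": m.group(1),
                    "url": m.group(2),
                    "description": m.group(3) or "",
                })
    return {"title": title,
            "summary": " ".join(summary_lines),
            "sections": sections}


sample = """# Example Fintech Co
> Example Fintech Co provides payment APIs for online businesses.

## Services
- [Payments API](https://example.com/payments): Accept card payments online.
"""
parsed = parse_llms_txt(sample)
```

Because the file is already clean Markdown, the parser needs no HTML cleanup, boilerplate stripping, or guesswork about which links matter — which is precisely the point of the format.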

What does an llms.txt file look like?

The format follows a specific Markdown structure:

An H1 heading with the company or site name. A blockquote with a concise summary of the business. Sections with H2 headings organizing content by category. Links to key pages, each with a short description explaining what the page covers.

For example, a fintech company's llms.txt might include sections for its core services, blog content organized by topic, and key information about the company — each with links and one-line descriptions that tell an AI system exactly what it will find at each URL.
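A minimal file following that pattern might look like this — the company name, URLs, and descriptions below are illustrative placeholders, not a real implementation:

```markdown
# Example Fintech Co

> Example Fintech Co provides payment APIs that let online businesses
> accept cards, wallets, and bank transfers in a single integration.

## Services

- [Payments API](https://example.com/payments): Accept card and wallet payments online.
- [Payouts](https://example.com/payouts): Send funds to sellers and contractors.

## Blog

- [PSD2 explained](https://example.com/blog/psd2): What the EU payments directive means for merchants.

## Company

- [About us](https://example.com/about): Team, licenses, and regulatory status.
```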

The file should be concise and curated. The goal is not to list every page on your site — it is to highlight the pages that best represent your business and contain the most authoritative, up-to-date information.

Does llms.txt actually work?

The evidence is mixed, and honesty about that is more useful than advocacy in either direction.

What the large-scale data says

Research by SE Ranking analyzing 300,000 domains found that about 10% had implemented llms.txt, yet detected no measurable correlation between having the file and being cited more frequently by LLMs. A separate analysis by ALLMO of over 94,000 URLs cited in AI-generated responses found that fewer than 1% came from sites with llms.txt. At the aggregate level, the signal is not there yet.

What the individual case studies show

The picture looks different at the level of specific implementations. One agency submitted their llms.txt to Google Search Console and documented their file being cited in Google AI Mode answers within 24 hours. Cloudflare server logs from multiple sites show GPTBot, OAI-SearchBot, and ChatGPT-User actively fetching llms.txt files — even without sitemap references or internal links pointing to them. Mintlify reported 436 AI crawler visits to their site after implementing llms.txt, with the majority coming from ChatGPT.

What the major AI companies are doing

No major AI company has officially confirmed using llms.txt as a retrieval or ranking signal. But their actions tell a more interesting story. Google added llms.txt files across their own developer and documentation sites, and included the format in their Agent-to-Agent (A2A) protocol. Anthropic specifically requested llms.txt implementation for their documentation hosted on Mintlify. These are not the actions of companies that consider the format irrelevant.

What this means in practice

The standard is early and evolving. Large-scale correlation is not yet visible, but active crawling and individual citation cases suggest the infrastructure is being built. For businesses thinking about AI visibility, the question is not whether llms.txt delivers guaranteed results today — it is whether you want your AI discoverability infrastructure in place before or after the standard matures.

So why implement it?

The early-mover advantage

AI-driven search is growing rapidly. The businesses that have their AI visibility infrastructure in place before the standard matures will be ahead of those scrambling to catch up. Waiting for universal adoption before acting means competing for attention in a space where others have already established their presence.

The strategic clarity it demands

Creating an effective llms.txt file is not just a technical task. It requires making strategic decisions about which content represents your business most accurately, how to describe your offerings in a way that AI systems can parse and cite correctly, and how to structure the file so it works in concert with your schema markup, robots.txt configuration, and content strategy. Done well, it becomes the anchor of your AI discoverability infrastructure. Done poorly — with generic descriptions, outdated links, or misaligned content — it adds noise rather than clarity.

The signal of AI readiness

For businesses in financial services, technology, and other industries where AI adoption is accelerating, having an llms.txt file signals to potential clients and partners that you understand how AI systems work and that you are proactive about positioning your business in an AI-driven landscape.

How does llms.txt relate to broader AI visibility?

llms.txt is one piece of a larger picture. Whether your business appears in AI-generated answers depends on several factors working together — and getting them right requires understanding how AI systems discover, evaluate, and cite content.

Content structure for AI extraction

AI systems do not read web pages the way humans do. They extract information most reliably from content that follows specific structural patterns — question-based headings, front-loaded definitions, concise citation-ready statements, and clean semantic hierarchy. Content that works well for traditional SEO does not automatically work well for AI citation. The optimization principles overlap, but they are not identical, and the details matter.
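As a sketch of those patterns, a section written for AI extraction front-loads a standalone definition under a question-based heading. The heading and wording below are illustrative, not a template prescribed by any AI vendor:

```markdown
## What is an llms.txt file?

llms.txt is a Markdown file placed at a website's root that lists the
pages an AI system should read first, with a short description of each.
```

Because the definition is complete in its first sentence, a retrieval pipeline can quote it on its own without dragging in surrounding navigation or marketing copy.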

Schema markup and structured data

JSON-LD schema markup tells AI systems what a page is about in machine-readable format. But deciding which schema types to implement, how to structure them for maximum AI comprehension, and how to connect them across a multi-page site requires technical knowledge of both schema.org standards and how different AI systems parse structured data.
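As one common pattern, an Organization block embedded in a page's head gives AI systems the basics about a business in one machine-readable object. All values here are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Fintech Co",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "description": "Example Fintech Co provides payment APIs for online businesses.",
  "sameAs": [
    "https://www.linkedin.com/company/example-fintech-co"
  ]
}
```

This fragment would be wrapped in a `<script type="application/ld+json">` tag; the `sameAs` links help systems connect your site to your profiles elsewhere on the web.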

robots.txt and AI crawler permissions

Your robots.txt file determines whether AI crawlers can access your site at all. GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and Google-Extended (Gemini) are the major AI crawlers. Many sites block these by default without realizing it. Configuring access correctly — allowing the right crawlers while maintaining security — is a foundational step that is easy to get wrong.
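A robots.txt that explicitly admits the crawlers named above, while leaving your other rules intact, can be as simple as the following sketch:

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```

Note that Google-Extended is a control token honored by Google's existing crawlers rather than a separate fetching bot; listing it governs whether your content can be used for Gemini.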

Content authority and freshness

AI systems prioritize content that is comprehensive, well-sourced, and regularly updated. Building the kind of topical authority that earns AI citations requires a sustained content strategy — not a single page, but a library of interconnected content that demonstrates deep expertise on a subject over time.

Bringing it all together

llms.txt ties these elements together by giving AI systems a single, structured entry point to your site's most important content. But the file itself is only as valuable as the infrastructure behind it — the content quality, the schema markup, the crawler configuration, and the ongoing maintenance that keeps everything accurate and current. Getting this right across all layers is what determines whether your business appears in AI-generated answers or remains invisible.

Key takeaways

llms.txt is an emerging convention that helps AI systems understand your website's structure and most important content. It is not yet a formally adopted standard, and large-scale studies have so far found no measurable citation impact, even though AI crawlers are already fetching the files. But its low implementation cost, organizational value, and positioning benefit make it a practical addition for any business that wants to be prepared for an AI-driven discovery landscape.

The businesses that will benefit most from AI visibility are those that combine llms.txt with structured content, proper schema markup, AI-friendly robots.txt configuration, and a consistent publishing strategy. Each of these layers requires specific expertise — understanding how different AI systems crawl, parse, and cite content is not the same as traditional SEO, and the details of implementation determine whether the effort produces results.

llms.txt is the front door to your AI presence. But what's behind it — the content quality, the technical infrastructure, and the ongoing optimization — is what determines whether AI systems recognize your business as an authoritative source worth citing.

If you want to understand how visible your business is to AI systems and what steps would make the biggest difference, we can help you find out.

