Serving content to AI agents: HTML vs Markdown, and where OKF actually fits

Last verified: June 15, 2026
Opinion

#Introduction

In June 2026 the same question keeps coming back across engineering timelines and SEO channels: how should you serve your content to AI agents? Plain Markdown, because that is what models seem to like? A separate machine endpoint? A new knowledge format? The discussion is loud, it is opinionated, and most of it is talking past itself.

Here is the practitioner position up front, because we have skin in this game. We already serve clean, semantic, server-rendered HTML plus Schema.org from an Astro front end on Cloudflare. This debate validates that choice. It does not threaten it. Almost every “you must switch to Markdown for agents” argument collapses the moment you separate three layers that are constantly being mashed into one.

This is not sideline commentary. We run on the exact infrastructure that the debate is about, and we already operate the agent layer that the debate keeps pointing at as the real future. So this is a report from inside, not a recap from the cheap seats.

#Key takeaways at a glance

  • The argument conflates three layers that are not the same problem: Markdown as agent output, Markdown as a way to serve pages, and OKF as a knowledge layer.
  • Markdown as agent output is a machine-to-human rendering choice, and one of the people who pushed it hardest just abandoned it for HTML.
  • Serving Markdown to bots at the same URL where humans get HTML is, at best, redundant and, at worst, cloaking. Google and Bing both said as much, bluntly.
  • OKF is a curated knowledge format for agent pipelines, not a website-serving format. It is a different layer than SEO.
  • The only content-serving signal that both Google and Bing document as actually consumed is clean semantic HTML plus Schema.
  • Observe, do not rush to implement, the new serving formats. The real forward bet is the agent-action layer, and that is the part we have already built.

#Three layers everyone is conflating

Most of the heat in this debate comes from treating three separate questions as one. Pull them apart and the contradictions dissolve.

#Layer 1: Markdown as agent output

This is about what a model writes back to a human, not how a website is served. When an agent generates a report, a chat answer, or a document, in what format should it emit?

For a long time the default answer was Markdown. It is clean, it is token-cheap, it renders nicely in a chat bubble. Then Thariq Shihipar, who works on Claude Code at Anthropic, publicly walked it back. After building side-by-side examples comparing HTML and Markdown effectiveness, his conclusion was that HTML wins for agent output, because HTML carries the structure, semantics and interactivity that a richer human-facing surface needs. Markdown flattens too much.

Read that carefully, because it is routinely quoted backwards. The person closest to agent output is moving toward HTML, not away from it. And critically, this layer says nothing about how you should serve your marketing site to a crawler. It is machine-to-human communication. Anyone citing Thariq as a reason to convert your site to Markdown has inverted his own argument.

#Layer 2: Markdown-for-Agents page serving

This is the layer that actually touches us, because we run on Cloudflare. Cloudflare’s Markdown-for-Agents converts your HTML to Markdown on the fly when a client sends Accept: text/markdown, and it advertises a token count through x-markdown-tokens. Cloudflare reports roughly an 80 percent token reduction versus raw HTML. The feature is in beta on paid plans, and clients like Claude Code and OpenCode already send the header. It is governed by Content-Signal, whose default on Cloudflare is opted in rather than opted out, so this can be on for your domain without a deliberate decision. That default-on detail is the part every Cloudflare customer should actually check.
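To check what your own domain returns to a Markdown-preferring client, a minimal sketch along these lines may help. It only builds the request and interprets response headers, so the network call is left to you. The helper names are ours, and the sample header values in the usage note are invented for illustration, not measured.

```python
from urllib.request import Request


def build_markdown_request(url: str) -> Request:
    """Build a GET request that asks for the Markdown representation."""
    return Request(url, headers={"Accept": "text/markdown"})


def parse_content_signal(value: str) -> dict[str, str]:
    """Split a value such as 'ai-train=yes, search=yes' into a dict."""
    signals: dict[str, str] = {}
    for part in value.split(","):
        key, sep, val = part.strip().partition("=")
        if sep:  # ignore empty or malformed fragments
            signals[key.strip().lower()] = val.strip().lower()
    return signals


def summarize_response(headers: dict[str, str]) -> dict:
    """Report what the server returned to a Markdown-preferring client."""
    lowered = {k.lower(): v for k, v in headers.items()}
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    tokens = lowered.get("x-markdown-tokens", "")
    return {
        "served_markdown": media_type == "text/markdown",
        "markdown_tokens": int(tokens) if tokens.isdigit() else None,
        "content_signal": parse_content_signal(lowered.get("content-signal", "")),
    }
```

Pass the request to urllib.request.urlopen, then feed dict(response.headers) to summarize_response. If served_markdown comes back true on a domain where nobody decided to enable the feature, that is the default worth reviewing.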

The token saving is real. The visibility claim is not. There is no documented evidence that serving a Markdown representation changes whether an AI system cites you. And the moment you serve bots a different representation of the same URL than humans receive, you are standing next to the cloaking line.

Google’s John Mueller put it with no diplomacy at all:

“Converting pages to markdown is such a stupid idea. Did you know LLMs can read images? WHY NOT TURN YOUR WHOLE SITE INTO AN IMAGE?”

That is sarcasm with a point inside it. If the model can already read your HTML, a parallel Markdown channel is not new signal, it is a second thing to maintain and keep in sync. Bing’s Fabrice Canel was drier and arguably more damning for anyone hoping to save crawl budget:

“Really want to double crawl load? We’ll crawl anyway to check similarity.”

In other words, the search engine fetches the HTML regardless, to verify your Markdown matches what humans see. You do not reduce load, you add a surface that has to agree with the canonical one or you get flagged. Two of the largest crawl operators on the planet told you, in public, that this does not do what its proponents hope.
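If you do keep a Markdown representation around, the drift risk is something you can measure yourself. The sketch below compares the visible words of the HTML with the words of the Markdown and returns a similarity ratio. It is our own rough check built on the standard library, not a description of how Bing or anyone else verifies similarity, and any threshold you pick is a judgment call.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_words(html: str) -> list[str]:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"\w+", " ".join(parser.chunks).lower())


def markdown_to_words(markdown: str) -> list[str]:
    # Drop link targets so "[text](/url)" contributes only "text".
    without_urls = re.sub(r"\]\([^)]*\)", "]", markdown)
    return re.findall(r"\w+", without_urls.lower())


def parity_ratio(html: str, markdown: str) -> float:
    """Return 0.0 to 1.0; higher means the two representations say the same thing."""
    matcher = SequenceMatcher(
        None, html_to_words(html), markdown_to_words(markdown), autojunk=False
    )
    return matcher.ratio()
```

A ratio near 1.0 means both representations carry the same words in the same order. A low ratio is the drift that puts you next to the cloaking line.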

#Layer 3: OKF as a knowledge layer

On 12 June 2026 Google Cloud published the Open Knowledge Format, OKF, with a public reference repository. It is deliberately humble: Markdown files with YAML frontmatter, one concept per file, only the type field required, producer and consumer kept independent. The framing is “a format, not a platform”, and it owes an obvious debt to Andrej Karpathy’s LLM-wiki gist, the idea of a human-curated knowledge base written for machines.
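To make the shape concrete, here is a minimal sketch of a concept file and a check for the one required field. Only the type field comes from the description above. The title field, the value concept, and the function names are hypothetical, and the parser handles flat key: value pairs only, so a real pipeline would want a proper YAML library and the reference repository's own rules.

```python
SAMPLE = """---
type: concept
title: Markdown-for-Agents
---
Converts HTML to Markdown when a client sends Accept: text/markdown.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a Markdown file into (frontmatter fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    end = None
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            end = index
            break
    if end is None:
        raise ValueError("missing closing '---' frontmatter delimiter")
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = value.strip().strip("'\"")
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def validate_concept(text: str) -> dict[str, str]:
    """Return the frontmatter if the required 'type' field is present."""
    meta, _body = split_frontmatter(text)
    if not meta.get("type"):
        raise ValueError("frontmatter must define a non-empty 'type' field")
    return meta
```

Note what this file is and is not: it is something a pipeline reads from a repository, not something a crawler fetches at a page URL.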

Here is what matters and what the recaps miss: OKF is not a way to serve your website. It is a way to package curated knowledge so that an agent pipeline can consume it. It lives upstream of retrieval, in the context and grounding layer, not at the URL where a crawler meets your page. As one commenter on the announcement put it, OKF makes sense, but on a different layer than SEO. Conflating “Google released a Markdown knowledge format” with “Google wants you to serve your site as Markdown” is the single most common error in the current cycle, and they are not even close to the same thing.

There is a sane version of Markdown in a publishing stack, and it is worth naming so nobody hears this as anti-Markdown. Markdown in your source, rendered to HTML at build time, is exactly how this article is written. That is the right place for it. Shipping raw Markdown to a browser, or to a crawler that expects HTML, is the part that makes no sense, because the consumer on the other end is built around HTML.

#Why this debate validates our stack

Strip the three layers apart and the conclusion is almost boring, which is the point. The thing that already works keeps working.

We serve server-rendered, semantic HTML. Headings are headings, lists are lists, article, nav and time mean what they say, and the structured data is real Schema.org rather than decoration. That is the representation Google indexes, the representation Bing crawls, and the representation an LLM ingests when it fetches the page. It is also, not by accident, the representation that renders fast for humans. There is no fork to keep in sync, no second channel that can drift, no cloaking risk.
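"Semantic" and "real Schema.org" are checkable claims. A rough audit, sketched here with the standard library only, counts a few semantic elements and collects the @type of every JSON-LD block that parses. The tag list is our own selection, the sketch ignores @graph nesting, and it says nothing about whether the markup satisfies any particular search feature's requirements.

```python
import json
from html.parser import HTMLParser

SEMANTIC_TAGS = ("main", "article", "nav", "time", "h1")


class PageAudit(HTMLParser):
    """Count semantic elements and collect Schema.org types from JSON-LD."""

    def __init__(self) -> None:
        super().__init__()
        self.semantic_counts = {tag: 0 for tag in SEMANTIC_TAGS}
        self.schema_types: list[str] = []
        self.invalid_jsonld_blocks = 0
        self._in_jsonld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.semantic_counts:
            self.semantic_counts[tag] += 1
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)  # script content may arrive in chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self._record("".join(self._buffer))

    def _record(self, raw: str) -> None:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            self.invalid_jsonld_blocks += 1
            return
        for node in doc if isinstance(doc, list) else [doc]:
            if isinstance(node, dict) and "@type" in node:
                types = node["@type"]
                self.schema_types.extend(types if isinstance(types, list) else [types])


def audit(html: str) -> PageAudit:
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return parser
```

Run it against the server-rendered response, not the DOM after JavaScript has executed, because the server-rendered response is the representation this section is about.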

Everything the debate is anxious about, we get for free by not chasing it. When Mueller says Markdown conversion is pointless because the model reads your HTML, that is a description of our setup working as intended. When Canel says Bing crawls the HTML anyway, that is fine, because the HTML is the canonical artefact and there is nothing else to reconcile. We did not have to react to either statement. The architecture already answered them.

#The one documented signal

If you want a rule that survives the next format announcement, here it is. Clean, server-rendered, semantic HTML with valid Schema.org is the only content-serving approach that both Google and Bing document as something they actually consume. Everything else in this space is either a proposal with no measured consumption, or an optimisation of cost rather than visibility.

Bing, through Copilot, reads structured data. Google reads structured data for its own surfaces. The large language models ingest the rendered HTML. None of the new serving formats, llms.txt, Markdown-for-Agents, ai.txt, has a documented effect on whether you get cited. So the honest engineering posture is: keep the HTML clean, keep the Schema valid, and treat new serving formats as things to observe rather than implement. The same discipline applies to a headless WooCommerce build on Astro: the commerce data is real semantic markup, not a bot-only side channel.

#The honest take on llms.txt

We publish /llms.txt and /llms-full.txt, so this is self-criticism, not a cheap shot at someone else. The skeptics have a strong case. Mueller has said the format is essentially ignored, and an independent server-log study found zero AI-crawler requests for llms.txt across hundreds of domains over several months. As a standalone file dropped on a site in the hope that something reads it, it does very little.

Our own AI visibility playbook says exactly that, in writing: no major LLM provider formally commits to reading these files, but requests for them appear in our logs often enough to justify keeping them. We hold both thoughts at once. A generic, orphaned llms.txt is close to dead weight. The same file as one node of an integrated agent-discovery setup, wired to a real action layer, is a different object with a different cost-benefit profile. The mistake is citing the “nobody reads llms.txt” studies as if they settled the question for every implementation. They settled it for the stray-file case.
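Whether anything requests these files on your domain is a question your own access log can answer. Here is a minimal sketch, assuming logs in the common "combined" format and a short list of crawler user-agent tokens. Both the list and the regular expression are assumptions to adapt, and user-agent strings can be spoofed, so treat the counts as indicative.

```python
import re
from collections import Counter

# Assumed subset of AI-crawler user-agent tokens; extend for your own logs.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")
LLMS_PATHS = {"/llms.txt", "/llms-full.txt"}

# Matches the request, status, size, referrer and user-agent fields
# of a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_hits(log_lines) -> Counter:
    """Count requests for the llms files, keyed by (crawler token, path)."""
    hits: Counter = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        path = match["path"].split("?", 1)[0]  # ignore query strings
        if path not in LLMS_PATHS:
            continue
        agent = match["agent"].lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in agent:
                hits[(token, path)] += 1
                break
    return hits
```

Feed it an open log file, for example count_llms_hits(open("access.log")), and compare the result with total crawler traffic before drawing a conclusion in either direction.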

#The real forward bet: the agent-action layer

Here is where we break with the “just serve Markdown” crowd entirely, and where the genuinely interesting future is. The next step is not a better document for an agent to read. It is letting the agent act without reading a document at all.

That is the agent-action layer: WebMCP, Agent2Agent (A2A), and the Model Context Protocol. Instead of scraping a services page and guessing, an agent calls a function, request_quote, browse_services, search_site, and gets a typed answer. WebMCP, a Google and Microsoft collaboration, has been in Chrome developer preview since February 2026, and it points squarely at this model: the page exposes capabilities, the agent invokes them.
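The difference between scraping and calling is easiest to see in code. The sketch below is a plain-Python illustration of the idea, not the WebMCP, A2A or MCP API: request_quote is the function name used above, while its fields, the registry, and the pricing rule are invented for the example.

```python
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class QuoteRequest:
    """Hypothetical typed input for the request_quote tool."""
    service: str
    pages: int

    def __post_init__(self) -> None:
        if not isinstance(self.service, str) or not self.service:
            raise ValueError("service must be a non-empty string")
        if not isinstance(self.pages, int) or self.pages < 1:
            raise ValueError("pages must be an integer >= 1")


@dataclass(frozen=True)
class Quote:
    """Hypothetical typed output."""
    service: str
    pages: int
    estimate_units: int


def request_quote(req: QuoteRequest) -> Quote:
    # Invented pricing rule, present only so the example returns something.
    return Quote(req.service, req.pages, estimate_units=500 + 100 * req.pages)


# Registry mapping a tool name to (input type, handler).
TOOLS: dict[str, tuple[type, Callable]] = {
    "request_quote": (QuoteRequest, request_quote),
}


def call_tool(name: str, arguments: dict) -> dict:
    """Validate the arguments against the tool's input type, then invoke it."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    input_type, handler = TOOLS[name]
    return asdict(handler(input_type(**arguments)))
```

The agent never parses a services page. It sends a name and structured arguments, and malformed input fails loudly at the boundary instead of producing a guess.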

We have already built this. Under public/.well-known/ we publish an A2A AgentCard, an MCP server-card following SEP-1649, an ACP descriptor, and markdown content-negotiation through a .md URL suffix and Accept: text/markdown handling in middleware, with the whole thing advertised via Link headers and robots rules. The fetch_markdown skill on our AgentCard points at /llms-full.txt, which is precisely why the llms files are not orphaned here, they are wired into the action layer rather than sitting alone.
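The content-negotiation part can be sketched as a small routing decision. This is a simplified illustration in Python rather than our actual middleware, and the rule it applies is an assumption: serve Markdown only when the path ends in .md or when the client lists text/markdown explicitly with a quality value at least as high as text/html, and default to HTML otherwise.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Parse an Accept header into {media type: q-value}."""
    prefs: dict[str, float] = {}
    for item in header.split(","):
        media, *params = [part.strip() for part in item.split(";")]
        if not media:
            continue
        quality = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media.lower()] = quality
    return prefs


def choose_representation(path: str, accept: str = "") -> tuple[str, str]:
    """Return (representation, canonical path) for a request."""
    if path.endswith(".md"):
        # Explicit suffix: strip it to find the canonical page.
        return "markdown", path[: -len(".md")] or "/"
    prefs = parse_accept(accept)
    markdown_q = prefs.get("text/markdown", 0.0)
    html_q = prefs.get("text/html", 0.0)
    if markdown_q > 0 and markdown_q >= html_q:
        return "markdown", path
    return "html", path  # browsers and crawlers land here
```

Wildcards such as */* deliberately do not trigger Markdown, so a browser or a search crawler always receives the canonical HTML. When the response does vary by Accept, it should also carry Vary: Accept so caches keep the two representations apart.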

Notice the asymmetry. Markdown-for-Agents and content-negotiation, the serving formats, we treat as observe-do-not-implement, present because the infrastructure offers them, not because we have measured a benefit. The action layer we treat as a deliberate forward investment, because that is the direction Thariq’s HTML argument, WebMCP, and the whole agent-tooling wave are all pointing. Reading is the present. Acting is the bet.

#What we are actually doing

To make the posture concrete, here is the split.

  • Serving: clean semantic HTML plus Schema.org, server-rendered, fast. This is the load-bearing decision and it is not changing.
  • Markdown-for-Agents and Content-Signal: present on Cloudflare, left enabled where it is harmless, but checked, because the default is opted in and it can be on without a decision. No visibility claim attached.
  • llms.txt and llms-full.txt: published, but as wired nodes of the agent-discovery system, not as a standalone bet, and described honestly in our own playbook.
  • OKF: filed under knowledge layer. Relevant if and when we feed curated knowledge into an agent pipeline. Not a website-serving change.
  • Agent-action layer, A2A, MCP, WebMCP-adjacent: deliberate investment, already shipped under /.well-known/, and the part of this whole debate we are most confident about.

#Conclusion

The 2026 “HTML vs Markdown for agents” debate looks like a fork in the road. It is not. Once you separate agent output from page serving from the knowledge layer, the three arguments stop contradicting each other and they all point the same way. Serve clean semantic HTML and valid Schema, because it is the one signal both major crawlers document consuming. Observe the new serving formats instead of chasing them, because none has a measured effect on citations. And put your forward energy into the agent-action layer, because that is where reading turns into acting.

We did not arrive here by predicting the debate. We arrived here by building on the boring, documented signal and treating everything newer as something to measure first. That is the whole method. The loud part of the internet is arguing about the format. The quiet, documented answer has not changed.

If you want the broader visibility picture, our AI and LLM visibility playbook collects the rest of the levers in priority order.

Last updated: 15 June 2026.

Should I serve Markdown to AI crawlers instead of HTML?
There is no documented citation or ranking benefit to serving Markdown instead of HTML. Google and Bing both say they read the HTML, and Bing says it will crawl the page anyway to check that the Markdown matches. Serving bots a different representation than humans at the same URL also edges toward cloaking. Keep serving clean semantic HTML.

What is the Open Knowledge Format (OKF)?
OKF is a Google Cloud specification published on 12 June 2026 for sharing curated knowledge with agents. It uses Markdown files with YAML frontmatter, one concept per file, and only the type field is required. It is a knowledge and context layer for agent pipelines, not a way to serve your website pages, so it does not replace HTML.

Does Markdown-for-Agents on Cloudflare help my AI visibility?
It reduces token cost when a bot requests the Markdown representation, around 80 percent in Cloudflare's figures, but there is no evidence it changes whether or how often an AI system cites you. Treat it as an observe-do-not-implement feature, and check the Content-Signal default, because on Cloudflare it is opted in unless you change it.

Is llms.txt worth publishing?
No major LLM provider formally commits to reading llms.txt, and an independent server-log study found zero AI-crawler requests for it across hundreds of domains over several months. A stray llms.txt does little on its own. It is more defensible as one node of an integrated agent-discovery setup than as a standalone file.

What actually improves how AI systems read my site?
Clean, server-rendered, semantic HTML with valid Schema.org markup, fast responses, and an explicit robots policy for AI crawlers. Beyond serving, the genuine forward bet is the agent-action layer: exposing functions through MCP, A2A or WebMCP so an agent can act instead of scraping.
