Introduction
In June 2026 the same question keeps coming back across engineering timelines and SEO channels: how should you serve your content to AI agents? Plain Markdown, because that is what models seem to like? A separate machine endpoint? A new knowledge format? The discussion is loud, it is opinionated, and most of it is talking past itself.
Here is the practitioner position up front, because we have skin in this game. We already serve clean, semantic, server-rendered HTML plus Schema.org from an Astro front end on Cloudflare. This debate validates that choice. It does not threaten it. Almost every “you must switch to Markdown for agents” argument collapses the moment you separate three layers that are constantly being mashed into one.
This is not sideline commentary. We run on the exact infrastructure that the debate is about, and we already operate the agent layer that the debate keeps pointing at as the real future. So this is a report from inside, not a recap from the cheap seats.
Key takeaways at a glance
- The argument conflates three layers that are not the same problem: Markdown as agent output, Markdown as a way to serve pages, and OKF as a knowledge layer.
- Markdown as agent output is a machine-to-human rendering choice, and one of the people who pushed it hardest just abandoned it for HTML.
- Serving Markdown to bots at the same URL where humans get HTML is, at best, redundant and, at worst, cloaking. Google and Bing both said as much, bluntly.
- OKF is a curated knowledge format for agent pipelines, not a website-serving format. It is a different layer than SEO.
- The only content-serving signal that both Google and Bing document as actually consumed is clean semantic HTML plus Schema.
- Observe the new serving formats; do not rush to implement them. The real forward bet is the agent-action layer, and that is the part we have already built.
Three layers everyone is conflating
Most of the heat in this debate comes from treating three separate questions as one. Pull them apart and the contradictions dissolve.
Layer 1: Markdown as agent output
This is about what a model writes back to a human, not how a website is served. When an agent generates a report, a chat answer, or a document, in what format should it emit?
For a long time the default answer was Markdown. It is clean, it is token-cheap, it renders nicely in a chat bubble. Then Thariq Shihipar, who works on Claude Code at Anthropic, publicly walked it back. After building side-by-side examples comparing HTML and Markdown effectiveness, his conclusion was that HTML wins for agent output, because HTML carries the structure, semantics and interactivity that a richer human-facing surface needs. Markdown flattens too much.
Read that carefully, because it is routinely quoted backwards. The person closest to agent output is moving toward HTML, not away from it. And critically, this layer says nothing about how you should serve your marketing site to a crawler. It is machine-to-human communication. Anyone citing Thariq as a reason to convert your site to Markdown has inverted his argument.
Layer 2: Markdown-for-Agents page serving
This is the layer that actually touches us, because we run on Cloudflare. Cloudflare’s Markdown-for-Agents converts your HTML to Markdown on the fly when a client sends Accept: text/markdown, advertises a token count through the x-markdown-tokens response header, and reports roughly an 80 percent token reduction versus raw HTML. It is in beta on paid plans, and clients like Claude Code and OpenCode already send the header. It is governed by Content-Signal, which on Cloudflare is on by default (opt-out, not opt-in), so this can be on for your domain without a deliberate decision. That default-on detail is the part every Cloudflare customer should actually check.
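Checking whether the conversion is live on a domain takes one request. The sketch below is a minimal probe, not Cloudflare's own tooling: it builds a request with the Accept header named above and summarises the response headers. It assumes the converted response is labelled with a text/markdown content type and that x-markdown-tokens carries a plain integer; the function names are ours.

```python
from urllib.request import Request


def build_probe(url: str) -> Request:
    """Build a GET request that asks for the Markdown representation."""
    return Request(url, headers={"Accept": "text/markdown"})


def summarize(headers: dict[str, str]) -> dict[str, object]:
    """Report whether a response looks like a Markdown conversion."""
    # Header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "")
    tokens = lowered.get("x-markdown-tokens")
    return {
        # Assumption: the converted response uses a text/markdown content type.
        "served_markdown": content_type.split(";")[0].strip().lower() == "text/markdown",
        # Assumption: the token header holds a plain non-negative integer.
        "markdown_tokens": int(tokens) if tokens and tokens.isdigit() else None,
    }
```

To run it against a live site, open the probe with urllib.request.urlopen(build_probe(url)) and pass dict(response.headers.items()) to summarize. If served_markdown comes back true on a domain where nobody decided to enable it, that is the default at work.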
The token saving is real. The visibility claim is not. There is no documented evidence that serving a Markdown representation changes whether an AI system cites you. And the moment you serve bots a different representation of the same URL than humans receive, you are standing next to the cloaking line.
Google’s John Mueller put it with no diplomacy at all:
“Converting pages to markdown is such a stupid idea. Did you know LLMs can read images? WHY NOT TURN YOUR WHOLE SITE INTO AN IMAGE?”
That is sarcasm with a point inside it. If the model can already read your HTML, a parallel Markdown channel is not new signal, it is a second thing to maintain and keep in sync. Bing’s Fabrice Canel was drier and arguably more damning for anyone hoping to save crawl budget:
“Really want to double crawl load? We’ll crawl anyway to check similarity.”
In other words, the search engine fetches the HTML regardless, to verify your Markdown matches what humans see. You do not reduce load, you add a surface that has to agree with the canonical one or you get flagged. Two of the largest crawl operators on the planet told you, in public, that this does not do what its proponents hope.
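If you do serve a Markdown representation, you can at least measure drift yourself before a crawler does. The following is a rough self-check under our own assumptions, not a description of how Bing compares pages: it strips the HTML down to visible words, does the same for the Markdown, and returns a ratio between 0 and 1. Link URLs inside Markdown count as words here, so link-heavy pages will score slightly lower even when the content matches.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
import re


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def _words(text: str) -> list[str]:
    # Lowercased word tokens; Markdown punctuation such as # and * drops out.
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(html: str, markdown: str) -> float:
    """Return a 0..1 ratio of how closely the two word sequences match."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    html_words = _words(" ".join(parser.chunks))
    matcher = SequenceMatcher(None, html_words, _words(markdown), autojunk=False)
    return matcher.ratio()
```

Any threshold you alert on is a judgment call. The point is only that the second representation needs a test of its own, which is exactly the maintenance cost the quotes above describe.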
Layer 3: OKF as a knowledge layer
On 12 June 2026 Google Cloud published the Open Knowledge Format, OKF, with a public reference repository. It is deliberately humble: Markdown files with YAML frontmatter, one concept per file, only the type field required, producer and consumer kept independent. The framing is “a format, not a platform”, and it owes an obvious debt to Andrej Karpathy’s LLM-wiki gist, the idea of a human-curated knowledge base written for machines.
Here is what matters and what the recaps miss: OKF is not a way to serve your website. It is a way to package curated knowledge so that an agent pipeline can consume it. It lives upstream of retrieval, in the context and grounding layer, not at the URL where a crawler meets your page. As one commenter on the announcement put it, OKF makes sense, but on a different layer than SEO. Conflating “Google released a Markdown knowledge format” with “Google wants you to serve your site as Markdown” is the single most common error in the current cycle. The two are not even close to the same thing.
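The file shape described above (Markdown, YAML frontmatter, only the type field required) is small enough to sketch. This is an illustration, not the OKF reference implementation: it handles only flat key: value frontmatter with the standard library, and the title field and the concept value in the example are hypothetical. A real pipeline would use a proper YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a document into flat frontmatter fields and the Markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("missing closing '---' frontmatter delimiter") from None
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unsupported frontmatter line: {line!r}")
        fields[key.strip()] = value.strip().strip("\"'")
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


def check_concept(text: str) -> dict[str, str]:
    """Return the frontmatter fields, insisting only on 'type'."""
    fields, _body = split_frontmatter(text)
    if not fields.get("type"):
        raise ValueError("frontmatter must set the required 'type' field")
    return fields


# Hypothetical example file: 'title' and the value 'concept' are illustrative.
EXAMPLE = """---
type: concept
title: Content negotiation
---
Content negotiation lets one URL serve several representations.
"""
```

Nothing in this involves a URL, a crawler or a response header. The consumer is a pipeline reading files, which is the whole reason it belongs on a different layer.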
There is a sane version of Markdown in a publishing stack, and it is worth naming so nobody hears this as anti-Markdown. Markdown in your source, rendered to HTML at build time, is exactly how this article is written. That is the right place for it. Shipping raw Markdown to a browser, or to a crawler that expects HTML, is the part that makes no sense, because the consumer on the other end is built around HTML.
Why this debate validates our stack
Strip the three layers apart and the conclusion is almost boring, which is the point. The thing that already works keeps working.
We serve server-rendered, semantic HTML. Headings are headings, lists are lists, article, nav and time mean what they say, and the structured data is real Schema.org rather than decoration. That is the representation Google indexes, the representation Bing crawls, and the representation an LLM ingests when it fetches the page. It is also, not by accident, the representation that renders fast for humans. There is no fork to keep in sync, no second channel that can drift, no cloaking risk.
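For readers who want the structured-data half made concrete, here is a minimal sketch of emitting a Schema.org Article block as JSON-LD. It is not our Astro component; the function name and the choice of fields are illustrative, though @context, @type, headline, dateModified and author are standard Schema.org vocabulary.

```python
import json


def article_jsonld(headline: str, date_modified: str, author_name: str) -> str:
    """Render a Schema.org Article as a JSON-LD script element."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "dateModified": date_modified,
        "author": {"@type": "Organization", "name": author_name},
    }
    # Escape "<" so a value containing "</script>" cannot close the element early.
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'
```

The block sits in the same HTML document humans receive, so there is one artefact to publish and nothing to reconcile.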
Everything the debate is anxious about, we get for free by not chasing it. When Mueller says Markdown conversion is pointless because the model reads your HTML, that is a description of our setup working as intended. When Canel says Bing crawls the HTML anyway, that is fine, because the HTML is the canonical artefact and there is nothing else to reconcile. We did not have to react to either statement. The architecture already answered them.
The one documented signal
If you want a rule that survives the next format announcement, here it is. Clean, server-rendered, semantic HTML with valid Schema.org is the only content-serving approach that both Google and Bing document as something they actually consume. Everything else in this space is either a proposal with no measured consumption, or an optimisation of cost rather than visibility.
Bing, through Copilot, reads structured data. Google reads structured data for its own surfaces. The large language models ingest the rendered HTML. None of the new serving formats (llms.txt, Markdown-for-Agents, ai.txt) has a documented effect on whether you get cited. So the honest engineering posture is: keep the HTML clean, keep the Schema valid, and treat new serving formats as things to observe rather than implement. The same discipline applies to a headless WooCommerce build on Astro: the commerce data is real semantic markup, not a bot-only side channel.
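"Keep the Schema valid" is cheap to check in a build step. The sketch below is a deliberately shallow audit, assuming you have the rendered HTML as a string: it confirms that each application/ld+json block parses as JSON and that each item carries an @type or an @graph. It does not validate against the Schema.org vocabulary; a dedicated validator is still needed for that.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the raw text of every application/ld+json script element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def audit_jsonld(html: str) -> list[str]:
    """Return a list of problems; an empty list means every block passed."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    problems: list[str] = []
    if not collector.blocks:
        problems.append("no application/ld+json block found")
    for index, raw in enumerate(collector.blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"block {index}: invalid JSON ({exc.msg})")
            continue
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict) or ("@type" not in item and "@graph" not in item):
                problems.append(f"block {index}: item without @type")
    return problems
```

Failing the build on a non-empty result catches the most common regression, a template change that silently breaks the JSON.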
The honest take on llms.txt
We publish /llms.txt and /llms-full.txt, so this is self-criticism, not a cheap shot at someone else. The skeptics have a strong case. Mueller has said the format is essentially ignored, and an independent server-log study found zero AI-crawler requests for llms.txt across hundreds of domains over several months. As a standalone file dropped on a site in the hope that something reads it, it does very little.
Our own AI visibility playbook says exactly that, in writing: no major LLM provider formally commits to reading these files, but they appear in our logs often enough to justify keeping them. We hold both thoughts at once. A generic, orphaned llms.txt is close to dead weight. The same file as one node of an integrated agent-discovery setup, wired to a real action layer, is a different object with a different cost-benefit profile. The mistake is citing the “nobody reads llms.txt” studies as if they settled the question for every implementation. They settled it for the stray-file case.
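Whether these files get fetched on your own domain is a question your access logs can answer. Here is a minimal sketch of that count, assuming logs in Combined Log Format; the user-agent substrings are an illustrative starting list, not a complete inventory of AI crawlers, and the function name is ours.

```python
import re
from collections import Counter

# Assumed user-agent substrings; extend to match the crawlers you care about.
AI_AGENT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")
LLMS_PATHS = {"/llms.txt", "/llms-full.txt"}

# Combined Log Format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_hits(lines) -> Counter:
    """Count requests for the llms files, keyed by (agent token, path)."""
    hits: Counter = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a Combined Log Format line
        path = match["path"].split("?", 1)[0]  # ignore any query string
        if path not in LLMS_PATHS:
            continue
        agent = match["agent"].lower()
        for token in AI_AGENT_TOKENS:
            if token.lower() in agent:
                hits[(token, path)] += 1
                break
    return hits
```

Because a file object iterates line by line, count_llms_hits(open("access.log")) is enough to get a first number. User-agent strings can be spoofed, so treat the result as an upper bound on genuine crawler traffic.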
The real forward bet: the agent-action layer
Here is where we break with the “just serve Markdown” crowd entirely, and where the genuinely interesting future is. The next step is not a better document for an agent to read. It is letting the agent act without reading a document at all.
That is the agent-action layer: WebMCP, Agent2Agent (A2A), and the Model Context Protocol. Instead of scraping a services page and guessing, an agent calls a function such as request_quote, browse_services or search_site and gets a typed answer. WebMCP, a Google and Microsoft collaboration, has been in Chrome developer preview since February 2026, and it points squarely at this model: the page exposes capabilities, the agent invokes them.
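The difference between reading and acting is easier to see in code. What follows is a protocol-agnostic sketch of "call a function, get a typed answer"; it does not use the WebMCP, A2A or MCP APIs, and the pricing rule, field names and registry are all hypothetical. Only the tool name request_quote comes from the text above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QuoteRequest:
    service: str
    pages: int


@dataclass(frozen=True)
class Quote:
    service: str
    amount_eur: int


def request_quote(req: QuoteRequest) -> Quote:
    # Hypothetical pricing rule, purely for illustration.
    if req.pages < 1:
        raise ValueError("pages must be at least 1")
    return Quote(service=req.service, amount_eur=500 + 100 * req.pages)


# Registry mapping each tool name to its input type and handler.
TOOLS: dict[str, tuple[type, Callable]] = {
    "request_quote": (QuoteRequest, request_quote),
}


def call_tool(name: str, arguments: dict) -> object:
    """Dispatch a named tool call with keyword arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    input_type, handler = TOOLS[name]
    # Building the dataclass rejects missing or unexpected argument names.
    return handler(input_type(**arguments))
```

An agent that calls request_quote never has to parse a pricing page or guess which number is the price. The contract is the interface, which is why the document format stops mattering at this layer.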
We have already built this. Under public/.well-known/ we publish an A2A AgentCard, an MCP server-card following SEP-1649, an ACP descriptor, and markdown content-negotiation through a .md URL suffix and Accept: text/markdown handling in middleware, with the whole thing advertised via Link headers and robots rules. The fetch_markdown skill on our AgentCard points at /llms-full.txt, which is precisely why the llms files are not orphaned here: they are wired into the action layer rather than sitting alone.
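The negotiation logic itself is small. Our middleware is not written in Python, so treat the following as a sketch of the decision rather than the shipped code: a .md suffix forces Markdown, an explicit text/markdown in Accept wins when its q-value is at least that of HTML, and everything else gets HTML with a Link header pointing at the alternate. The exact Link parameters and the tie-breaking rule are assumptions, and the site root would need its own rule for the alternate path.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Map each media type in an Accept header to its q-value."""
    weights: dict[str, float] = {}
    for part in header.split(","):
        media, *params = [piece.strip() for piece in part.split(";")]
        if not media:
            continue
        q = 1.0  # a media type without an explicit q parameter weighs 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        weights[media.lower()] = q
    return weights


def negotiate(path: str, accept: str = "") -> tuple[str, str, dict[str, str]]:
    """Return (representation, canonical_path, extra_response_headers)."""
    if path.endswith(".md"):
        # The suffix is an explicit request, so no header inspection is needed.
        return "markdown", path[: -len(".md")] or "/", {}
    weights = parse_accept(accept)
    # Markdown must be named explicitly; a browser's */* never selects it.
    md_q = weights.get("text/markdown", 0.0)
    html_q = weights.get("text/html", weights.get("text/*", weights.get("*/*", 0.0)))
    headers = {"Vary": "Accept"}  # caches must key on the Accept header
    if md_q > 0 and md_q >= html_q:
        return "markdown", path, headers
    headers["Link"] = f'<{path}.md>; rel="alternate"; type="text/markdown"'
    return "html", path, headers
```

Requiring text/markdown to be named explicitly is the design choice that matters: humans and conventional crawlers always receive the canonical HTML, and the Markdown only appears for a client that asked for it by name.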
Notice the asymmetry. Markdown-for-Agents and content-negotiation, the serving formats, we treat as observe-do-not-implement, present because the infrastructure offers them, not because we have measured a benefit. The action layer we treat as a deliberate forward investment, because that is the direction Thariq’s HTML argument, WebMCP, and the whole agent-tooling wave are all pointing. Reading is the present. Acting is the bet.
What we are actually doing
To make the posture concrete, here is the split.
- Serving: clean semantic HTML plus Schema.org, server-rendered, fast. This is the load-bearing decision and it is not changing.
- Markdown-for-Agents and Content-Signal: present on Cloudflare, left enabled where it is harmless, but checked, because the default-on setting means it can be on without a decision. No visibility claim attached.
- llms.txt and llms-full.txt: published, but as wired nodes of the agent-discovery system, not as a standalone bet, and described honestly in our own playbook.
- OKF: filed under knowledge layer. Relevant if and when we feed curated knowledge into an agent pipeline. Not a website-serving change.
- Agent-action layer, A2A, MCP, WebMCP-adjacent: deliberate investment, already shipped under /.well-known/, and the part of this whole debate we are most confident about.
Conclusion
The 2026 “HTML vs Markdown for agents” debate looks like a fork in the road. It is not. Once you separate agent output from page serving from the knowledge layer, the three arguments stop contradicting each other and they all point the same way. Serve clean semantic HTML and valid Schema, because it is the one signal both major crawlers document consuming. Observe the new serving formats instead of chasing them, because none has a measured effect on citations. And put your forward energy into the agent-action layer, because that is where reading turns into acting.
We did not arrive here by predicting the debate. We arrived here by building on the boring, documented signal and treating everything newer as something to measure first. That is the whole method. The loud part of the internet is arguing about the format. The quiet, documented answer has not changed.
If you want the broader visibility picture, our AI and LLM visibility playbook collects the rest of the levers in priority order.
Last updated: 15 June 2026.


