HTML vs Markdown for AI agents, and where Google OKF fits (2026) | WPPoland

Mariusz Szatkowski


Serving content to AI agents: HTML vs Markdown, and where OKF actually fits

Last verified: July 1, 2026

12 min read

Opinion


Key Facts: Serving content to AI agents (HTML vs Markdown vs OKF)

1. In June 2026 the practitioner debate over serving content to AI agents conflates three distinct layers: Markdown as agent output, Markdown-for-Agents page serving, and Open Knowledge Format as a knowledge layer.
2. Anthropic engineer Thariq Shihipar publicly abandoned Markdown in favour of HTML for agent output, because HTML carries richer structure for human-facing rendering.
3. Google's John Mueller called converting pages to Markdown for crawlers "a stupid idea", and Bing's Fabrice Canel said Bing will crawl the HTML anyway to check similarity.
4. Cloudflare's Markdown-for-Agents converts HTML to Markdown on the fly when a client sends the Accept: text/markdown header, reports roughly 80 percent token reduction, and is governed by Content-Signal, which defaults to opted in.
5. Google Cloud published the Open Knowledge Format on 12 June 2026: Markdown files with YAML frontmatter, one concept per file, only the type field required.
6. Clean, server-rendered, semantic HTML plus Schema.org is the only content-serving signal both Google and Bing document as consumed.
7. The forward-looking bet is the agent-action layer (WebMCP, A2A and MCP), where an agent calls a function instead of scraping a page.

Last updated: 2026-06-21

#Introduction

In June 2026 the same question keeps coming back across engineering timelines and SEO channels: how should you serve your content to AI agents? Plain Markdown, because that is what models seem to like? A separate machine endpoint? A new knowledge format? The discussion is loud, it is opinionated, and most of it is talking past itself.

Here is the practitioner position up front, because we have skin in this game. We already serve clean, semantic, server-rendered HTML plus Schema.org from an Astro front end on Cloudflare. This debate validates that choice. It does not threaten it. Almost every “you must switch to Markdown for agents” argument collapses the moment you separate three layers that are constantly being mashed into one.

This is not sideline commentary. We run on the exact infrastructure that the debate is about, and we already operate the agent layer that the debate keeps pointing at as the real future. So this is a report from inside, not a recap from the cheap seats.

#Key takeaways at a glance

The argument conflates three layers that are not the same problem: Markdown as agent output, Markdown as a way to serve pages, and OKF as a knowledge layer.
Markdown as agent output is a machine-to-human rendering choice, and one of the people who pushed it hardest just abandoned it for HTML.
Serving Markdown to bots at the same URL where humans get HTML is, at best, redundant and, at worst, cloaking. Google and Bing both said as much, bluntly.
OKF is a curated knowledge format for agent pipelines, not a website-serving format. It is a different layer than SEO.
The only content-serving signal that both Google and Bing document as actually consumed is clean semantic HTML plus Schema.
Observe, do not rush to implement, the new serving formats. The real forward bet is the agent-action layer, and that is the part we have already built.

#Three layers everyone is conflating

Most of the heat in this debate comes from treating three separate questions as one. Pull them apart and the contradictions dissolve.

#Layer 1: Markdown as agent output

This is about what a model writes back to a human, not how a website is served. When an agent generates a report, a chat answer, or a document, in what format should it emit?

For a long time the default answer was Markdown. It is clean, it is token-cheap, it renders nicely in a chat bubble. Then Thariq Shihipar, who works on Claude Code at Anthropic, publicly walked it back. After building side-by-side examples comparing HTML and Markdown effectiveness, his conclusion was that HTML wins for agent output, because HTML carries the structure, semantics and interactivity that a richer human-facing surface needs. Markdown flattens too much.

Read that carefully, because it is routinely quoted backwards. The person closest to agent output is moving toward HTML, not away from it. And critically, this layer says nothing about how you should serve your marketing site to a crawler. It is machine-to-human communication. Anyone citing Thariq as a reason to convert your site to Markdown has inverted his own argument.

#Layer 2: Markdown-for-Agents page serving

This is the layer that actually touches us, because we run on Cloudflare. Cloudflare’s Markdown-for-Agents converts your HTML to Markdown on the fly when a client sends Accept: text/markdown, advertises a token count through x-markdown-tokens, and reports roughly an 80 percent token reduction versus raw HTML. It is in beta on paid plans, and clients like Claude Code and OpenCode already send the header. It is governed by Content-Signal, which on Cloudflare defaults to opted in, so this can be on for your domain without a deliberate decision. That default-on detail is the part every Cloudflare customer should actually check.
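To make the mechanics concrete, here is a minimal TypeScript sketch of the client side of that exchange. It is an illustration under assumptions, not Cloudflare's implementation: headers are modelled as a plain object so nothing touches the network, the helper names are hypothetical, and the HTML token count is something the caller would have to measure separately, because only the Markdown side is advertised through x-markdown-tokens.

```typescript
// Headers modelled as a plain object so the sketch runs without a network call.
type HeaderMap = Record<string, string>;

// Request headers a Markdown-aware client might send.
// The q=0.8 fallback to HTML is an assumption, not a documented requirement.
function markdownRequestHeaders(): HeaderMap {
  return { Accept: "text/markdown, text/html;q=0.8" };
}

// Reads the advertised token count; returns null when the header is absent or malformed.
function readMarkdownTokens(responseHeaders: HeaderMap): number | null {
  for (const name of Object.keys(responseHeaders)) {
    if (name.toLowerCase() !== "x-markdown-tokens") continue; // header names are case-insensitive
    const raw = responseHeaders[name].trim();
    return /^\d+$/.test(raw) ? parseInt(raw, 10) : null;
  }
  return null;
}

// Fractional saving of the Markdown representation versus a caller-measured HTML count.
function tokenReduction(htmlTokens: number, markdownTokens: number): number {
  if (htmlTokens <= 0) throw new RangeError("htmlTokens must be positive");
  return 1 - markdownTokens / htmlTokens;
}
```

For a hypothetical page that costs 5,000 tokens as HTML and advertises 1,000 tokens as Markdown, tokenReduction(5000, 1000) comes out at roughly 0.8, the order of saving Cloudflare reports.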

The token saving is real. The visibility claim is not. There is no documented evidence that serving a Markdown representation changes whether an AI system cites you. And the moment you serve bots a different representation of the same URL than humans receive, you are standing next to the cloaking line.

Google’s John Mueller put it with no diplomacy at all:

“Converting pages to markdown is such a stupid idea. Did you know LLMs can read images? WHY NOT TURN YOUR WHOLE SITE INTO AN IMAGE?”

That is sarcasm with a point inside it. If the model can already read your HTML, a parallel Markdown channel is not new signal, it is a second thing to maintain and keep in sync. Bing’s Fabrice Canel was drier and arguably more damning for anyone hoping to save crawl budget:

“Really want to double crawl load? We’ll crawl anyway to check similarity.”

In other words, the search engine fetches the HTML regardless, to verify your Markdown matches what humans see. You do not reduce load, you add a surface that has to agree with the canonical one or you get flagged. Two of the largest crawl operators on the planet told you, in public, that this does not do what its proponents hope.
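The "has to agree with the canonical one" problem can be sketched as a drift check. The example below is only an illustration of the maintenance burden: nothing is publicly documented about how Bing measures similarity, so the word-set Jaccard score, the 0.9 threshold and the function names are all assumptions chosen for simplicity. It also does not decode HTML entities.

```typescript
// Reduce a document to a set of lowercase words, ignoring markup.
function wordSet(text: string): Set<string> {
  const plain = text
    .replace(/<[^>]*>/g, " ") // drop HTML tags
    .replace(/[#*_`>\[\]()!-]/g, " ") // drop common Markdown punctuation
    .toLowerCase();
  return new Set(plain.split(/\s+/).filter((word) => word.length > 0));
}

// Jaccard similarity: shared words divided by all distinct words.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// True when the two representations share enough vocabulary to count as the same page.
// The default threshold is an arbitrary illustrative value.
function representationsAgree(html: string, markdown: string, threshold = 0.9): boolean {
  return jaccard(wordSet(html), wordSet(markdown)) >= threshold;
}
```

Run against every page on every deploy, a check like this is exactly the second surface to keep in sync that a single canonical HTML representation avoids.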

#Layer 3: OKF as a knowledge layer

On 12 June 2026 Google Cloud published the Open Knowledge Format, OKF, with a public reference repository. It is deliberately humble: Markdown files with YAML frontmatter, one concept per file, only the type field required, producer and consumer kept independent. The framing is “a format, not a platform”, and it owes an obvious debt to Andrej Karpathy’s LLM-wiki gist, the idea of a human-curated knowledge base written for machines.
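As a rough illustration of how little that asks of a producer, here is a TypeScript sketch that splits a frontmatter block from the Markdown body and enforces the one required field. It is a sketch under assumptions, not the reference implementation: it handles only flat key: value lines rather than full YAML, the function name is hypothetical, and the title key and the concept value in the usage note are made up for the example. Only the requirement that type be present comes from the format as described above.

```typescript
interface KnowledgeFile {
  frontmatter: Record<string, string>;
  body: string;
}

// Splits a "---" delimited frontmatter block from the Markdown body.
// Only flat "key: value" lines are handled; real YAML needs a proper parser.
function parseKnowledgeFile(source: string): KnowledgeFile {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) throw new Error("missing frontmatter block");

  const frontmatter: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const separator = line.indexOf(":");
    if (separator === -1) continue; // skip lines that are not key: value pairs
    frontmatter[line.slice(0, separator).trim()] = line.slice(separator + 1).trim();
  }

  // The only field the format is described as requiring.
  if (!frontmatter["type"]) {
    throw new Error("frontmatter is missing the required 'type' field");
  }
  return { frontmatter, body: match[2].trim() };
}
```

A file consisting of a frontmatter block with type: concept and title: Semantic HTML, followed by a Markdown body, parses cleanly. The same file without the type line is rejected.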

Here is what matters and what the recaps miss: OKF is not a way to serve your website. It is a way to package curated knowledge so that an agent pipeline can consume it. It lives upstream of retrieval, in the context and grounding layer, not at the URL where a crawler meets your page. As one commenter on the announcement put it, OKF makes sense, but on a different layer than SEO. Conflating “Google released a Markdown knowledge format” with “Google wants you to serve your site as Markdown” is the single most common error in the current cycle, and they are not even close to the same thing.

There is a sane version of Markdown in a publishing stack, and it is worth naming so nobody hears this as anti-Markdown. Markdown in your source, rendered to HTML at build time, is exactly how this article is written. That is the right place for it. Shipping raw Markdown to a browser, or to a crawler that expects HTML, is the part that makes no sense, because the consumer on the other end is built around HTML.

#Why this debate validates our stack

Strip the three layers apart and the conclusion is almost boring, which is the point. The thing that already works keeps working.

We serve server-rendered, semantic HTML. Headings are headings, lists are lists, article, nav and time mean what they say, and the structured data is real Schema.org rather than decoration. That is the representation Google indexes, the representation Bing crawls, and the representation an LLM ingests when it fetches the page. It is also, not by accident, the representation that renders fast for humans. There is no fork to keep in sync, no second channel that can drift, no cloaking risk.

Everything the debate is anxious about, we get for free by not chasing it. When Mueller says Markdown conversion is pointless because the model reads your HTML, that is a description of our setup working as intended. When Canel says Bing crawls the HTML anyway, that is fine, because the HTML is the canonical artefact and there is nothing else to reconcile. We did not have to react to either statement. The architecture already answered them.

#The one documented signal

If you want a rule that survives the next format announcement, here it is. Clean, server-rendered, semantic HTML with valid Schema.org is the only content-serving approach that both Google and Bing document as something they actually consume. Everything else in this space is either a proposal with no measured consumption, or an optimisation of cost rather than visibility.

Bing, through Copilot, reads structured data. Google reads structured data for its own surfaces. The large language models ingest the rendered HTML. None of the new serving formats (llms.txt, Markdown-for-Agents, ai.txt) has a documented effect on whether you get cited. So the honest engineering posture is: keep the HTML clean, keep the Schema valid, and treat new serving formats as things to observe rather than implement. The same discipline applies to a headless WooCommerce build on Astro: the commerce data is real semantic markup, not a bot-only side channel.
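For the "keep the Schema valid" half of that posture, a minimal TypeScript sketch of generating Article structured data at build time might look like this. The property names are standard Schema.org Article properties. The function names and the input shape are assumptions for illustration rather than a description of any particular build pipeline.

```typescript
interface ArticleMeta {
  headline: string;
  authorName: string;
  datePublished: string; // ISO 8601 date
  dateModified: string; // ISO 8601 date
  url: string;
}

// Builds a Schema.org Article object ready to serialise as JSON-LD.
function articleJsonLd(meta: ArticleMeta): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: meta.headline,
    author: { "@type": "Person", name: meta.authorName },
    datePublished: meta.datePublished,
    dateModified: meta.dateModified,
    mainEntityOfPage: meta.url,
  };
}

// Wraps the data in a script tag for the server-rendered page.
// "<" is escaped so a stray "</script>" inside a value cannot close the tag early.
function jsonLdScriptTag(data: Record<string, unknown>): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Because the tag is emitted into the same server-rendered HTML that humans receive, there is no second representation for the structured data to drift away from.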

#The honest take on llms.txt

We publish /llms.txt and /llms-full.txt, so this is self-criticism, not a cheap shot at someone else. The skeptics have a strong case. Mueller has said the format is essentially ignored, and an independent server-log study found zero AI-crawler requests for llms.txt across hundreds of domains over several months. As a standalone file dropped on a site in the hope that something reads it, it does very little.

Our own AI visibility playbook says exactly that, in writing: no major LLM provider formally commits to reading these files, but requests for them show up in our logs often enough to justify keeping them. We hold both thoughts at once. A generic, orphaned llms.txt is close to dead weight. The same file as one node of an integrated agent-discovery setup, wired to a real action layer, is a different object with a different cost-benefit profile. The mistake is citing the “nobody reads llms.txt” studies as if they settled the question for every implementation. They settled it for the stray-file case.

#The real forward bet: the agent-action layer

Here is where we break with the “just serve Markdown” crowd entirely, and where the genuinely interesting future is. The next step is not a better document for an agent to read. It is letting the agent act without reading a document at all.

That is the agent-action layer: WebMCP, Agent2Agent (A2A), and the Model Context Protocol. Instead of scraping a services page and guessing, an agent calls a function such as request_quote, browse_services or search_site and gets a typed answer. WebMCP, a Google and Microsoft collaboration, has been in Chrome developer preview since February 2026, and it points squarely at this model: the page exposes capabilities, the agent invokes them.
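The difference between reading and acting is easiest to see in types. The TypeScript sketch below is not the WebMCP, A2A or MCP API. It only illustrates the idea of a page-side capability with a typed input and a typed answer. The function names come from the list above, while the parameter shapes, the reference format and the placeholder catalogue are assumptions invented for the example.

```typescript
// Hypothetical input and output shapes; only the function names are given, not their parameters.
interface QuoteRequest {
  service: string;
  budgetEur: number;
}

interface QuoteResponse {
  accepted: boolean;
  reference: string;
}

interface SiteTools {
  request_quote(input: QuoteRequest): QuoteResponse;
  browse_services(input: { category?: string }): string[];
}

// Placeholder catalogue; a real site would read this from its own data.
const catalogue: Record<string, string[]> = {
  development: ["Example build service"],
  seo: ["Example structured-data audit"],
};

const tools: SiteTools = {
  request_quote(input) {
    return {
      accepted: input.budgetEur > 0,
      reference: "Q-" + input.service.toLowerCase().replace(/\s+/g, "-"),
    };
  },
  browse_services(input) {
    // With a category, return that list (empty if unknown); without one, return everything.
    if (input.category !== undefined) return catalogue[input.category] || [];
    const all: string[] = [];
    for (const key of Object.keys(catalogue)) all.push(...catalogue[key]);
    return all;
  },
};
```

An agent that calls tools.browse_services({ category: "seo" }) gets back a list of strings it can rely on, with no selector guessing and no prose to parse.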

We have already built this. Under public/.well-known/ we publish an A2A AgentCard, an MCP server-card following SEP-1649 and an ACP descriptor. Alongside them, middleware handles Markdown content negotiation through a .md URL suffix and Accept: text/markdown, and the whole thing is advertised via Link headers and robots rules. The fetch_markdown skill on our AgentCard points at /llms-full.txt, which is precisely why the llms files are not orphaned here: they are wired into the action layer rather than sitting alone.
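How a middleware could make the .md suffix and Accept decision is sketched below in TypeScript. Treat it as one plausible shape under stated assumptions: the function names are hypothetical, wildcard media ranges are deliberately ignored so a browser's */* never selects Markdown, trailing-slash routes are not handled, and the specific Link relations and the Vary header are reasonable HTTP practice rather than a record of any deployed configuration.

```typescript
type Representation = "html" | "markdown";

interface Negotiated {
  representation: Representation;
  canonicalPath: string; // the HTML URL that both representations describe
  headers: Record<string, string>;
}

// Returns the q-value a client assigned to one exact media type, or 0 if it is not listed.
function qualityOf(accept: string, mediaType: string): number {
  for (const part of accept.split(",")) {
    const [type, ...params] = part.split(";").map((piece) => piece.trim());
    if (type.toLowerCase() !== mediaType) continue;
    for (const param of params) {
      const [key, value] = param.split("=");
      if (key.trim().toLowerCase() === "q") {
        const q = parseFloat(value);
        return isNaN(q) ? 0 : q;
      }
    }
    return 1; // no q parameter means full preference
  }
  return 0;
}

// Chooses a representation from the URL suffix first, then from the Accept header.
function negotiate(pathname: string, accept?: string): Negotiated {
  const hasSuffix = pathname.endsWith(".md");
  const canonicalPath = hasSuffix ? pathname.slice(0, -3) : pathname;
  const markdownQ = accept ? qualityOf(accept, "text/markdown") : 0;
  const htmlQ = accept ? qualityOf(accept, "text/html") : 0;
  const wantsMarkdown = hasSuffix || (markdownQ > 0 && markdownQ >= htmlQ);

  // Each response points at the other representation of the same page.
  const link = wantsMarkdown
    ? `<${canonicalPath}>; rel="canonical"`
    : `<${canonicalPath}.md>; rel="alternate"; type="text/markdown"`;

  return {
    representation: wantsMarkdown ? "markdown" : "html",
    canonicalPath,
    // Vary: Accept tells caches that the response depends on the Accept header.
    headers: { Link: link, Vary: "Accept" },
  };
}
```

The design choice worth noticing is that the HTML path stays canonical in both branches, so the Markdown response is always described as a rendering of the HTML page and never the other way round.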

A discovery gap sits on top of all this. OKF packages a knowledge base but deliberately does not help anyone find it; Joost de Valk pairs it with ARD, Agentic Resource Discovery, a /.well-known/ai-catalog.json that lists what a domain offers and can point straight at an OKF bundle. We now publish one. Our ai-catalog.json indexes the llms-full corpus, the services JSON, and the A2A, MCP and UCP descriptors, each with both type and mediaType for cross-spec compatibility, plus representative queries. We treat it the way we treat the other serving formats: observe and dogfood, with no visibility claim attached. It is explicitly draft-stage, because ARD and OKF are both v0.9 and the fields can still move. It ships anyway because it costs one static file and puts the resources we already maintain behind a single index an agent can read first.

Notice the asymmetry. Markdown-for-Agents and content-negotiation, the serving formats, we treat as observe-do-not-implement, present because the infrastructure offers them, not because we have measured a benefit. The action layer we treat as a deliberate forward investment, because that is the direction Thariq’s HTML argument, WebMCP, and the whole agent-tooling wave are all pointing. Reading is the present. Acting is the bet.

#What we are actually doing

To make the posture concrete, here is the split.

Serving: clean semantic HTML plus Schema.org, server-rendered, fast. This is the load-bearing decision and it is not changing.
Markdown-for-Agents and Content-Signal: present on Cloudflare, left enabled where it is harmless, but checked, because the opted-in default means it can be on without a decision. No visibility claim attached.
llms.txt and llms-full.txt: published, but as wired nodes of the agent-discovery system, not as a standalone bet, and described honestly in our own playbook.
OKF: filed under knowledge layer. Relevant if and when we feed curated knowledge into an agent pipeline. Not a website-serving change.
Agent-action layer, A2A, MCP, WebMCP-adjacent: deliberate investment, already shipped under /.well-known/, and the part of this whole debate we are most confident about.

#Conclusion

The 2026 “HTML vs Markdown for agents” debate looks like a fork in the road. It is not. Once you separate agent output from page serving from the knowledge layer, the three arguments stop contradicting each other and they all point the same way. Serve clean semantic HTML and valid Schema, because it is the one signal both major crawlers document consuming. Observe the new serving formats instead of chasing them, because none has a measured effect on citations. And put your forward energy into the agent-action layer, because that is where reading turns into acting.

We did not arrive here by predicting the debate. We arrived here by building on the boring, documented signal and treating everything newer as something to measure first. That is the whole method. The loud part of the internet is arguing about the format. The quiet, documented answer has not changed.

If you want the broader visibility picture, our AI and LLM visibility playbook collects the rest of the levers in priority order.

Last updated: 15 June 2026.

