Measuring AI visibility

Last verified: June 22, 2026
Case study

#What we measured, and what we found

For a quarter we pointed AI-visibility monitoring at our own site and wrote down what it said, including the parts that were not flattering. The short version: our AI-citation rate is low where it matters most, the method we used moved the number more than the underlying reality did, and the single most useful output of the exercise was learning to distrust a clean-looking figure that came from the wrong instrument. This is the first of a quarterly series, so the numbers below are a baseline, not a victory lap.

Most writing about AI visibility is advice. This is measurement. We are an agency that argues for serving clean server-rendered HTML to AI, so it was only fair to check whether the agency itself gets cited. The answer, across three snapshots in April, May and June 2026, was more interesting than a single number.

#The instrument problem comes first

Before any finding, the caveat that reframes all of them. There are two common ways to measure how often an AI cites you, and they do not agree.

The cheap way is an API proxy. You send your prompts through a model’s API, read the text that comes back, and count brand mentions and links. It is repeatable and almost free, which is why it is everywhere. Its weakness is that the API path is not the product a customer uses. Published comparisons put the source overlap between API responses and the consumer web interface in the low single-digit percentages, so a proxy can tell you a model knows your name in the abstract while telling you nothing reliable about what a real user sees.
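As a rough illustration of what the counting step in a proxy run amounts to, here is a minimal Python sketch. It assumes the response text has already been fetched from a model API; the brand name, the domain and the function name are placeholders of ours, not part of any tool we used.

```python
import re
from urllib.parse import urlparse

# Placeholder values for illustration; substitute your own brand and domain.
BRAND_NAMES = ["Example Studio"]
BRAND_DOMAIN = "example.com"

# Deliberately simple URL matcher: stops at whitespace and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def score_response(text: str) -> dict:
    """Report whether the brand is named and whether its domain is linked."""
    lowered = text.lower()
    mentioned = any(name.lower() in lowered for name in BRAND_NAMES)

    domains = []
    for url in URL_PATTERN.findall(text):
        try:
            host = urlparse(url.rstrip(".,;:'\"")).hostname or ""
        except ValueError:
            continue  # malformed URL, skip it rather than guess
        host = host.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.append(host)

    linked = any(
        host == BRAND_DOMAIN or host.endswith("." + BRAND_DOMAIN)
        for host in domains
    )
    return {"mentioned": mentioned, "linked": linked, "domains": domains}
```

The function keeps every linked domain, not only ours, because the list of who was cited instead is often the more useful output.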

The honest way is to monitor the real consumer outputs, the same answers a person gets in the ChatGPT or Perplexity interface, including which sources the product actually surfaces. It costs more and is harder to automate. It is also the only number that corresponds to a lost or won visit.

We used both, in that order, and the gap between them is the most reusable lesson in this report.

#April: the cheap proxy baseline

Our first snapshot, on 6 April, was an API-proxy run. The headline figures, across 26 queries against ChatGPT:

| Metric | Result |
|---|---|
| Brand-mention rate | 7.7 percent (2 of 26) |
| URL-citation rate | 0 percent |
| Strongest category | Plugins, 14.3 percent mention |
| Transactional category | 12.5 percent mention |
| Informational and local | 0 percent mention |

A mention rate under eight percent and a citation rate of zero read like a disaster. The more useful detail is who got cited instead of us. When the model reached for a source on Polish WordPress work, it named directories and job boards: pracuj.pl appeared three times, alongside clutch.co, olx.pl, home.pl, nazwa.pl and a developer meetup listing. Those are not competitors who out-wrote us. They are aggregators the model trusts as generic answers to a commercial question. That pattern, the assistant defaulting to a directory rather than a specialist, turned out to be the real story, and on-page polish does not fix it.

The report itself carried the warning we want you to carry too: API proxy, directional trends only, roughly four percent source overlap with the web interface. We did not yet appreciate how much that warning mattered.
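To make "source overlap" concrete, here is one way it could be computed for a single query. This is a sketch under an assumption: we treat overlap as the share of distinct cited domains that appear in both the API answer and the web-interface answer (a Jaccard index). The published comparisons may define it differently, and the function name is ours.

```python
from typing import Optional, Set


def source_overlap(api_sources: Set[str], web_sources: Set[str]) -> Optional[float]:
    """Share of all cited domains that both channels have in common.

    Returns None when neither channel cited anything, because an overlap
    of "zero" would overstate what was actually observed.
    """
    union = api_sources | web_sources
    if not union:
        return None
    return len(api_sources & web_sources) / len(union)
```

For example, if the API answer cites three domains, the web interface cites two, and only one domain is common to both, the overlap is one in four, or 25 percent.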

#May: the instrument breaks honestly

The 11 May snapshot, also a proxy run, returned a result that looked like a bug and was actually the most honest output of the quarter. Across 20 queries, split over ChatGPT, Perplexity, Bing Copilot and Claude, the breakdown was:

| Engine | Queries | Cited | Citation rate |
|---|---|---|---|
| ChatGPT | 6 | 0 | undetermined |
| Perplexity | 6 | 0 | undetermined |
| Bing Copilot | 4 | 0 | undetermined |
| Claude | 4 | 0 | undetermined |

Every single query came back undetermined. Not “not cited”, undetermined. The proxy could not establish whether the answer was grounded in fetched pages at all, so it refused to score. A naive reading turns that into “zero percent citation rate” and a panicked Monday. The correct reading is that the instrument told us it could not see what we were asking it to measure. A measurement that returns undetermined is doing its job. A measurement that quietly returns zero in the same situation is lying to you, and plenty of AI-visibility dashboards do exactly that.
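The difference between undetermined and zero is easy to lose in code, so here is a minimal sketch of how a scorer might keep them apart. The type and function names are hypothetical; the point is only that an outcome has three states, and that a rate over no determined outcomes is reported as undetermined rather than as 0.

```python
from enum import Enum
from typing import Optional, Sequence


class Outcome(Enum):
    CITED = "cited"
    NOT_CITED = "not cited"
    UNDETERMINED = "undetermined"  # the instrument could not tell


def citation_rate(outcomes: Sequence[Outcome]) -> Optional[float]:
    """Citation rate over determined outcomes only.

    Returns None, not 0.0, when nothing could be determined.
    """
    determined = [o for o in outcomes if o is not Outcome.UNDETERMINED]
    if not determined:
        return None
    cited = sum(1 for o in determined if o is Outcome.CITED)
    return cited / len(determined)


def format_rate(rate: Optional[float]) -> str:
    """Render a rate for a report without turning 'unknown' into a number."""
    return "undetermined" if rate is None else f"{rate:.1%}"
```

Fed twenty undetermined outcomes, as in the May run, `format_rate(citation_rate(...))` yields `undetermined`. A scorer that collapsed the third state into "not cited" would print 0.0% for the same input.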

#June: real monitoring, and a useful split

In June we switched to monitoring the real model outputs rather than the API. The picture sharpened immediately, and it was not uniform.

For one narrow query, a Polish studio serving foreign WordPress clients, we ranked first in five of the six models tested. That is a genuine, defensible position, and it matches how we describe ourselves. It is also plausible on its face: a specific identity claim with little competition is exactly the kind of query a specialist should own.

For transactional WooCommerce queries, the queries closest to revenue, we were close to invisible. The models answered confidently and did not reach for us. ChatGPT was consistently the weakest channel for the brand, returning the least presence across the set. Perplexity was the strongest, which is unsurprising once you know that Perplexity leans hard on live web search rather than only on training memory. The competitors who did surface were mostly general SEO and SEM agencies marketing themselves as doing “AI SEO”, not WooCommerce specialists. The gap, in other words, is authority and association, not page quality.

#What the three snapshots add up to

Read together, the quarter says three things plainly.

First, the method is part of the number. The April and May proxy runs and the June real-monitoring run were measuring the same site within a single quarter, and they disagreed enough that quoting any one figure without its method would be misleading. If a tool gives you an AI-citation rate without telling you whether it read the API or the product, distrust it.

First-party measurement is the point of this whole exercise, and it is the same discipline we apply to client work: a number you cannot reproduce is not evidence. We learned the same lesson the hard way with a synthetic-brand experiment, where a model confidently described a company that did not exist. Measuring a real brand has the opposite failure mode, confidently reporting zero when the truth is unknown, and both come back to checking the instrument before trusting the reading.

Second, identity queries are winnable and transactional queries are not, at least not on-page. We hold the narrow positioning query because it is specific and lightly contested. We lose the commercial queries because the models lean on directories and broad agencies, and no amount of cleaner HTML changes who a model already associates with “WooCommerce developer”. That is an off-page authority problem.

Third, the channel matters. A brand can be cited in Perplexity and near-absent in ChatGPT in the same week, because the two products ground answers differently. A single blended “AI visibility score” hides exactly the information you need.
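A small sketch shows how a blended score can hide a channel split. The observations below are invented purely for illustration and are not our measurements; the function names are ours too.

```python
from collections import defaultdict

# Invented observations, for illustration only: (engine, brand_was_cited).
observations = [
    ("Perplexity", True), ("Perplexity", True),
    ("Perplexity", False), ("Perplexity", True),
    ("ChatGPT", False), ("ChatGPT", False),
    ("ChatGPT", False), ("ChatGPT", False),
]


def rates_by_engine(rows):
    """Citation rate per engine, keeping the channels separate."""
    totals = defaultdict(lambda: [0, 0])  # engine -> [cited, total]
    for engine, cited in rows:
        totals[engine][0] += int(cited)
        totals[engine][1] += 1
    return {engine: cited / total for engine, (cited, total) in totals.items()}


def blended_rate(rows):
    """One number across all engines; None if there are no rows."""
    if not rows:
        return None
    return sum(int(cited) for _, cited in rows) / len(rows)
```

With this made-up data the blended rate is 37.5 percent, which describes neither channel: the per-engine view is 75 percent on one and 0 percent on the other.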

#What we changed

We did not rewrite pages in response to a proxy number, because that would be optimising for an instrument rather than a customer. Instead, the measurement changed where we spend effort.

  • We fixed the query set and the cadence: a stable list of identity, informational and transactional queries, recorded monthly, with the engine and date stamped on every figure.
  • We stopped comparing proxy figures with real-monitoring figures, and we now label every number with its method.
  • We moved the transactional-visibility work off-page, because the gap there is association and authority, not on-page content, and that is documented in our off-page authority plan rather than in another rewrite.
  • We kept serving everything in server-rendered HTML, which is the precondition for being citable at all and the subject of our note on why Western assistants read raw HTML.
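The labelling rule in the list above can be enforced in code rather than left to discipline. The following is a sketch only: the `Figure` type, its field names and the method labels `api_proxy` and `real_monitoring` are our assumptions for illustration, not the format of any particular tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Figure:
    value: float       # a rate between 0 and 1
    engine: str        # e.g. "ChatGPT"
    measured_on: date  # date stamped on every figure
    method: str        # assumed labels: "api_proxy" or "real_monitoring"


def change(earlier: Figure, later: Figure) -> float:
    """Difference between two figures, refusing to mix methods or engines."""
    if earlier.method != later.method:
        raise ValueError(
            f"cannot compare {earlier.method} with {later.method}"
        )
    if earlier.engine != later.engine:
        raise ValueError(
            f"cannot compare {earlier.engine} with {later.engine}"
        )
    return later.value - earlier.value
```

Raising an error is a deliberate choice here: a trend line that silently joins a proxy figure to a real-monitoring figure looks like movement in the brand when it is only movement in the instrument.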

#How to run this yourself

You do not need our budget to start. The minimum honest setup is a fixed list of ten to twenty queries that real customers would ask, run once a month, with three columns recorded every time: the engine, the date, and whether your brand was named or your URL linked. Add a fourth column for which other domains were cited, because that tells you who you are actually competing against in the answer, which is rarely who you think.
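The log described above fits in a plain CSV file. Here is a minimal sketch in Python using only the standard library. The column names and the function name are our own choices, the `query` column is an addition so each row can be traced back to its question, and `None` is written as `undetermined` so an unknown result never lands in the file as a no.

```python
import csv
from datetime import date
from pathlib import Path

FIELDS = ["date", "engine", "query", "brand_named", "url_linked", "other_domains"]


def _tri(value):
    """Map True/False/None to yes/no/undetermined."""
    if value is None:
        return "undetermined"
    return "yes" if value else "no"


def append_observation(path, *, engine, query, brand_named, url_linked,
                       other_domains, on=None):
    """Append one observation row, writing the header if the file is new."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": (on or date.today()).isoformat(),
            "engine": engine,
            "query": query,
            "brand_named": _tri(brand_named),
            "url_linked": _tri(url_linked),
            "other_domains": ";".join(sorted(other_domains)),
        })
```

One row per query per engine per month is enough. The file can be opened in any spreadsheet, and the `other_domains` column is where the directories and broad agencies show up.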

If you use an automated tool, ask it one question before you trust a single chart: are you reading the API or the product? If it cannot answer, treat the output as directional only, the way our April and May runs were. And never let a tool turn an undetermined result into a confident zero.

#The honest takeaway

A quarter of measuring our own AI citations produced one uncomfortable number (our transactional citation rate is low) and one genuinely valuable habit: never quote an AI-visibility figure without the method that produced it. The proxy runs made us look worse than reality in May, and the real monitoring showed a defensible identity position the proxy had missed. Both readings were useful precisely because we wrote down how each was taken. This is report one. We will publish the next snapshot at the end of the quarter, on the same query set, so the series can be compared rather than admired. If you want to be cited by AI, start by measuring it honestly, and build that into the workflow we describe for GEO and LLMO.

What is an AI-citation rate and why measure it?
It is how often an AI assistant names your brand, or links your URL, when it answers a relevant question. It matters because AI answers increasingly sit between a searcher and your site. If the assistant cites a directory instead of you, you lose the visit before classic SEO even applies. You cannot improve what you do not measure, so the first step is a baseline.
Why did the API proxy and the real monitoring disagree so much?
An API proxy sends prompts through a model API and inspects the text. It is cheap and repeatable, but the API path is not the consumer product, and published work suggests only a few percent source overlap with the web interface. Our May snapshot returned undetermined for every query, which is the proxy admitting it could not tell whether grounding occurred. Monitoring that reads the real consumer outputs gives a truer picture, at higher cost. Treat proxy numbers as directional only.
What did the real June monitoring actually show?
A split. For the narrow identity query about a Polish studio serving foreign WordPress clients we ranked first in five of six models, which validates a real position. For transactional WooCommerce queries, where the money is, we were close to invisible. ChatGPT was the weakest channel for us and Perplexity the strongest, which fits Perplexity leaning on live web search.
How often should I measure AI citations?
Monthly is enough for a small site, because model behaviour and your own content both move slowly relative to the measurement noise. Pick a fixed query set, record the engine and the date with every figure, and never compare a proxy number against a real-monitoring number. We publish a quarterly snapshot so the series stays honest and comparable.
Is a high AI-citation rate worth chasing for every query?
No. Identity and informational queries are easier to win and worth holding, but transactional queries are where competitors and directories fight hardest, and where on-page work alone rarely moves the needle. For those, off-page authority matters more. Measure first, then spend where the gap is real, not where the number is easy.