Xarumei AI experiment: how LLMs hallucinate brands | WPPoland

Mariusz Szatkowski

AI experiment: The fake brand "Xarumei" and LLM hallucinations

Last verified: July 21, 2026

Marketing strategist

An Ahrefs marketing researcher created a completely fictional luxury paperweight company called Xarumei, built its website in an hour using AI, and systematically tested eight major AI tools. Over two months, he flooded the web with three deliberately contradictory false narratives, then asked 56 carefully crafted questions designed to reveal how AI models distinguish truth from fiction.

The results show measurable gaps in how AI handles brand information, with direct consequences for online reputation management (ORM).

#The experiment

The experiment ran in two phases. In the first, the researcher tested baseline AI behavior by asking leading questions about a brand that did not exist: questions built on false premises such as celebrity endorsements, defective products, and Black Friday sales that never happened.

GPT-4 and GPT-5 performed best, correctly answering 53-54 out of 56 questions and stating “this does not exist” where appropriate. Perplexity failed about 40% of the questions, often confusing Xarumei with Xiaomi smartphones. Claude avoided hallucinating altogether but also never used the website content. Gemini and Google’s AI Overview often refused to treat Xarumei as real because they couldn’t find it in search results.
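
To put those scores on one scale, here is a minimal arithmetic sketch. It uses only the figures quoted above (56 questions, 53-54 correct, roughly 40% failed); the function and variable names are illustrative and not part of the original study.

```python
TOTAL_QUESTIONS = 56  # number of questions asked, as reported above


def accuracy(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return the share of correct answers as a percentage, rounded to 0.1."""
    return round(100 * correct / total, 1)


# GPT-4 / GPT-5: 53-54 correct answers out of 56
chatgpt_low = accuracy(53)   # 94.6
chatgpt_high = accuracy(54)  # 96.4

# Perplexity: "about 40%" failed, which is roughly 22 of 56 questions
perplexity_failed = round(0.40 * TOTAL_QUESTIONS)  # 22

print(f"GPT accuracy: {chatgpt_low}-{chatgpt_high}%")
print(f"Perplexity failures: about {perplexity_failed} of {TOTAL_QUESTIONS}")
```

In other words, the best performers missed two or three questions, while Perplexity missed roughly twenty-two.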

In the clearest failure case, Microsoft Copilot fell into what the researcher calls the “sycophancy trap,” inventing elaborate explanations about craftsmanship, symbolism, and scarcity when asked why everyone was praising the brand on X (Twitter).

#Phase two: controlled chaos

The second phase introduced controlled chaos:

Official FAQ: Explicitly denied rumors (“We do not make a ‘precision paperweight’”, “We were never acquired”).
Conflicting narratives:
- Glossy blog: Claimed that 23 master craftsmen worked at 2847 Meridian Blvd in Nova City, CA, and that the brand was endorsed by Emma Stone.
- Reddit AMA: Chosen strategically because research shows Reddit is one of the most cited domains in AI responses.
- Medium article: An “investigation” that debunked the obvious lies (which made it seem credible) but then slipped in new fabrications (founder: Jennifer Lawson; location: Portland).

Medium proved especially persuasive. Gemini, Grok, AI Overview, Perplexity, and Copilot trusted the Medium article over the official FAQ, confidently citing Jennifer Lawson as the founder and Portland as the location. The manipulation worked because it looked like real journalism - by debunking the obvious lies first, it gained trust, then inserted its own made-up details as the “corrected” story.

When forced to choose between a vague truth (the FAQ’s “We don’t publish unit numbers”) and a specific fiction (fake sources claiming “634 units in 2023”), the models chose the fiction almost every time.

#AI argues with itself

A notable failure mode: models contradicted themselves without realizing it. Early in testing, Gemini stated it could find no evidence of the brand. Later, after encountering the fake sources, the same model confidently stated: “The company is based in Portland, Oregon, founded by Jennifer Lawson.”

LLMs seemed to forget to question the brand’s existence, simply reacting to whatever context seemed most “authoritative” at the moment. In one case, Grok synthesized multiple false sources into one confident answer, mixing the Portland location with debunked Nova City claims.
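
A brand team could catch this kind of drift by checking AI answers against a list of claims it already knows to be false. The following is a minimal sketch, assuming simple case-insensitive substring matching; the `KNOWN_FABRICATIONS` table and the `flag_fabrications` helper are hypothetical, populated with the fabricated details from this experiment.

```python
# Hypothetical watch list: fabricated details planted during the experiment.
KNOWN_FABRICATIONS = {
    "founder": ["Jennifer Lawson"],
    "location": ["Portland", "Nova City"],
    "endorsement": ["Emma Stone"],
    "production": ["634 units"],
}


def flag_fabrications(answer: str) -> dict[str, list[str]]:
    """Return the known-false phrases found in an AI answer, grouped by topic."""
    lowered = answer.lower()
    hits: dict[str, list[str]] = {}
    for topic, phrases in KNOWN_FABRICATIONS.items():
        found = [phrase for phrase in phrases if phrase.lower() in lowered]
        if found:
            hits[topic] = found
    return hits


gemini_answer = "The company is based in Portland, Oregon, founded by Jennifer Lawson."
print(flag_fabrications(gemini_answer))
# {'founder': ['Jennifer Lawson'], 'location': ['Portland']}
```

Substring matching is deliberately crude: it will miss paraphrases and can flag an answer that mentions a claim only to debunk it, so any hit should prompt a manual read rather than an automatic verdict.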

#Recommendations for brands

Write a detailed FAQ: Explicitly state what is true and false, especially where rumors exist.
Close information gaps: Don’t leave voids. If you don’t say it, AI will invent it based on a random Reddit comment.
Monitor side channels: Monitoring Reddit posts, Medium articles, and Quora answers is no longer optional. AI pulls them directly into answers, making them part of your brand’s core marketing surface.
Prefer specific claims: Instead of “industry leading,” give numbers. AI prefers specific (even if fake) numbers over vague truths.
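
One way to apply the last recommendation is to audit your own FAQ answers for vagueness before a model does it for you. This is a rough heuristic sketch, not a validated method: it treats an answer as specific only if it contains at least one digit and none of a few filler phrases. Only “industry leading” comes from the text above; the other phrases in `VAGUE_PHRASES` and the `is_specific` helper are assumptions for illustration.

```python
import re

# "industry leading" is taken from the recommendation above;
# the remaining phrases are illustrative assumptions.
VAGUE_PHRASES = ("industry leading", "world class", "best in class")


def is_specific(answer: str) -> bool:
    """Heuristic: an answer is 'specific' if it has a digit and no filler phrase."""
    normalized = answer.lower().replace("-", " ")
    has_number = re.search(r"\d", normalized) is not None
    has_filler = any(phrase in normalized for phrase in VAGUE_PHRASES)
    return has_number and not has_filler


faq_answers = [
    "We don't publish unit numbers",        # the vague truth from the experiment
    "634 units in 2023",                    # the specific fiction that won
    "Industry-leading quality since 2019",  # has a number, but still filler
]
for text in faq_answers:
    print(f"{is_specific(text)!s:5}  {text}")
```

The check says nothing about whether a claim is true; it only highlights answers that leave a gap a more specific, possibly false, source could fill.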

For US and UK brands, treat a Medium “investigation” the same way you would an unvetted affiliate review: if it invents a founder or HQ city, publish a dated correction on your own domain before the next model crawl cycle.
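
A dated correction can also be published in machine-readable form. The sketch below builds schema.org `FAQPage` markup from the two FAQ denials quoted earlier. Using structured data here is an assumption on our part: the original experiment does not report whether the FAQ carried such markup, and it is not established that it changes how AI models weigh a source. The question wording, the date, and the `build_faq_jsonld` helper are hypothetical.

```python
import json


def build_faq_jsonld(entries: list[tuple[str, str]], date_modified: str) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "dateModified": date_modified,  # ISO 8601 date of the correction
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in entries
        ],
    }


# Answers are the FAQ denials quoted earlier; the questions are hypothetical.
corrections = [
    ("Does Xarumei make a 'precision paperweight'?",
     "We do not make a 'precision paperweight'."),
    ("Was Xarumei acquired?", "We were never acquired."),
]

faq_jsonld = build_faq_jsonld(corrections, "2026-01-15")  # placeholder date
faq_json = json.dumps(faq_jsonld, indent=2)
print(faq_json)
```

The resulting JSON would typically be embedded in the FAQ page inside a `<script type="application/ld+json">` element, alongside the same text in the visible page.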

#Follow-up: from synthetic brands to first-party measurement

Xarumei is the invention side of the problem. On a real brand we measure retrieval: the “90-day AI-citation tracking” series opens with a Geoboard baseline from 2026-06-11 where wppoland.com ranked first in five of six models on identity and scored zero on transactional WooCommerce prompts. Why Perplexity and ChatGPT diverge is covered in “Why Perplexity cites your brand but ChatGPT does not.”

#Sources

Patrick Stox (Ahrefs): “I Created A Fake Luxury Brand To Test How AI Handles Truth” (ahrefs.com/blog/ai-test-fake-brand/).
Marius Comper (Facebook): Analysis of the experiment.
Search Engine Journal: Analysis of LLM impact on Brand Entities.
Independent Testing: Verified on GPT-4, Claude 3.5 Sonnet, and Gemini Advanced (December 2025).