Afternoon Brief | AI Search & Discovery

AI Is Citing AI: The Source Authority Crisis That Changes Who Gets Retrieved

AI engines are citing AI-generated content at scale. GhostCite found that every LLM it benchmarked hallucinates citations at rates from 14% to 95%. Fabricated biomedical citations have increased more than 12-fold since 2023. Jaxon Parrott breaks down what the self-citation crisis means for brand visibility and why source architecture is the only defense.

Jaxon Parrott, Jun 15, 2026

AI is citing AI. Not as an edge case. Structurally. GhostCite's benchmark of 13 LLMs found that every model hallucinates references at rates from 14% to 95%, and its analysis of 2.2 million academic citations shows invalid references already reaching published papers. Fabricated citations in biomedical literature have increased more than 12-fold since 2023. The source authority layer most brands depend on for AI visibility is contaminated, and the contamination is accelerating. The brands that survive this are the ones whose source architecture traces back to real, human-created editorial proof.

Every Major LLM Hallucinates Citations

I want to be precise about the scale of this problem because the numbers are worse than the headlines suggest.

The GhostCite study benchmarked 13 LLMs on citation generation tasks and found hallucination rates ranging from 14.23% to 94.93%. That is not a typo. The best-performing model still fabricated roughly one in seven references. The worst fabricated nearly all of them.

The researchers then analyzed 2.2 million citations from 56,381 papers at AI/ML and security venues published between 2020 and 2025. They found 1.07% of papers contained invalid citations. That sounds small until you see the trend line: an 80.9% increase in papers with invalid citations during 2025 compared to prior years. The problem is not stable. It is compounding.
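For readers who want to check the arithmetic, here is a minimal sketch that restates the GhostCite figures quoted above. The variable names are illustrative and not taken from the study; the only inputs are the numbers in the text.

```python
# Figures quoted above from the GhostCite study; variable names are illustrative.
best_rate = 0.1423        # lowest hallucination rate among the 13 LLMs
worst_rate = 0.9493       # highest hallucination rate among the 13 LLMs
papers_analyzed = 56_381  # papers at AI/ML and security venues, 2020 to 2025
invalid_share = 0.0107    # share of papers that contained invalid citations

# "Roughly one in seven": how many references per fabricated one, best case.
refs_per_fabrication = 1 / best_rate

# Absolute number of papers implied by the 1.07% share.
papers_with_invalid = papers_analyzed * invalid_share

print(f"Best model: one fabricated reference in every {refs_per_fabrication:.1f}")
print(f"Worst model: {worst_rate:.0%} of references fabricated")
print(f"Papers with invalid citations: about {round(papers_with_invalid)}")
```

The 1.07% share works out to roughly 600 papers, which is why the year-over-year trend matters more than the headline percentage.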

Here is the behavioral layer that makes it worse. A survey of 97 researchers found that 87.2% use AI-powered tools in their workflows, but only 23.3% of peer reviewers thoroughly examine references. That leaves 76.7% of reviewers who do not check references closely enough to reliably catch the fabrications AI tools introduce. The contamination has an open door.

The 12-Fold Increase Nobody Is Talking About

The academic contamination is not confined to AI research papers. It is spreading through the literature that brands and AI engines depend on for authoritative claims.

A Columbia University audit of 2.47 million biomedical papers in PubMed Central examined 97.1 million references and flagged 4,046 fabricated citations across 2,810 papers. The rate jumped from approximately 4 per 10,000 papers in 2023 to 56.9 per 10,000 in early 2026. That is a more than twelvefold increase in just over two years.
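The ratios behind those audit numbers can be worked out directly. This is a rough sketch using only the figures quoted above; the 2023 base rate is the rounded "approximately 4", so the fold increase is indicative rather than exact, and the variable names are illustrative.

```python
# Figures quoted above from the PubMed Central audit; names are illustrative.
papers_audited = 2_470_000
fabricated_citations = 4_046
flagged_papers = 2_810
rate_2023 = 4.0    # flagged papers per 10,000 (rounded in the text)
rate_2026 = 56.9   # flagged papers per 10,000, early 2026

# Growth in the per-10,000 rate between the two periods.
fold_increase = rate_2026 / rate_2023

# Rate across the whole audited corpus, all years combined.
overall_rate = flagged_papers / papers_audited * 10_000

# Average number of fabricated citations in each flagged paper.
fabricated_per_flagged = fabricated_citations / flagged_papers

print(f"Fold increase: {fold_increase:.1f}x")
print(f"Overall rate: {overall_rate:.1f} per 10,000 papers")
print(f"Fabricated citations per flagged paper: {fabricated_per_flagged:.2f}")
```

With the rounded base rate, the increase comes out above fourteenfold, which is consistent with the "greater than twelvefold" claim. The whole-corpus rate is much lower than the early 2026 figure because it averages over years when fabrication was rare.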

The timing is not a coincidence. ChatGPT launched in late 2022. The effects became visible in PubMed Central around mid-2024, consistent with typical academic publication timelines of 100 to 200 days. What you are seeing now is the first wave reaching the surface.

Of the flagged papers, 98.4% remained uncorrected at the time of the audit. That is the part that matters for AI visibility. Those fabricated citations are still live, still indexed, and still available for AI engines to retrieve and cite downstream.

How Contaminated Citations Launder Themselves

This is where the structural risk becomes clear. Contaminated citations do not stay in one paper. They propagate.

Research on citation contamination found that 41.5% of researchers copy-paste references without verification. In biomedicine, 94.6% of post-retraction citations never acknowledge the retraction. The GhostCite study identified the same fabricated citation appearing across up to 16 separate papers at major conferences.

The mechanism is simple. Paper A cites a fabricated source. Paper B cites Paper A. Paper C cites both. AI engines crawl all three and treat the cross-referenced claim as validated. The fabrication becomes indistinguishable from verified research. Not because it was proven, but because it was repeated.
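The Paper A, B, C mechanism can be sketched as a walk over a citation graph. This is a toy model under stated assumptions: the paper names, the `ghost-ref` identifier, and the `contaminated_papers` helper are all hypothetical, and it does not describe how any real AI engine scores sources.

```python
from collections import deque

def contaminated_papers(cites, fabricated_sources):
    """Return every paper that cites a fabricated source, directly or via other papers."""
    # Invert the graph: for each cited item, record which papers cite it.
    cited_by = {}
    for paper, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(paper)

    # Breadth-first walk outward from the fabricated sources.
    tainted = set()
    queue = deque(fabricated_sources)
    while queue:
        item = queue.popleft()
        for paper in cited_by.get(item, ()):
            if paper not in tainted:
                tainted.add(paper)
                queue.append(paper)
    return tainted

# Hypothetical citation graph mirroring the example in the text.
cites = {
    "Paper A": ["ghost-ref"],           # cites the fabricated source
    "Paper B": ["Paper A"],             # cites Paper A
    "Paper C": ["Paper A", "Paper B"],  # cites both
    "Paper D": ["real-study"],          # unrelated, verified source
}

tainted = contaminated_papers(cites, {"ghost-ref"})
print(sorted(tainted))
```

Three papers now appear to support the claim, yet none of them adds an independent verification. A crawler that counts repetitions would see three confirmations where there are none.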

I built AuthorityTech on the premise that AI engines retrieve what they trust, and trust is earned through source authority. What these studies prove is that source authority itself is being manufactured at scale by the same AI tools brands use to generate content. The loop is now closed: AI generates citations, AI retrieves those citations, AI cites them to the next user. Each cycle makes the fabrication look more legitimate.

What This Means for Brand Visibility in AI Search

If your brand's AI visibility depends on being cited in AI-generated answers, you need to understand what those answers are built on.

Machine Relations research across 55,936 queries found that LLM search engines return an average of 4.3 URLs per response compared to 10.3 for traditional search. That compression is the operating constraint. When AI engines select fewer than half the sources traditional search does, the quality filter gets tighter. The sources that survive are the ones AI engines can verify through external proof: editorial authority, named authorship, traceable provenance.

The contamination crisis amplifies this. As the citation pool fills with fabricated and self-referential content, AI engines will increasingly weight the signals that distinguish real sources from synthetic ones. Named authors with verifiable track records. Publications with editorial review processes. Claims supported by traceable, original reporting.

That is exactly what earned media provides. Muck Rack's May 2026 Generative Pulse study found that 84% of AI citations come from earned media. Paid and advertorial content accounts for 0.3%. The editorial verification layer that makes earned media trustworthy is the same layer that makes it resistant to citation contamination.

Source Custody Is the Only Durable Defense

I coined Machine Relations in 2024 after documenting the link between earned media credibility and AI citation eligibility. The self-citation crisis makes the case stronger, not weaker. If the citation ecosystem is contaminated, the brands that win are the ones that can prove their claims trace back to original, human-created, editorially verified sources.

The concept emerging in academic publishing is source custody: a provenance record proving what was verified, when, and where the original evidence lives. For brands, the equivalent is what I call citation architecture: the structural condition where your claims appear as sources in AI-generated answers because they are grounded in trusted third-party editorial proof, not because they were repeated enough times to look true.

Here is the test I run for every brand we work with at AuthorityTech. Can you trace every major claim in your AI-visible content back to a named human author, a real publication with an editorial process, and original reporting or data? If the answer is no, some percentage of your AI visibility is built on a foundation that the contamination crisis will erode.
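One way to make that test concrete is a small audit over a list of claims. The sketch below is illustrative only: the `Claim` fields, the `missing_links` helper, and the sample data are assumptions made for this example, not AuthorityTech's actual tooling.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    """A claim in AI-visible content plus its provenance (hypothetical schema)."""
    text: str
    author: Optional[str] = None             # named human author
    publication: Optional[str] = None        # outlet that published the claim
    has_editorial_process: bool = False      # does that outlet review before publishing?
    original_evidence: Optional[str] = None  # where the original reporting or data lives

def missing_links(claim):
    """List the provenance links a claim lacks; an empty list means it is traceable."""
    gaps = []
    if not claim.author:
        gaps.append("named human author")
    if not (claim.publication and claim.has_editorial_process):
        gaps.append("publication with an editorial process")
    if not claim.original_evidence:
        gaps.append("original reporting or data")
    return gaps

# Made-up sample data for illustration.
claims = [
    Claim(
        "Example claim with full provenance",
        author="Jane Doe",
        publication="Example Trade Journal",
        has_editorial_process=True,
        original_evidence="2025 customer survey dataset",
    ),
    Claim("Example claim copied from an AI summary"),
]

for claim in claims:
    gaps = missing_links(claim)
    status = "traceable" if not gaps else "missing: " + ", ".join(gaps)
    print(f"{claim.text} -> {status}")

# Share of claims that fail the test.
untraceable_share = sum(1 for c in claims if missing_links(c)) / len(claims)
print(f"Untraceable share: {untraceable_share:.0%}")
```

The untraceable share is a rough proxy for how much of a brand's AI visibility rests on claims without provenance.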

The brands publishing AI-generated content that cites other AI-generated content are building on quicksand. The brands earning coverage in publications that AI engines already trust, with named authors and traceable editorial processes, are building on bedrock. The gap between those two positions is about to become the most visible divide in B2B marketing.

You are either in the source chain or you are noise the next model update filters out.

FAQ

What does it mean that AI is citing AI?

AI engines retrieve and cite sources to answer user queries. When those sources were themselves generated by AI, or contain citations fabricated by AI tools, the result is a self-referential loop where machines cite machines. GhostCite found that all 13 LLMs it benchmarked hallucinate citations at rates from 14% to 95%, and a Columbia University audit found that fabricated biomedical citations have increased more than 12-fold since 2023. The contamination is entering the same citation pool AI search engines draw from when recommending brands to buyers.

How does the AI self-citation crisis affect brand visibility?

AI search engines return an average of 4.3 sources per response versus 10.3 for traditional search, based on Machine Relations research across 55,936 queries. That compression means source quality determines who gets cited. As the citation pool fills with fabricated and self-referential content, AI engines will increasingly favor sources with verifiable editorial authority: named authors, trusted publications, and traceable original reporting. Brands without that provenance layer lose the citation slots to brands that have it.

What is source custody and why does it matter for AI search?

Source custody is the discipline of maintaining a provenance record for every citation: what was verified, when, and where the original evidence lives. The concept was formalized in academic publishing ethics as a defense against citation contamination. For brands, it translates to citation architecture: ensuring your claims are grounded in earned media from trusted publications, not in AI-generated content that can be traced back to other AI-generated content. I built AuthorityTech's model around this principle because the brands that can prove source provenance are the brands AI engines will continue to trust.
