Your Pages Score Fine on the AI Audit. They Still Don't Get Cited. Here's the Gap.
New research: pages scoring 0.70+ on the 16-pillar GEO audit hit a 78% cross-engine citation rate. The paper's own finding explains why most pages still miss. The fix isn't more optimization.
You ran the audit. You fixed the metadata. You updated the schema markup, got the freshness timestamps right, cleaned up the heading hierarchy. You hit 12 out of 16 pillars. Your GEO score lands above 0.70.
You still don't show up in Perplexity answers for your category.
This is the conversation I keep having with operators who are doing the work and not seeing the results. The September 2025 Berkeley/arXiv study that introduced the GEO-16 framework is being widely cited for its headline thresholds — and almost nobody is quoting the finding buried in the discussion section that explains exactly why scoring well isn't enough.
What GEO-16 actually found
The paper is called "AI Answer Engine Citation Behavior: An Empirical Analysis of the GEO-16 Framework" by Arlen Kumar and Leanid Palkhouski. They ran 70 B2B SaaS product-intent prompts across Brave, Google AI Overviews, and Perplexity, harvested 1,702 citations, and audited 1,100 unique URLs against 16 on-page quality signals across six principle groups: Metadata & Freshness, Semantic HTML, Structured Data, Authority & Trust, Evidence & Citations, and Internal/External Linking.
The operating threshold the paper established: a page needs a GEO score of at least 0.70 (normalized across 16 pillars) and at least 12 pillar hits to enter the citation-likely zone.
The numbers behind that threshold:
| Page group | Result |
|---|---|
| Below threshold (G < 0.70 or < 12 hits) | Low (baseline) |
| G ≥ 0.70, ≥ 12 pillar hits | 78% |
| Cross-engine cited pages vs. single-engine | 71% higher quality scores |
The pillar correlations tell you where to focus first. Metadata & Freshness (r=0.68) is the strongest signal. Semantic HTML (r=0.65) is second. Structured Data (r=0.63) is third. The paper's logistic model puts the odds ratio for the marketing vertical at 1.9 (95% CI [1.3, 2.7]), meaning marketing-category B2B content has nearly twice the odds of being cited relative to the baseline vertical. The odds ratio for overall page quality on citation is 4.2 (95% CI [3.1, 5.7]).
This is solid empirical work. Most operators reading the headlines are treating these thresholds as a finish line.
The paragraph most people skip
Here's what the paper actually says in its discussion section:
"Our findings reaffirm that on-page quality signals are crucial for AI-engine discoverability. However, recent comparative research emphasises that generative engines heavily weight earned media and often exclude brand-owned and social platforms. This implies that even high-quality pages may not be cited if they reside solely on vendor blogs. Publishers should therefore pursue a dual strategy: ensure on-page excellence (meeting GEO-16 thresholds) and secure coverage on authoritative third-party domains."
The researchers are telling you directly: the 78% citation rate applies to pages that already cleared the earned media filter. A page on your brand's own domain — no matter how well-optimized — starts from a structurally disadvantaged position because AI engines systematically deprioritize self-published content.
This isn't a marginal effect. The Stacker/Scrunch Citation Lift study measured 325% more citations for identical content distributed on third-party news sites versus the same content on brand domains alone. The content was the same. The domain made the difference. And Ahrefs' analysis of 1,000 ChatGPT citations found 65.3% of ChatGPT's top-cited pages came from DR80+ domains — reinforcing that the domain authority signal runs before the content quality signal in the citation decision stack.
Where most operators are stuck
The failure mode is predictable: a team runs the GEO-16 audit, identifies the pillars they're missing, fixes metadata, adds schema, updates timestamps. They see their on-page scores improve. Then they wait for citations that don't come.
There are two separate gates in the citation decision:
Gate 1 — Domain trust: Does the engine treat this domain as a credible, third-party source? Or is it brand-owned content the engine deprioritizes by default? This is why the Muck Rack "What Is AI Reading?" data consistently shows 85%+ of AI citations going to earned media — it's not that brand content is poorly written, it's that the engine treats the sourcing context differently.
Gate 2 — Content quality: Given the engine trusts the domain, does the page meet the structural standards for extraction? This is what GEO-16 measures.
Most operators are optimizing Gate 2 before they've gotten through Gate 1. The audit scores go up. The citations don't follow.
The Moz 2026 analysis of 40,000 Google AI Mode queries found 88% of cited URLs don't appear in the organic top 10. The traditional SEO signal — your page is in position 3 for a relevant keyword — has almost no overlap with whether AI engines cite it. The signals run in parallel, not in sequence.
What the fix actually looks like
The operators closing this gap are running two tracks at once:
Track 1: On-page structure (GEO-16 compliance)
From the paper's pillar data, the three moves with the highest citation correlation:
- Metadata & Freshness (r=0.68): Publish date visible to humans, `datePublished`/`dateModified` in JSON-LD, sitemaps updated, `Last-Modified` headers set. AI engines weight recency heavily: a technically sound page with a stale timestamp competes against content that signals it was updated this week.
- Semantic HTML (r=0.65): H1/H2/H3 hierarchy that reflects actual content structure (not just visual styling), lists marked up with proper `<ul>`/`<ol>`/`<li>`, tables with `<thead>`/`<tbody>` and semantic headers. This matters because AI engines parse DOM structure to work out what a page is about before deciding whether to cite it.
- Structured Data (r=0.63): JSON-LD schema implemented correctly with `@type`, `author`, `datePublished`, and `publisher`. The engines need machine-readable confirmation of authorship and publication context; that is the signal that tells them whether to categorize the content as original reporting or brand marketing copy. (A minimal markup sketch follows this list.)
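To make those three pillars concrete, here is a minimal sketch of an article page that covers all of them: a visible publish/updated date, a heading hierarchy and list/table markup that mirror the content, and a JSON-LD Article block declaring authorship and publication context. The topic, names, dates, and URLs are placeholders rather than examples from the paper, and the `Last-Modified` response header and sitemap `lastmod` entry the Metadata & Freshness pillar also calls for live in server configuration and the sitemap file, not in the page itself.

```html
<!-- Minimal sketch; all values are placeholders, not taken from the GEO-16 study. -->
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Best CRM for Field Sales Teams: 2026 Comparison</title>
  <meta name="description" content="How the leading field-sales CRMs compare on pricing, offline mode, and route planning.">
  <!-- Structured Data pillar: machine-readable authorship and publication context -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Best CRM for Field Sales Teams: 2026 Comparison",
    "author": { "@type": "Person", "name": "Jane Doe" },
    "publisher": { "@type": "Organization", "name": "Example Publication" },
    "datePublished": "2026-01-12",
    "dateModified": "2026-02-03"
  }
  </script>
</head>
<body>
  <article>
    <!-- Semantic HTML pillar: heading hierarchy mirrors the content outline -->
    <h1>Best CRM for Field Sales Teams: 2026 Comparison</h1>
    <!-- Metadata & Freshness pillar: dates visible to humans, not only in JSON-LD -->
    <p>Published <time datetime="2026-01-12">January 12, 2026</time> ·
       Updated <time datetime="2026-02-03">February 3, 2026</time></p>
    <h2>How we compared the tools</h2>
    <ul>
      <li>Pricing per seat</li>
      <li>Offline mode</li>
      <li>Route planning</li>
    </ul>
    <h2>Head-to-head results</h2>
    <table>
      <thead><tr><th scope="col">Tool</th><th scope="col">Best for</th></tr></thead>
      <tbody><tr><td>Vendor A</td><td>Small field teams</td></tr></tbody>
    </table>
  </article>
</body>
</html>
```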
Track 2: Third-party placement (Gate 1 clearance)
This is the part the audit doesn't measure. Getting through Gate 1 means earning coverage in publications AI engines already trust. The GEO-16 paper says it directly: "generative engines heavily weight earned media and often exclude brand-owned and social platforms." That's not a design flaw to work around; it's intentional. AI engines use publication credibility as a proxy for the editorial review process they can't replicate themselves.
The Princeton/Georgia Tech GEO research found that adding credible external citations to content increases AI citation probability — which means the pages AI engines trust are the ones that cite authoritative sources, not the pages that merely optimize their own technical structure. The two findings compound: earn a placement in a domain AI engines trust, then structure that content to hit the GEO-16 thresholds.
The practical order of operations: fix your on-page structure first, so that when your content does land in a trusted publication it hits the GEO-16 thresholds on arrival. A poorly structured page in Forbes still underperforms. A well-structured page in a publication like Forbes, clearing Gate 1 and Gate 2 together, is what lands in the 78% citation-likely zone.
The common shortcut mistake
Running the GEO-16 audit as a checklist without asking which pages it applies to. The study audited B2B SaaS content across 16 verticals — the pillars are calibrated for pages that will actually be discovered through product-intent queries. Auditing your homepage or "About" page against these thresholds is optimizing the wrong thing. The study's 70 prompts were explicitly designed to elicit vendor citations for commercial queries.
The pages worth auditing first: your category explainers, comparison pages, and any content targeting queries like "best [category] for [use case]." These are the pages AI engines are actively choosing between when a prospect asks for a recommendation.
The infrastructure beneath the tactic
The GEO-16 findings point to something more durable than a quarterly audit cycle. Cross-engine citations — pages cited by Brave, Google AIO, and Perplexity simultaneously — had 71% higher quality scores than single-engine citations. The pages that win across all three engines aren't just technically optimized; they've built the kind of third-party credibility that makes multiple AI systems independently trust them as a source.
That's what Machine Relations describes as the operating infrastructure: earned media placements in trusted publications create the corroboration layer AI engines use to decide what to surface when a prospect asks who leads a category. The on-page structure is necessary. The third-party authority is what makes it compound. An operator who runs the GEO-16 audit, hits the thresholds, and has earned coverage in three Tier 1 publications is in a structurally different position than one who only did half the work.
Run the audit. And then look at the column the audit doesn't have.
Related Reading
- Machine Relations for AI-Native Companies: How to Win the Citation War
- AI Visibility for Media & Entertainment Companies: The 2026 Earned Media Playbook
AuthorityTech's free visibility audit checks both layers — on-page structure and AI citation presence across the engines that matter for your category.