
52% of AI Citations Go to 3 Content Formats. Is Yours One of Them?

New research across 75,000 AI answers reveals exactly which content formats get cited — and which get ignored. Here's the format-intent audit that tells you where your pages stand.

Christian Lehman

Most B2B marketing teams are still publishing content based on what worked in traditional search. The problem is that AI engines do not cite content the way Google ranked it. They select specific formats based on what the user is trying to do — and the mismatch between what you are publishing and what AI actually cites is measurable now.

Wix Studio's AI Search Lab just published the most granular citation study to date: 75,000 AI answers and more than 1 million citations across ChatGPT, Google AI Mode, and Perplexity. The findings are specific enough to act on this week. Christian Lehman breaks down the data and the audit steps that follow from it.

Three formats capture 52% of all AI citations

The study found that listicles (21.9%), articles (16.7%), and product pages (13.7%) account for over half of all AI citations. Everything else — how-to guides, homepages, comparison pages, discussion threads — splits the remaining 48% across a long tail of formats that individually capture single-digit percentages.

That concentration matters. If your content library is heavy on formats outside those three, a significant share of your pages are structurally invisible to AI engines — not because the information is wrong, but because the format does not match how AI selects sources.

A separate study from Kevin Indig analyzing 1.2 million ChatGPT responses reinforces how aggressive the filtering is: ChatGPT retrieves six times more pages than it actually cites. 85% of retrieved pages get discarded before the user ever sees them. The selection bottleneck is real, and format is one of the primary filters.

Intent determines which format wins

The Wix Studio data shows that query intent — not industry, not AI model — is the strongest predictor of which format gets cited.

Query intent | Top cited format | Citation share
Informational ("how does X work") | Articles | 45.5%
Commercial ("best X for Y") | Listicles | 40.9%
Transactional ("buy X," "pricing for X") | Product/category pages | ~40% combined

Christian Lehman's take: this is the most actionable finding in the study. It means you can map your existing content library against buyer intent stages and immediately see where you have format-intent mismatches — pages that are technically about the right topic but packaged in a format AI engines will skip for that intent type.

For example: if you have a long-form article targeting a commercial comparison query ("best project management tools for remote teams"), AI engines are 2x more likely to cite a listicle for that intent. The article might rank in traditional search, but it is being passed over in AI answers where listicles dominate the commercial citation slot.
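To make the mapping concrete, here is a minimal sketch of that check in Python. The intent-to-format lookup encodes the Wix Studio figures from the table above; the page values passed in are illustrative examples, not part of the study.

```python
# Citation-dominant format per query intent, per the Wix Studio figures above.
DOMINANT_FORMAT = {
    "informational": "article",        # articles take 45.5% of citations
    "commercial": "listicle",          # listicles take 40.9%
    "transactional": "product_page",   # product/category pages, ~40% combined
}

def is_format_intent_mismatch(page_format: str, query_intent: str) -> bool:
    """True when the page's format is not the citation-dominant format
    for the intent it targets. Unknown intents are flagged as mismatches."""
    return DOMINANT_FORMAT.get(query_intent) != page_format

# The example from the text: a long-form article chasing a commercial query.
print(is_format_intent_mismatch("article", "commercial"))  # True
```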

The third-party preference is measurable

One of the more striking findings from the Wix Studio research: in professional services, third-party listicles captured 80.9% of citations compared to just 19.1% for self-promotional lists.

This aligns directly with what Machine Relations research has documented — earned media produces 325% more AI citations than brand-owned distribution of the same content. AI engines treat independent editorial comparisons as more trustworthy than brand-produced rankings, which makes the source of the content as important as the format.

The practical implication: brands that are only producing first-party listicles and comparison pages are hitting a ceiling. The higher-citation path is getting your brand into third-party editorial lists and comparison articles — publications that AI engines already trust. That is an earned media problem, not a content production problem.

Model-level differences change your platform strategy

Each AI platform weights formats differently:

Platform | Top format | Second format | Standout pattern
ChatGPT | Articles + informational | Listicles | Leans heavily toward long-form, authoritative content
Google AI Mode | Balanced across formats | Listicles | Most even distribution of any platform
Perplexity | Listicles | Product pages | 17% of citations from discussions (Reddit, forums)

If your team is only optimizing for one AI platform, you are likely over-indexing on that platform's format preference while being invisible on others. ChatGPT rewards depth and authority. Perplexity rewards community signal and recency. Google AI Mode rewards structural diversity. A single content format cannot cover all three.
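As a rough illustration, here is a sketch that checks how much of a content library's format mix is eligible under each platform's preferences. The platform-to-format sets are a simplification of the table above, and the library passed in is hypothetical.

```python
from collections import Counter

# Preferred formats per platform, simplified from the table above;
# a "balanced" platform is modeled as accepting any format.
PLATFORM_PREFERENCES = {
    "chatgpt": {"article", "listicle"},
    "google_ai_mode": None,  # balanced: no strong format filter
    "perplexity": {"listicle", "product_page", "discussion"},
}

def coverage(library: list[str]) -> dict[str, float]:
    """Share of the library matching each platform's preferred formats."""
    counts = Counter(library)
    total = sum(counts.values())
    out = {}
    for platform, preferred in PLATFORM_PREFERENCES.items():
        if preferred is None:
            out[platform] = 1.0  # balanced platform: everything is eligible
        else:
            out[platform] = sum(counts[f] for f in preferred) / total
    return out

# Hypothetical library: heavy on articles, light on everything else.
print(coverage(["article"] * 8 + ["listicle"] * 2))
# {'chatgpt': 1.0, 'google_ai_mode': 1.0, 'perplexity': 0.2}
```

A library that scores high on one platform and low on another is the over-indexing problem described above, made visible in one number per platform.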

The format-intent audit you can run this week

Christian Lehman recommends a five-step audit that maps your existing content against the citation data; a script sketch automating the cross-reference step follows the list:

  1. Export your top 50 pages by traffic or business value. These are the pages that matter most for AI citation.

  2. Classify each page by content format. Is it an article, a listicle, a product page, a how-to guide, a comparison, or something else? Be honest about what AI would see — not what your CMS calls it.

  3. Tag each page by primary buyer intent. Is the target query informational, commercial, or transactional? Use your search console data or the queries you are actually trying to capture.

  4. Cross-reference format against intent. For each page, check: does the format match the citation-dominant format for that intent type? An article targeting a commercial query is a mismatch. A product page targeting an informational query is a mismatch.

  5. Prioritize rewrites for the highest-value mismatches. You do not need to restructure everything. Start with the pages where the business value is highest and the format-intent mismatch is clearest. A commercial-intent page restructured as a listicle, or an informational page restructured with answer-first architecture, can shift citation eligibility without changing the underlying topic.
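Here is a sketch of how steps 2 through 4 could be automated once the manual classification is done. It assumes a hypothetical CSV export named top_pages.csv with url, format, and intent columns; the file name and column names are illustrative, not a standard export from any tool.

```python
import csv

# Citation-dominant format per intent (Wix Studio figures from earlier).
DOMINANT_FORMAT = {
    "informational": "article",
    "commercial": "listicle",
    "transactional": "product_page",
}

def find_mismatches(path: str) -> list[dict]:
    """Return rows whose format does not match the citation-dominant
    format for their tagged intent (step 4 of the audit)."""
    mismatches = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            expected = DOMINANT_FORMAT.get(row["intent"].strip().lower())
            if expected and row["format"].strip().lower() != expected:
                mismatches.append({**row, "expected_format": expected})
    return mismatches

if __name__ == "__main__":
    for row in find_mismatches("top_pages.csv"):
        print(f"{row['url']}: {row['format']} -> consider {row['expected_format']}")
```

Sorting the flagged rows by the business value from step 1 gives you the step 5 priority queue.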

The Princeton GEO research (Aggarwal et al., SIGKDD 2024) established that adding statistics and citing credible sources improves AI visibility by 30-40%. Combine that with format-intent alignment and you are stacking the two highest-leverage citation factors the research has identified.

Why format alone is not enough

Format gets you into the selection pool. What gets you cited is whether the content comes from a source AI engines already trust.

The Ahrefs analysis of 75,000 brands found that brand web mentions — the output of earned editorial coverage — correlate with AI visibility at 0.664. Backlinks correlate at 0.218. That is a 3x gap. The Muck Rack Generative Pulse analysis of over one million AI prompts confirmed that 82% of links cited by AI systems come from earned media sources.

The data points to a compounding relationship: the right format makes a page extractable, but earned authority — placements in publications AI engines already index and trust — makes it citable. Both matter. Neither is sufficient alone.

This is the infrastructure layer that Machine Relations, coined by Jaxon Parrott in 2024, names as the defining challenge. The same earned media placements that build credibility with human readers are the citations AI engines pull when a buyer asks who leads a category. The format-intent audit tells you whether your content is structured to be selected. The earned media footprint determines whether it is trusted enough to be cited. Machine Relations is the operating framework that connects both: earned authority flows through trusted publications into AI citation, and structuring that content for extractability is how the citation actually lands.

If you want to see where your brand currently stands on both dimensions — format readiness and earned authority — the visibility audit shows exactly how AI engines are representing you today and what is missing.
