73% of people use web search to discover new products and services. But what they see is increasingly an AI-written answer, not an algorithmically ranked list of links. What determines whether your content gets cited?
We read every major study on AI citation behavior published since 2024. This post is the complete research roundup: what was measured, what was found, and what it means for anyone trying to get cited by ChatGPT, Perplexity, Gemini, or Claude.
No speculation. No "tips." Just data.
THE KEY NUMBERS (AT A GLANCE)
Before we break down each study, here are the statistics that matter most:
| Stat | Source | What It Means |
|---|---|---|
| 19,556 queries analyzed | Lee (2026a) | Largest public dataset of AI citation behavior across platforms |
| 400,000+ pages analyzed | Sellm (2025) | Cited pages average 13.75 list sections vs. fewer for uncited |
| 0% Reddit citations via API | Lee (2026b) | Reddit is invisible through API access, despite massive training influence |
| 17-44% Reddit citations in web UI | Lee (2026b) | Web interfaces cite Reddit heavily, APIs do not |
| Up to 40% visibility boost | Aggarwal et al. (2024) | GEO strategies can significantly improve generative engine visibility |
| 40% improvement (targeted) vs. 25% (generic) | Tian et al. (2025) | Diagnostic optimization outperforms one-size-fits-all approaches |
| 1.4% platform overlap | Lee (2026a) | Platforms almost never cite the same URL for the same query |
| 44.2% of citations from first 30% of content | Lee (2026a) | Front-loading matters more than total word count |
| rho = -0.02 to 0.11 | Lee (2026a) | Google rank has zero meaningful correlation with AI citation |
| 7 significant predictors | Lee (2026a) | Page-level features that survived FDR correction |
The Bottom Line: The research converges on one finding: AI citation is governed by query intent and content structure, not by traditional SEO signals. Google rank, domain authority, and backlink counts are essentially irrelevant to whether an AI platform cites your page.
STUDY 1: QUERY INTENT, NOT GOOGLE RANK (LEE, 2026a)
Paper: "Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior" Dataset: 19,556 Google Autocomplete queries across 8 industry verticals DOI: 10.5281/zenodo.18653093
This is the largest public study of AI citation behavior to date. Lee tested a simple but important question: does your Google ranking predict whether AI platforms will cite you?
The answer is no. Across 19,556 queries tested on ChatGPT, Perplexity, Claude, and Gemini, the Spearman correlation between Google rank and AI citation ranged from rho = -0.02 to 0.11. All values were statistically non-significant. Ranking #1 on Google gave you no advantage over ranking #50 when it came to AI citations.
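If you want to sanity-check this against your own data, the measurement is simple: collect a Google position and a cited/not-cited flag per URL for a sample of queries, then compute the rank correlation. A minimal sketch; the values below are illustrative placeholders, not data from the study:

```python
# Minimal sketch: does Google rank predict AI citation in your own sample?
# The values below are illustrative placeholders, not data from Lee (2026a).
from scipy.stats import spearmanr

google_rank = [1, 3, 5, 8, 12, 20, 35, 50]   # position in Google results
was_cited   = [0, 1, 0, 1, 1, 0, 1, 0]       # 1 = cited by an AI platform

rho, p_value = spearmanr(google_rank, was_cited)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# Lee (2026a) reports rho between -0.02 and 0.11, none statistically significant.
```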
What Does Predict Citation?
The study identified a two-level model:
Level 1: Query Intent (the filter). Intent distributions varied significantly by vertical (chi-squared(28) = 5,195, p < .001, Cramer's V = 0.258). The five intent categories and their query shares:
| Intent Type | Share | Typical Citation Sources |
|---|---|---|
| Informational | 61.3% | Wikipedia, .gov/.edu, tutorials |
| Discovery | 31.2% | Review aggregators, YouTube, listicles |
| Validation | 3.2% | Brand sites, Reddit (web UI only) |
| Comparison | 2.3% | Publisher/media, review sites |
| Review-seeking | 2.0% | YouTube, TechRadar/PCMag, Reddit |
Level 2: Page Features (the selector). Among pages matching the correct intent, a logistic regression using 7 page-level features achieved AUC = 0.594. The 7 statistically significant predictors (after Benjamini-Hochberg FDR correction):
- Internal link count (OR = 2.75, strongest positive predictor)
- Content-to-HTML ratio (OR = 1.29)
- Schema count (OR = 1.21)
- Self-referencing canonical (OR = 1.92)
- Word count (cited median: 2,582 vs. not cited: 1,859)
- Schema presence (OR = 1.69)
- Total link count (OR = 0.47, negative when external links dominate)
A critical finding: adding intent features to the page-level model provided zero additional predictive power (likelihood ratio p = .78). Intent decides the pool. Page features decide the winner within that pool.
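For readers who want to replicate this kind of model on their own crawl data, here is a minimal sketch of the approach described above: a logistic regression on page-level features, odds ratios from the coefficients, and Benjamini-Hochberg FDR correction on the per-feature p-values. The CSV file, column names, and feature definitions are assumptions for illustration; this is not the study's pipeline.

```python
# Minimal sketch of a model like the one described above: logistic regression
# on page-level features, odds ratios from the coefficients, and
# Benjamini-Hochberg FDR correction on the per-feature p-values.
# "pages.csv" and its columns are illustrative; this is not the study's pipeline.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score
from statsmodels.stats.multitest import multipletests

pages = pd.read_csv("pages.csv")   # hypothetical export: one row per page
features = ["internal_links", "content_html_ratio", "schema_count",
            "self_canonical", "word_count", "has_schema", "total_links"]

X = sm.add_constant(pages[features])
y = pages["was_cited"]             # 1 if any AI platform cited the page

model = sm.Logit(y, X).fit(disp=0)
odds_ratios = np.exp(model.params[features])
reject, p_adj, _, _ = multipletests(model.pvalues[features], method="fdr_bh")

print(pd.DataFrame({"OR": odds_ratios, "p_fdr": p_adj, "significant": reject},
                   index=features))
print("AUC =", round(roc_auc_score(y, model.predict(X)), 3))
```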
Platform Overlap Is Nearly Zero
Only 1.4% of cited URLs appeared across multiple AI platforms for the same query. Each platform maintains its own retrieval pipeline and selects different sources. Optimizing for "AI search" as a monolith is a mistake. You need to understand each platform's architecture.
For the full breakdown of the 7 predictors and what they mean for your content, see our complete GEO guide. For the query intent research specifically, see Query Intent and AI Citation.
The Bottom Line: Stop obsessing over Google rankings as a proxy for AI visibility. They measure different things. Focus on matching your content type to query intent, then optimize the 7 page-level features that actually predict citation.
STUDY 2: THE 400K-PAGE CITATION ANALYSIS (SELLM, 2025)
Paper: "ChatGPT Citation Analysis" Dataset: 400,000+ pages analyzed for structural features
The Sellm (2025) industry study is the largest structural analysis of pages that ChatGPT cites. While the methodology differs from Lee's controlled experiment (industry analysis vs. academic study), the directional findings align.
Key findings from 400K+ pages:
| Feature | Finding (cited pages) | Insight |
|---|---|---|
| List sections | 13.75 | Cited pages use substantially more structured lists |
| Content in first 30% | 44.2% of citations reference this | Front-loading is not optional |
| Structured data presence | Significantly higher | Aligns with Lee's schema findings |
The 13.75 list sections finding suggests AI models preferentially extract from pages with scannable, structured formats. This aligns with Aggarwal et al. (2024), who found that "citing sources" and "adding statistics" were among the most effective optimization strategies.
The 44.2% front-loading statistic is equally important. If your key information is buried below the fold or locked behind "read more" expanders, AI models may never reach it.
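A quick way to audit your own pages against this finding is to measure what share of your number-bearing sentences (a crude proxy for citable facts) falls in the first 30% of the visible text. This is a rough heuristic sketch, not the methodology behind the 44.2% figure; the file path is a placeholder:

```python
# Rough heuristic: what fraction of "citable" sentences (those containing a
# number) sit in the first 30% of a page's text? Not the study's methodology.
import re

def front_load_share(text: str, cutoff: float = 0.30) -> float:
    """Share of number-bearing sentences that begin before the cutoff point."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    boundary = len(text) * cutoff
    offset, front, total = 0, 0, 0
    for sentence in sentences:
        if re.search(r"\d", sentence):      # crude proxy for a citable fact
            total += 1
            if offset < boundary:
                front += 1
        offset += len(sentence) + 1
    return front / total if total else 0.0

page_text = open("page.txt").read()         # plain-text extract of a page (placeholder)
print(f"{front_load_share(page_text):.0%} of citable sentences are in the first 30%")
```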
The Bottom Line: Structure your content with lists, tables, and clear headers. Put your most citation-worthy information in the first 30% of the page.
STUDY 3: THE REDDIT SHADOW CORPUS (LEE, 2026b)
Paper: "Reddit Doesn't Get Cited (Through the API): Training Data Influence, Access-Channel Divergence, and the Shadow Corpus in AI Brand Recommendations" DOI: 10.5281/zenodo.18679003
This study reveals one of the most important findings in AI search research: Reddit's influence on AI outputs is massive, but almost entirely invisible through standard measurement.
The Access-Channel Split
Lee tested the same brand recommendation queries through two channels: the API (programmatic access) and the web UI (browser-based). The results diverged dramatically:
| Metric | API | Web UI |
|---|---|---|
| Reddit citation rate | 0% | 17-44% |
| Source attribution | None for Reddit | Frequent Reddit links |
| Brand recommendation similarity | High overlap | High overlap |
Through the API, Reddit was cited exactly 0% of the time. Through the web UI, Reddit appeared in 17% to 44% of responses depending on the query category. Yet the actual brand recommendations were similar across both channels.
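The measurement itself is easy to reproduce once you have collected responses from both channels: count how often a domain appears among the sources each channel returns. A sketch, assuming `api_responses` and `web_ui_responses` are lists of response texts with any cited URLs included; how you collect them is platform-specific and not shown here:

```python
# Sketch: compare how often a domain (e.g. reddit.com) shows up in responses
# collected through two channels. Gathering the responses is platform-specific
# and not shown; the lists below are placeholders.
import re

def domain_citation_rate(responses: list[str], domain: str) -> float:
    """Fraction of responses that link to the given domain."""
    pattern = re.compile(rf"https?://(?:\w+\.)*{re.escape(domain)}", re.I)
    hits = sum(1 for r in responses if pattern.search(r))
    return hits / len(responses) if responses else 0.0

api_responses = ["...brand X is well regarded for durability by hobbyists..."]
web_ui_responses = ["...as discussed in https://www.reddit.com/r/BuyItForLife/ ..."]

print("API   :", domain_citation_rate(api_responses, "reddit.com"))
print("Web UI:", domain_citation_rate(web_ui_responses, "reddit.com"))
```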
The Shadow Corpus Mechanism
Reddit content has been absorbed into LLM training data. Through the API (no web search), models draw on internalized Reddit sentiment but cannot cite a source learned during pre-training. Through the web UI (with search enabled), models find and cite live Reddit threads that match already-formed opinions.
The correlation (Spearman rho = 0.554) between Reddit mention frequency and AI brand recommendation probability points to training data influence. Reddit is not being cited. It is being channeled. If your brand is discussed negatively on Reddit, that sentiment is likely already baked into AI outputs.
For the complete analysis of Reddit's invisible influence, see Reddit Training Data Influence. For understanding when AI platforms use web search at all, see When Does ChatGPT Search?.
The Bottom Line: Reddit influences AI recommendations through training data absorption, not through real-time citation. Monitor your Reddit presence as an upstream input to AI outputs, even though you will never see Reddit cited in API responses.
STUDY 4: THE ORIGINAL GEO BENCHMARK (AGGARWAL ET AL., 2024)
Paper: "GEO: Generative Engine Optimization" Published: KDD 2024 DOI: 10.48550/arXiv.2311.09735
Aggarwal et al. introduced the term "Generative Engine Optimization" and created GEO-bench, the first systematic benchmark for evaluating content optimization strategies in generative search engines.
Core Findings
The study tested 9 optimization strategies across multiple domains, measuring visibility changes in generative engine responses:
| Strategy | Visibility Change | Domain Sensitivity |
|---|---|---|
| Citing sources | Up to +40% | High (works best in factual domains) |
| Adding statistics | +30-40% | High (factual and technical content) |
| Quotation addition | +15-25% | Moderate |
| Keyword stuffing | Minimal/negative | Low (generic harm) |
| Fluency optimization | +5-10% | Low (marginal improvement) |
The headline finding: targeted optimization strategies can boost visibility by up to 40%. But the effectiveness is heavily domain-dependent. What works for legal content may not work for cooking recipes.
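A practical takeaway is to check whether a passage actually carries the two highest-impact features before publishing. The sketch below is a quick audit of cited sources, statistics, and quotations in a passage; it is not the GEO-bench visibility metric:

```python
# Quick audit (not the GEO-bench metric): does a passage carry the features
# Aggarwal et al. found most effective -- cited sources and statistics?
import re

def geo_feature_audit(passage: str) -> dict:
    return {
        "source_links": len(re.findall(r"https?://\S+", passage)),
        "statistics":   len(re.findall(r"\b\d+(?:\.\d+)?%|\b\d{4}\b|\b\d+(?:,\d{3})+\b", passage)),
        "quotations":   passage.count('"') // 2,
    }

sample = 'A 2024 survey found 62% of buyers "research online first" (https://example.com/study).'
print(geo_feature_audit(sample))
# {'source_links': 1, 'statistics': 2, 'quotations': 1}
```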
Why This Paper Matters
GEO-bench established the first repeatable framework for measuring AI citation optimization. Before this paper, all "AI SEO" advice was speculative. Aggarwal et al. gave the field its first empirical foundation.
For a practical walkthrough of implementing GEO strategies, see our Generative Engine Optimization Guide.
STUDY 5: E-GEO AND DOMAIN-AGNOSTIC OPTIMIZATION (BAGGA ET AL., 2025)
Paper: "E-GEO: A Testbed for Generative Engine Optimization in E-Commerce"
Bagga et al. extended the GEO framework to e-commerce, testing whether optimization patterns from general web content also apply to product pages. The core finding: a stable, domain-agnostic optimization pattern exists across product categories.
- Optimized content rewrites consistently outperformed unoptimized versions across diverse e-commerce categories
- The optimization effect was not limited to a single product type or price range
- Structural features (clear specifications, comparison-ready formatting) were more predictive than keyword density
This matters because GEO is not a niche tactic for blog posts. Product pages, service descriptions, technical documentation, and FAQ pages all benefit from the same structural principles. Rather than developing separate strategies per industry, practitioners can apply a core set of structural improvements across their entire site.
STUDY 6: DIAGNOSTIC GEO (TIAN ET AL., 2025)
Paper: "Diagnosing and Repairing Citation Failures in Generative Engine Optimization"
Tian et al. took a different approach to GEO. Instead of testing generic optimization strategies, they developed a diagnostic framework that identifies why a specific page fails to get cited and prescribes targeted repairs.
Their system, AgentGEO, automates that loop: it diagnoses why a specific page is not cited, prescribes a targeted fix, and measures the improvement. The results:
| Approach | Average Improvement | Long-Tail Impact |
|---|---|---|
| Generic GEO strategies | ~25% | Can actually harm visibility |
| AgentGEO (diagnostic) | ~40% | Preserves or improves visibility |
The 40% vs. 25% gap is significant. But even more important is the long-tail finding: generic optimization strategies can reduce visibility for niche, long-tail queries while improving it for head terms. AgentGEO avoided this problem by tailoring fixes to each specific failure.
Different pages fail for different reasons: missing structured data, wrong intent, unsupported claims. Each failure mode requires a different fix. For understanding how AI platforms handle consensus in citations, see our AI Consensus research.
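You can approximate the diagnostic idea at a basic level without AgentGEO: check a page against the known failure modes and return only the ones that apply, so the fix is targeted rather than generic. A simplified sketch, not Tian et al.'s system; the audit fields are assumptions:

```python
# Simplified diagnostic sketch (not AgentGEO): flag common citation failure
# modes on a page so the fix can be targeted instead of generic.
from dataclasses import dataclass

@dataclass
class PageAudit:
    has_schema: bool
    intent_match: bool        # does the content type match the query intent?
    claims_with_sources: int  # claims backed by a linked or named source
    total_claims: int
    list_sections: int

def diagnose(page: PageAudit) -> list[str]:
    issues = []
    if not page.intent_match:
        issues.append("wrong intent: content type does not match the query")
    if not page.has_schema:
        issues.append("missing structured data")
    if page.total_claims and page.claims_with_sources / page.total_claims < 0.5:
        issues.append("unsupported claims: add sources or statistics")
    if page.list_sections == 0:
        issues.append("no extractable structure: add lists or tables")
    return issues or ["no obvious failure mode; test platform-specific factors"]

print(diagnose(PageAudit(has_schema=False, intent_match=True,
                         claims_with_sources=2, total_claims=10, list_sections=0)))
```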
The Bottom Line: Generic optimization gets you 25%. Diagnostic optimization gets you 40%. The difference comes from understanding why your specific page is not being cited, not just applying a checklist.
STUDY 7: HOW TO DOMINATE AI SEARCH (CHEN ET AL., 2025)
Paper: "Generative Engine Optimization: How to Dominate AI Search"
Chen et al. provided a practitioner-oriented synthesis of GEO strategies, testing which combinations produce the best results. Their findings reinforce the cross-study consensus: content substance (authoritative sourcing, comprehensive coverage), structural clarity (headers, tables, clear sections), and platform-specific tuning all consistently improved visibility. This is consistent with Lee's (2026a) finding of 1.4% citation overlap: platform-specific strategies are not optional.
PLATFORM ARCHITECTURE: WHY "AI SEARCH" IS NOT ONE THING
One of the most important insights from the combined research is that AI platforms use fundamentally different architectures for content retrieval. Understanding these differences is essential for targeted optimization.
| Platform | Architecture | Content Discovery | Key Implication |
|---|---|---|---|
| ChatGPT | Live fetching | Fetches pages via Bing during conversations | Fresh content accessible immediately if Bing-indexed |
| Claude | Live fetching | Claude-User bot fetches on demand | Respects robots.txt, only fetches when needed |
| Perplexity | Pre-built index | PerplexityBot crawls proactively | Strong freshness bias (3.3x fresher than Google for medium-velocity topics) |
| Google AI Mode | Google Search | Uses Googlebot-crawled content | Inherits Google's authority signals |
| Gemini | Google Search | No AI-specific crawlers identified | Grounds through Google's internal search |
For fetching platforms (ChatGPT, Claude): server-side rendering and clean HTML matter because the bot reads your page in real time. For indexing platforms (Perplexity): freshness signals (dateModified, sitemap lastmod) matter because the bot pre-crawls and serves from its index. For Google-based platforms (AI Mode, Gemini): traditional Google SEO provides the foundation layer.
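For the indexing platforms, the freshness signals mentioned above are easy to verify on your own site. Below is a sketch that checks two of them, dateModified in JSON-LD on a page and lastmod entries in the sitemap; the URLs are placeholders for your own:

```python
# Sketch: verify two freshness signals on your own site -- dateModified in
# JSON-LD on a page, and <lastmod> entries in the sitemap. URLs are placeholders.
import json, re, urllib.request

def jsonld_date_modified(page_url: str) -> list[str]:
    html = urllib.request.urlopen(page_url).read().decode("utf-8", "ignore")
    blocks = re.findall(
        r'<script[^>]+application/ld\+json[^>]*>(.*?)</script>', html, re.S)
    dates = []
    for block in blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        dates += [i.get("dateModified") for i in items
                  if isinstance(i, dict) and i.get("dateModified")]
    return dates

def sitemap_lastmods(sitemap_url: str) -> list[str]:
    xml = urllib.request.urlopen(sitemap_url).read().decode("utf-8", "ignore")
    return re.findall(r"<lastmod>(.*?)</lastmod>", xml)

print(jsonld_date_modified("https://example.com/post"))
print(sitemap_lastmods("https://example.com/sitemap.xml")[:5])
```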
For a detailed comparison of platform citation behavior, see ChatGPT vs Perplexity vs Gemini. For strategies on consistent AI visibility, see How to Consistently Rank in AI.
WHAT THE RESEARCH AGREES ON (CROSS-STUDY CONSENSUS)
Despite different methodologies, datasets, and research questions, these studies converge on several findings:
"The strongest predictor of AI citation is whether your content matches the query intent and presents information in a structured, extractable format. Traditional ranking signals are largely irrelevant." (Synthesis of Lee 2026a, Aggarwal 2024, Chen 2025)
1. Structure beats authority. Pages with structured lists, tables, comparison formats, and clear headers get cited more. Backlinks and domain authority do not predict AI citation. (Lee 2026a, Sellm 2025, Aggarwal 2024)
2. Front-loading is critical. 44.2% of citations reference content from the first 30% of a page (Sellm 2025). The 7 page-level predictors in Lee (2026a) include word count but not "depth of content at the bottom of the page."
3. Platform overlap is negligible. At 1.4% URL overlap (Lee 2026a), optimizing for "AI search" as a single channel is ineffective. Each platform needs its own strategy.
4. Generic optimization has limits. Tian et al. (2025) showed that generic strategies plateau at ~25% improvement and can harm long-tail visibility. Diagnostic, page-specific optimization reaches 40%.
5. Training data creates invisible influence. Reddit's 0% API citation rate alongside a rho = 0.554 brand correlation (Lee 2026b) indicates that training data shapes AI outputs in ways that citation tracking cannot measure.
6. Schema type matters, not just presence. Product schema (OR = 3.09), Review schema (OR = 2.24), and FAQPage schema (OR = 1.39) help. Article schema actually hurts (OR = 0.76). Generic "has schema" is not significant (Lee 2026a).
7. Domain-agnostic patterns exist. The same structural optimization principles work across e-commerce (Bagga 2025), general web content (Aggarwal 2024), and multi-vertical queries (Lee 2026a).
FREQUENTLY ASKED QUESTIONS
Is there a single most important factor for AI citations?
Query intent match. Lee (2026a) found that intent completely determines which content pool is eligible. A comparison article will not get cited for an informational query, regardless of optimization.
Do backlinks or domain authority matter for AI citations?
No. Across 19,556 queries (Lee 2026a), Google rank showed zero meaningful correlation with AI citation (rho = -0.02 to 0.11). AI platforms evaluate page content directly, not link graphs.
Should I optimize for all AI platforms at once?
Start with cross-platform fundamentals (lists, tables, front-loaded info, schema markup), then add platform-specific optimizations. With only 1.4% citation overlap (Lee 2026a), a single strategy is inefficient. See how to consistently rank in AI.
How much can GEO improve my visibility?
25% to 40% depending on approach. Generic strategies produce ~25% (Tian et al., 2025). Targeted, diagnostic approaches achieve ~40%. Aggarwal et al. (2024) demonstrated up to 40% with specific strategies like adding citations and statistics.
Does Reddit activity affect AI recommendations even if Reddit isn't cited?
Yes. Lee (2026b) found Reddit mentions correlate with AI brand recommendations at rho = 0.554, even at 0% API citation rate. Reddit content has been absorbed into training data, creating a "shadow corpus" that influences outputs without attribution.
PRACTICAL IMPLICATIONS
Priority order for applying this research:
- Audit content against query intent (free: AI Visibility Quick Check)
- Front-load key information into the first 30% of each page
- Add structural elements: lists, comparison tables, FAQ sections
- Implement the right schema types: Product, Review, FAQPage (not Article); a minimal JSON-LD sketch follows this list
- Fix link architecture: increase internal navigation links and keep external links from dominating
- Monitor platform-specific performance across ChatGPT, Perplexity, Claude, Google AI Mode
- Track Reddit presence as an upstream input to AI recommendations
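For the schema step above, the types that helped in Lee (2026a) are standard schema.org types. Here is a minimal sketch that emits a Product (with a Review) and a FAQPage block as JSON-LD; every value is a placeholder for your own data:

```python
# Minimal sketch: emit the schema types that helped in Lee (2026a) -- Product
# (with a Review) and FAQPage -- as JSON-LD. All values are placeholders.
import json

product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "review": {
        "@type": "Review",
        "reviewRating": {"@type": "Rating", "ratingValue": "4.6"},
        "author": {"@type": "Person", "name": "Jane Doe"},
    },
}

faq = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "Does the widget ship internationally?",
        "acceptedAnswer": {"@type": "Answer", "text": "Yes, to 40+ countries."},
    }],
}

for block in (product, faq):
    print(f'<script type="application/ld+json">{json.dumps(block)}</script>')
```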
For a complete implementation guide, see our GEO guide.
REFERENCES
- Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). "GEO: Generative Engine Optimization." KDD 2024. DOI: 10.48550/arXiv.2311.09735
- Bagga, P. S., Farias, V. F., Korkotashvili, T., & Peng, T. Y. (2025). "E-GEO: A Testbed for Generative Engine Optimization in E-Commerce." Preprint.
- Chen, M. L., Wang, X., Chen, K., & Koudas, N. (2025). "Generative Engine Optimization: How to Dominate AI Search." Preprint.
- Lee, A. (2026a). "Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior." Preprint v5. DOI: 10.5281/zenodo.18653093
- Lee, A. (2026b). "Reddit Doesn't Get Cited (Through the API): Training Data Influence, Access-Channel Divergence, and the Shadow Corpus in AI Brand Recommendations." Preprint. DOI: 10.5281/zenodo.18679003
- Sellm (2025). "ChatGPT Citation Analysis." Industry report (400K+ pages analyzed).
- Tian, Z., Chen, Y., Tang, Y., & Liu, J. (2025). "Diagnosing and Repairing Citation Failures in Generative Engine Optimization." Preprint.
- Wen, Y., Zhang, N., Yuan, H., & Chen, X. (2025). "Position: On the Risks of Generative Engine Optimization in the Era of LLMs." Preprint.