AI SEO EXPERIMENTS

Google AI Overview Ranking Factors: What Actually Gets Cited (2026 Research)

2026-03-24

The pages Google AI Mode cites are not the pages that rank #1 in traditional search. Domain-level trust gets you into the candidate pool. Page-level structure, earned media signals, and content extractability determine whether you actually get cited.

That is the single most important finding from analyzing 19,556 queries across eight industry verticals, crawling 4,658 pages, and tracking citation behavior across four AI search platforms (Lee, 2026). If you optimize for Google AI Overviews and AI Mode the same way you optimize for traditional blue links, you will miss the majority of citation opportunities.

This post covers every major ranking factor we have identified, validates each against published research, and breaks the findings down by vertical so you can prioritize what matters for your specific situation.

🔬 THE RESEARCH BEHIND THESE FINDINGS

Before we get into factors, here is where the data comes from. Every claim in this post is backed by one or more of these sources:

  • Lee (2026): "Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior." Analyzed 19,556 queries across 8 verticals, crawled 4,658 pages, tracked citations across ChatGPT, Perplexity, Gemini, and Claude. (DOI)
  • Aggarwal et al. (2024): "GEO: Generative Engine Optimization." Introduced the GEO framework and GEO-bench benchmark. Demonstrated that targeted optimization can boost AI search visibility by up to 40%, with effectiveness varying by domain. Published at KDD 2024. (DOI)
  • Chen et al. (2025): Found that AI search systems strongly prefer earned/third-party content over brand-owned content when selecting citation sources, establishing the "earned media bias" in generative search.

We verified both DOIs above via direct resolution. The Lee (2026) dataset is the largest publicly available AI citation dataset covering Google AI Mode specifically.

🏗️ HOW TRADITIONAL GOOGLE RANKING SIGNALS FEED INTO AI MODE

Google AI Mode runs on a two-stage architecture. Understanding that architecture explains why traditional SEO still matters but is no longer sufficient.

Stage 1: Retrieval. When a user asks a question in AI Mode, Google first searches its existing index, built by the same Googlebot that has powered Google Search for over two decades. Every traditional ranking signal applies here: PageRank, E-E-A-T, domain authority, crawl depth, Core Web Vitals, and content relevance scoring. If Googlebot has not crawled and indexed your content, Google AI Mode cannot cite it.

Stage 2: Synthesis. After retrieval returns candidate documents, the Gemini model reads those pages and generates a conversational response. Gemini decides which sources to cite based on a second set of factors: content comprehensiveness, factual density, structural clarity, and direct query relevance.

The critical insight: domain-level alignment between Google's top search results and AI Mode citations is substantial at 28.7% to 49.6%. But URL-level overlap is just 7.8% (Lee, 2026). Google AI Mode trusts the same domains that Google Search trusts, but Gemini often picks different specific pages from those domains.

The Bottom Line: Traditional SEO is your foundation layer. It gets your content into the candidate pool. AI-specific optimization is the amplification layer that gets your content selected from that pool. You need both. For the full architectural breakdown, see our guide to how Google AI Mode works.
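The two stages above can be sketched as a toy pipeline. To be clear, this is an illustrative model, not Google's implementation: the page fields, scoring functions, and weights are all hypothetical, chosen only to show how a page can win retrieval on domain signals yet lose the citation to a better-structured competitor.

```python
# Toy model of the two-stage citation pipeline described above.
# All fields and weights are hypothetical illustrations.

def retrieve(pages, query_terms, pool_size=10):
    """Stage 1: rank by traditional-search-style signals (authority x relevance)."""
    def search_score(page):
        relevance = sum(term in page["text"].lower() for term in query_terms)
        return page["domain_authority"] * relevance
    return sorted(pages, key=search_score, reverse=True)[:pool_size]

def synthesize(candidates, cite_count=3):
    """Stage 2: select citations by extractability, not by search rank."""
    def citation_score(page):
        return (page["factual_density"]
                + 0.5 * page["has_schema"]
                + 0.3 * page["heading_clarity"])
    return sorted(candidates, key=citation_score, reverse=True)[:cite_count]

pages = [
    {"text": "guide to ai overviews", "domain_authority": 90,
     "factual_density": 0.2, "has_schema": 0, "heading_clarity": 0.1},
    {"text": "ai overviews citation study", "domain_authority": 60,
     "factual_density": 0.9, "has_schema": 1, "heading_clarity": 0.8},
]
cited = synthesize(retrieve(pages, ["ai", "overviews"]))
# The lower-authority but better-structured page takes the top citation slot.
```

Both pages clear Stage 1, but Stage 2 re-ranks them on structure and factual density, which is exactly the 7.8% URL-overlap story: same trusted domains, different chosen pages.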

📊 DOMAIN TRUST BIAS BY VERTICAL: WHO GETS CITED AND WHY

Google AI Mode does not distribute citations evenly. The degree of domain concentration varies dramatically by industry, and understanding your vertical's trust landscape determines your realistic optimization ceiling.

Citation Concentration Table

| Vertical | Dominant Source Type | Authority Share | Openness to New Entrants |
| --- | --- | --- | --- |
| Health/Medical | Gov/edu domains (.gov, .edu) | 38% gov/edu; 71% top 3 overall | Very low |
| Finance | Investopedia, NerdWallet, Forbes | 64% top 3 share | Low |
| Education | .edu domains, Wikipedia | 55% top 3 share | Low |
| Legal | .gov domains, FindLaw, Nolo | 49% top 3 share | Moderate |
| B2B SaaS | Brand/vendor sites | 48% brand-owned | Moderate |
| Technology | Official docs, Stack Overflow | 42% top 3 share | Higher |
| E-commerce | Amazon, brand sites, review sites | 39% top 3 share | Higher |
| Local Services | Yelp, Google Business, directories | 29% top 3 share | Highest |

Two verticals illustrate the extremes clearly.

In Health, government (.gov) and education (.edu) domains account for 38% of all AI Mode citations. Combined with established medical authority sites like WebMD, Mayo Clinic, and Cleveland Clinic, the top tier captures over 71% of health-related citations. If you are a newer health publisher, you are competing against decades of institutional trust baked into Google's ranking algorithms. That trust transfers directly into AI Mode's candidate pool.

In B2B SaaS, 48% of citations go to brand sites, meaning the vendor's own domain. The remaining 52% splits between review aggregators (G2, Capterra), analyst firms (Gartner, Forrester), and independent publishers. This is the one vertical where your own website is your single most powerful AI Mode asset. Invest in product pages, comparison content, and documentation on your own domain first.

The Bottom Line: Your optimization strategy must account for your vertical's trust architecture. In YMYL (Your Money, Your Life) verticals, breaking into the citation tier requires years of domain authority building. In technology, e-commerce, and local services, well-structured content from smaller domains can compete today. For strategies specific to AI Mode, see our AI Mode optimization guide.

🏆 THE EARNED MEDIA BIAS: WHY THIRD-PARTY MENTIONS MATTER MORE

One of the most consequential findings for content strategy: AI search systems demonstrate a strong, measurable preference for earned (third-party) content over brand-owned content when selecting citation sources.

Chen et al. (2025) documented this pattern across multiple AI platforms. When an AI system has a choice between citing a brand's own product page and citing an independent review, comparison article, or editorial mention of that brand, the AI system consistently prefers the independent source. This pattern held across verticals and query types.

The implications are significant:

| Content Type | Citation Preference | Why AI Prefers It |
| --- | --- | --- |
| Independent reviews | High | Third-party validation, perceived objectivity |
| Industry publications | High | Editorial oversight, multi-source synthesis |
| Comparison/roundup articles | High | Covers multiple options, matches discovery intent |
| Brand product pages | Moderate (B2B SaaS) to Low | Perceived bias, single-source perspective |
| Brand blog posts | Low to Moderate | Depends on factual density and originality |
| Press releases | Very low | Promotional framing, low information density |

This earned media bias means that your PR and digital outreach strategy directly affects your AI Mode visibility. Every independent mention, review, or editorial feature creates a potential citation source that AI systems trust more than your own content.

The Bottom Line: Invest in earning mentions on third-party sites with existing domain authority. An independent review of your product on a trusted publication is worth more AI Mode citation potential than ten blog posts on your own domain. The one exception is B2B SaaS, where brand-owned content holds nearly half the citation share.

🎥 YOUTUBE CITATIONS: THE 137-CITATION OPPORTUNITY

Google AI Mode cited YouTube content 137 times across our 19,556-query dataset. YouTube accounted for 53% of all video-source citations, making it the dominant video citation source by a wide margin (Lee, 2026).

This is not surprising when you consider the architecture. Google owns YouTube and indexes its content at a depth no other video platform receives: auto-generated transcripts, chapter markers, descriptions, comments, and engagement metadata all feed into Google's index. When AI Mode needs to answer a "how to" or product comparison query, YouTube provides structured, verifiable content that Gemini can extract from with high confidence.

YouTube Citation Patterns by Query Type

| Query Type | Share of YouTube Citations | Example Query |
| --- | --- | --- |
| How-to queries | 43% | "How to set up Google Analytics 4" |
| Review-seeking queries | 31% | "Best project management software review" |
| Comparison queries | 18% | "Notion vs Obsidian" |
| Informational queries | 8% | "What is zero trust security" |

What Makes a YouTube Video Citable

| Video Characteristic | Citation Rate Multiplier |
| --- | --- |
| Has chapter markers | 2.4x higher |
| Description over 500 words | 1.8x higher |
| Published within 90 days | 1.6x higher |
| Pinned comment with summary | 1.3x higher |

Chapter markers are the single highest-impact YouTube optimization for AI Mode. A 20-minute video with 8 chapter markers provides 8 discrete, addressable content blocks that Gemini can reference independently. Without chapter markers, Gemini has to treat the entire video as one undifferentiated block, which reduces its confidence in citing any specific claim from that video.
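To make "discrete, addressable content blocks" concrete, here is a small sketch that extracts chapter markers from a video description the way a parser might. YouTube builds chapters from timestamp lines in the description (beginning at 0:00); the regex and the sample description below are our illustration, not YouTube's actual implementation.

```python
import re

def extract_chapters(description: str):
    """Pull (timestamp, title) pairs from timestamp lines in a video description."""
    pattern = re.compile(r"^(\d{1,2}:\d{2}(?::\d{2})?)\s+(.+)$", re.MULTILINE)
    return pattern.findall(description)

description = """In this video we cover AI Mode optimization.
0:00 Intro
2:15 How retrieval works
7:40 Schema markup
14:05 Internal linking
"""

chapters = extract_chapters(description)
print(len(chapters))  # each chapter is a separately addressable content block
```

Each extracted pair is an anchor a model can point at, which is why a chaptered 20-minute video behaves like several short, citable documents instead of one long one.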

The Bottom Line: YouTube is not competing with your web pages for AI Mode citations. It is a parallel citation pathway that most competitors are ignoring entirely. If you produce video content, optimizing with chapter markers, detailed descriptions (500+ words), and accurate transcripts opens a second front of AI Mode visibility that requires relatively low incremental effort.

💬 THE REDDIT FACTOR: 44% WEB UI CITATION RATE

Reddit occupies a unique position in the AI citation ecosystem. It held 38.3% of Google's top search positions in our query set, making it one of the most visible domains feeding into AI Mode's candidate pool. But how Reddit gets cited depends entirely on how you access the AI platform.

| Access Method | Reddit Citation Rate |
| --- | --- |
| AI platform APIs | 0% (zero Reddit citations) |
| Web UI (browser-based) | 8.9% to 44% depending on platform |

Through API access, Reddit received zero citations across all platforms tested. Through web UI (browser-based) testing, Reddit appeared at rates up to 44% (Lee, 2026). This discrepancy likely reflects different content retrieval paths: API responses may bypass the web search layer where Reddit's organic rankings inject it into the candidate pool.

For Google AI Mode specifically, Reddit content enters the candidate pool because Reddit threads rank extremely well in Google Search. But AI Mode typically cites Reddit as supporting or anecdotal evidence, not as a primary authoritative source. When your content and a Reddit thread both address the same query, Gemini will prefer your page as the primary citation if your content is more comprehensive, better structured, and higher in factual density.

What This Means for Your Strategy

  1. Monitor Reddit threads that rank for your target keywords. They are in the AI Mode candidate pool.
  2. Create content that directly outperforms those threads in depth, structure, and data density.
  3. Do not ignore Reddit as a content distribution channel. Authentic, helpful participation in relevant subreddits can create brand awareness signals that indirectly support citation.
  4. Do not rely on Reddit as your primary citation strategy. The 0% API citation rate means programmatic AI integrations will never surface Reddit content.

The Bottom Line: Reddit is a significant competitor in the AI Mode candidate pool because of its Google Search rankings, not because AI systems inherently trust it. Create content that answers the same questions more thoroughly, and Gemini will prefer your page. For deeper analysis of how query intent drives citation patterns, see our query intent research.

📋 THE COMPLETE RANKING FACTORS CHECKLIST

Based on our research and the published literature, here are the statistically significant factors that predict whether a page gets cited in Google AI Overviews and AI Mode. These are ordered by measured effect size.

Page-Level Technical Factors

| Factor | Effect | Odds Ratio | Action |
| --- | --- | --- | --- |
| Internal link count | Strongest positive predictor | 2.75 | Build topical clusters with strong internal linking |
| Self-referencing canonical | Strong positive | 1.92 | Ensure every target page has a self-referencing canonical tag |
| Schema markup presence | Positive | 1.69 | Implement relevant schema (Article, Product, FAQ, HowTo) |
| Content-to-HTML ratio | Positive | 1.29 | Remove template bloat; maximize actual content vs. code |
| Schema count | Positive | 1.21 | Layer multiple relevant schema types per page |
| Word count | Positive | Cited median: 2,582 vs. uncited: 1,859 | Target 2,000+ words for citation-eligible pages |
| Excessive external links | Negative | 0.47 | Reduce outbound link density; keep external links purposeful |

Source: Lee (2026), logistic regression on 4,658 crawled pages.
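Two of the factors above, the content-to-HTML ratio and the self-referencing canonical, are easy to audit programmatically. The sketch below uses only the Python standard library; the sample HTML is a placeholder, and a production audit would fetch live pages rather than hardcoded strings.

```python
from html.parser import HTMLParser

class PageAudit(HTMLParser):
    """Collects visible text length and the canonical URL from raw HTML."""
    def __init__(self):
        super().__init__()
        self.text_len = 0
        self.canonical = None
        self._skip = 0  # depth inside <script>/<style>, whose text is not content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        if tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)

    def handle_data(self, data):
        if not self._skip:
            self.text_len += len(data.strip())

def audit(url, html):
    p = PageAudit()
    p.feed(html)
    ratio = p.text_len / max(len(html), 1)
    return {"content_html_ratio": round(ratio, 2),
            "self_canonical": p.canonical == url}

html_doc = ('<html><head><link rel="canonical" href="https://example.com/page">'
            '<script>var x=1;</script></head>'
            '<body><p>Key finding: structure drives citations.</p></body></html>')
result = audit("https://example.com/page", html_doc)
print(result)
```

The same crawl loop can flag pages whose canonical points elsewhere or whose markup dwarfs their visible text, which are exactly the two negative patterns the regression penalizes.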

Content Structure Factors

| Factor | Why It Matters | Implementation |
| --- | --- | --- |
| Front-loaded conclusions | 44.2% of citations come from the first 30% of content | Put key findings and answers above the fold |
| H2/H3 headers matching query patterns | Gemini uses headers to locate relevant sections | Mirror natural-language questions in your subheadings |
| Comparison tables | Structured data is easier for Gemini to extract and cite | Include tables for any comparative or multi-option content |
| Discrete, citable claims | Factual density predicts citation selection | Place statistics, findings, and conclusions in their own paragraphs |
| Listicles and numbered steps | Clear sequential structure aids extraction | Use numbered lists for processes and ranked recommendations |

Domain-Level Trust Factors

| Factor | Influence | Notes |
| --- | --- | --- |
| Google Search authority (E-E-A-T) | Gatekeeper for candidate pool entry | Domain-level alignment: 28.7% to 49.6% |
| Vertical-specific trust architecture | Determines citation ceiling | Health: 71% top 3 concentration; Local: 29% |
| Earned media presence | Strong positive signal | Third-party mentions preferred over brand-owned content |
| Topical authority depth | Positive | Sites covering a topic comprehensively outperform thin coverage |

Query Intent Alignment

Query intent is the strongest aggregate predictor of citation source type: chi-squared(28) = 5,195, p < .001, Cramér's V = 0.258 (Lee, 2026). A product comparison page will never get cited for a definitional query, regardless of its technical quality.

| Intent Type | Query Share | Best Content Format |
| --- | --- | --- |
| Informational | 61.3% | Guides, definitions, explainers |
| Discovery | 31.2% | Listicles, "best of" roundups, reviews |
| Validation | 3.2% | Brand pages, case studies, testimonials |
| Comparison | 2.3% | Head-to-head tables, feature matrices |
| Review-seeking | 2.0% | In-depth reviews, video reviews |
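The reported effect size can be sanity-checked from the statistics alone. Cramér's V is sqrt(chi2 / (N * min(r-1, c-1))), and df = 28 is consistent with a contingency table crossing the five intent types against eight source categories (the table shape is our inference), giving min(r-1, c-1) = 4:

```python
import math

chi2 = 5195  # chi-squared(28) reported by Lee (2026)
n = 19556    # queries analyzed
k = 4        # min(rows-1, cols-1) for a 5x8 table (df = 4 * 7 = 28)

cramers_v = math.sqrt(chi2 / (n * k))
print(round(cramers_v, 3))  # 0.258, matching the reported effect size
```

A V of 0.258 on a sample this large is a moderate but robust association: intent reliably shifts which source types get cited, even though it does not determine them outright.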

The Bottom Line: The checklist above is ordered by measurable impact. Start with the page-level technical factors (canonical tags, internal links, schema) because they are the easiest to implement and have the largest individual effect sizes. Then address content structure and intent alignment. Domain-level trust is the hardest to change but has the highest ceiling impact.

🔄 GEO OPTIMIZATION: WHAT THE RESEARCH SAYS WORKS

The GEO (Generative Engine Optimization) framework introduced by Aggarwal et al. (2024) was the first systematic study of how to optimize content for AI-generated search responses. Their key findings:

  1. Targeted optimization boosts visibility by up to 40% in generative engine responses. This is not a theoretical ceiling; it is a measured result from their GEO-bench benchmark.
  2. Effectiveness varies significantly by domain. Strategies that work for technology content may underperform for health or legal content. A one-size-fits-all approach will leave performance on the table.
  3. Black-box optimization methods work. You do not need to understand the internal workings of Gemini to optimize for it. Observable patterns in citation behavior are sufficient to guide strategy.

The practical implication: combine the GEO framework's domain-aware optimization with the page-level technical factors from our research. The technical factors (canonical tags, schema, internal links) get your page into the candidate pool. The GEO strategies (content structure, factual density, domain-specific formatting) increase the probability of selection from that pool.
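One hedged way to operationalize "candidate pool plus selection probability" is to turn the published odds ratios into a rough page-prioritization score. The odds ratios below come from the checklist earlier in this post (Lee, 2026), but collapsing them into a single additive log-odds score is our simplification: it ignores interactions between factors and should guide triage, not predict citations.

```python
import math

# Odds ratios from the page-level checklist (Lee, 2026). Summing
# log-odds assumes factor independence, which is a simplification.
ODDS_RATIOS = {
    "internal_links": 2.75,
    "self_canonical": 1.92,
    "has_schema": 1.69,
    "good_content_ratio": 1.29,
    "excessive_external_links": 0.47,  # OR below 1.0 lowers the score
}

def citation_priority(page_flags: dict) -> float:
    """Rough triage score: summed log-odds of the factors present on a page."""
    return sum(math.log(orr) for f, orr in ODDS_RATIOS.items()
               if page_flags.get(f))

well_optimized = citation_priority({"internal_links": True,
                                    "self_canonical": True,
                                    "has_schema": True})
link_heavy = citation_priority({"internal_links": True,
                                "excessive_external_links": True})
print(well_optimized > link_heavy)  # the structured page outranks the link-heavy one
```

Ranking a backlog of pages by this score is a cheap way to decide which ones get canonical, schema, and internal-linking fixes first.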

For detailed implementation guidance, see our AI Mode optimization guide.

❓ FREQUENTLY ASKED QUESTIONS

Are Google AI Overview ranking factors different from regular Google ranking factors? Yes and no. The retrieval stage uses the same signals as traditional Google Search: domain authority, E-E-A-T, relevance scoring, and technical health. But the synthesis stage adds a second layer of evaluation that traditional SEO does not address. Factors like content extractability (can Gemini pull discrete facts from your page structure?), factual density (how many citable claims per paragraph?), and schema presence (does structured data help the model understand your content?) only matter for AI citations. You need traditional SEO as the foundation, plus AI-specific optimization on top.

How important is schema markup for Google AI Mode citations? Schema markup showed a statistically significant positive association with citation in our data, with an odds ratio of 1.69 for schema presence and 1.21 for schema count. Product schema specifically showed an OR of 3.09 in our expanded dataset. Schema helps Gemini understand what your page is about, what type of content it contains, and how to categorize the information. It is not the strongest individual factor (internal link count at OR 2.75 is stronger), but it is one of the easiest to implement and has no downside.
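"Layering" schema, as described above, can be as simple as emitting multiple JSON-LD blocks on one page. The sketch below builds Article and FAQPage objects with Python's standard library; the headline, date, and FAQ text are placeholder values, and real pages should follow schema.org's required properties for each type.

```python
import json

def json_ld(obj: dict) -> str:
    """Render a schema.org object as a JSON-LD <script> tag."""
    payload = json.dumps({"@context": "https://schema.org", **obj}, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'

article = json_ld({
    "@type": "Article",
    "headline": "Google AI Overview Ranking Factors",  # placeholder values
    "datePublished": "2026-03-24",
})

faq = json_ld({
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "How important is schema markup for AI Mode citations?",
        "acceptedAnswer": {"@type": "Answer",
                           "text": "It showed an odds ratio of 1.69 for presence."},
    }],
})

print(article)
print(faq)  # two layered schema types for the same page
```

Emitting both tags in the page head is the "schema count" play from the checklist: each additional relevant type gives the model another machine-readable statement of what the page contains.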

Why does Google AI Mode prefer third-party content over brand-owned pages? Chen et al. (2025) documented this "earned media bias" across multiple AI platforms. The likely mechanism is that AI synthesis models are trained to identify and prefer sources that appear objective and multi-perspective. A brand's own product page is inherently single-perspective. An independent review, comparison article, or editorial feature provides the kind of multi-source validation that AI systems interpret as higher-quality evidence. The exception is B2B SaaS, where brand sites capture 48% of citations because product documentation and feature pages are primary information sources that no third party can replicate.

Can I optimize my YouTube videos for Google AI Mode citations? Yes, and you should. YouTube accounted for 53% of all video-source citations in our data (137 total citations across 19,556 queries). The highest-impact optimization is adding chapter markers (2.4x citation rate multiplier). Follow that with detailed descriptions over 500 words (1.8x), publishing cadence within 90 days (1.6x), and pinned comments with content summaries (1.3x). Chapter markers matter most because they give Gemini structured access to specific segments, turning one long video into multiple discrete, citable content blocks.

How do I know if my content is in Google AI Mode's candidate pool? If your page appears in Google Search results for a query, it is in the candidate pool for AI Mode responses on that same query. Google AI Mode uses the same Googlebot-crawled index as regular search. Check Google Search Console for indexed pages and ranking positions. Then use our AI Visibility Quick Check to evaluate whether your indexed pages have the page-level characteristics (schema, internal links, content structure) that predict selection from the candidate pool. For a full comparison of how different AI platforms build their candidate pools, see our ChatGPT vs Perplexity vs Gemini analysis.

📚 REFERENCES

  • Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). "GEO: Generative Engine Optimization." KDD 2024. DOI
  • Chen, W., et al. (2025). "Earned Media Bias in AI Search: How Generative Engines Prefer Third-Party Sources." Proceedings of SIGIR 2025.
  • Lee, A. (2026). "Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior." Preprint v5. DOI
  • Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., & Bevilacqua, M. (2024). "Lost in the Middle: How Language Models Use Long Contexts." Transactions of the ACL, 12, 157-173. DOI