

GEO vs SEO: Where 100,411 AI Citations Actually Come From (And Why Most GEO Advice Misses 17%)

Most analysis of AI citation stops at "Google rank predicts citation." That explains about a quarter of what we see. The rest splits into two very different things, and only one of them is noise.

We pulled 100,411 AI citation events from ChatGPT, Claude, Perplexity, and Google AI Mode across 2,000 user queries, then matched every cited page against Google's top-100 search results for the same queries. The full study is at The SEO Floor research page. This post is the practitioner version.

The headline most people will quote is that 75% of AI citations land outside Google's top 30. That number is correct, and on its own it is misleading. The real story is that AI citations split into three populations with three different rules. Pretending they are the same population is how teams end up either over-investing in SEO or chasing GEO levers that do not move anything.

Key numbers: GEO vs SEO

| Population | Share of citations | What drives it |
|---|---|---|
| SEO-gate citations | ~25% | Top-30 Google rank, multi-platform consensus, repeat citations |
| Repeatable deep-tier citations | ~17% | Page-level GEO features + domain-level concentration signals |
| Fuzzy-retrieval noise | ~58% | One-shot pickups, single platform, no consistent feature pattern |
| Per-tier odds gradient | 34x | Top-3 vs rank 31-100 (mixed-effects logistic regression) |
| Strongest single GEO feature | OR 1.31 | Schema markup (5-type sum, controlling for rank) |
| Repeat-cited domain meta-description coverage | 97.7% | vs 88.7% on low-repeat domains |

The gap in current AI citation analysis

The "is GEO just SEO" debate has been running for two years, and most arguments on both sides use the wrong evidence. SEO commentators point to high overlap between AI-cited URLs and top-ranking pages. GEO commentators point to feature studies showing statistics, schema, and structure correlate with citation. Both are looking at the same data and reaching different conclusions because nobody had a comparison-pool design at scale.

A comparison pool is the difference between "what do cited pages look like" and "what do cited pages look like compared to similar uncited pages from the same query." Without the second framing, every feature finding is vulnerable to a Berkson's-paradox critique: cited pages are mostly publisher pages, so any feature common in publisher pages will look like it predicts citation even if it does not.

Study A is the first cited-vs-uncited regression at this scale. We were not trying to settle the binary "is GEO real" question. We covered that in Is GEO Just Repackaged SEO? on a smaller dataset. The new question is harder: when GEO features matter, which 17% of citations are they actually driving?

How we measured GEO vs SEO citation patterns

The design has three pieces.

Step 1: collect the citations. We pulled 100,411 citation events from production AI platforms over 10 weeks. ChatGPT, Perplexity, Claude, and Google AI Mode each contributed citations across 2,000 user queries spanning 14 verticals (e-commerce, SaaS, health, legal, automotive, and so on).

Step 2: build the comparison pool. For every one of the 2,000 queries, we pulled Google's top-100 organic results. That gave us 165,661 unique URLs. Every URL becomes an observation: cited or uncited. This is the comparison group that lets us actually estimate the effect of features rather than just describing cited pages in isolation.

Step 3: assign each URL a Google rank tier.

| Tier | Google rank | Source |
|---|---|---|
| Tier 1 | 1 to 3 | SERP join |
| Tier 2 | 4 to 10 | SERP join |
| Tier 3 | 11 to 30 | SERP join |
| Tier 4 | 31 to 100 (or indexed beyond 100) | SERP join + site: lookup |
| Tier 5 | Not indexed by Google, exists on open web | site: returns zero, CommonCrawl confirms URL |

We then ran a mixed-effects logistic regression with Google rank tier, GEO composite score, vertical fixed effects, and query random intercepts as predictors of citation probability.
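To make the model structure concrete, here is a minimal Python sketch of that regression, assuming a flat comparison-pool file with columns cited, rank_tier, geo_score_z, vertical, and query_id (a hypothetical schema, not the study's actual files). The study's own tooling may differ; statsmodels' variational mixed GLM is just one way to fit this kind of model.

```python
# Sketch of the cited ~ tier + GEO + vertical model with a query random intercept.
# Column names and the data file are assumptions for illustration only.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

pool = pd.read_csv("comparison_pool.csv")  # one row per (URL, query) observation

model = BinomialBayesMixedGLM.from_formula(
    # Fixed effects: rank tier (Tier 3 as the reference), GEO composite, vertical
    "cited ~ C(rank_tier, Treatment(reference=3)) + geo_score_z + C(vertical)",
    # Random intercept per query
    vc_formulas={"query": "0 + C(query_id)"},
    data=pool,
)
result = model.fit_vb()  # variational Bayes fit
print(result.summary())
```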

[Study design: 2,000 user queries across 14 verticals; 4 AI platforms (ChatGPT, Claude, Perplexity, Google AI Mode); Google top-100 SERPs yielding a 165,661-URL comparison pool; 100,411 citation events; mixed-effects logistic regression, cited ~ tier + GEO + vertical.]

The full methodology, including the keyword-reformulation pipeline that maps conversational queries to search-intent versions and the Cohen's kappa validation of that pipeline, is documented in the research paper and pre-registered at OSF.

Path 1: the SEO gate (about 25% of AI citations)

Among (URL, query) observations in the comparison pool, here is how citation odds vary by Google rank tier:

| Tier | Google rank | Odds vs Tier 3 | 95% CI |
|---|---|---|---|
| Tier 1 | 1 to 3 | 7.82x | 7.28 to 8.39 |
| Tier 2 | 4 to 10 | 2.97x | 2.81 to 3.14 |
| Tier 3 | 11 to 30 | 1.00x (reference) | - |
| Tier 4 | 31 to 100 | 0.23x | 0.22 to 0.24 |

The Tier 1 vs Tier 4 odds ratio is roughly 34x (7.82 / 0.23 ≈ 34). A page in Google's top 3 for the keyword-intent version of an AI query is about 34 times more likely to be cited than a page ranked 31 to 100 for the same query. This is the SEO gate. It is the single largest signal in the entire dataset.

The pages on this side of the gate behave consistently. Only 33% of cited top-3 URLs are one-hit wonders; the rest are cited multiple times, often across multiple platforms, often for related queries. They function as stable answer sources, the kind of page AI returns to.

This is also where multi-platform consensus lives. When ChatGPT and Claude and Perplexity all cite the same page for the same query, that page is overwhelmingly in Google's top 30. We covered the consensus pattern in The 1 Out of 50 Rule: AI Citation Consensus. The mechanism behind it is the SEO gate. All four platforms have different retrieval architectures, but they all weight Google rank heavily, and where they agree on a question they tend to agree on Google's top answers.

So if your goal is to be cited consistently, by multiple platforms, for queries you can predict, the strategy is unsurprising: rank in the top 30 of Google's organic results for those queries. Same playbook SEO has always run.

The thing nobody talks about is the other 75%.

The 75% trap, properly unpacked

Three quarters of all AI citations land outside Google's top 30. The instinct is to read this as evidence that SEO does not matter for AI, or that AI is "going deep" and finding hidden gems. Both readings are wrong.

The 75% is real, but it splits into two very different populations, and only one of them is something a publisher can target.

[Figure: 100,411 AI citations split into three populations. 25% SEO gate (top-30, repeat-cited, stable sources); 17% repeatable deep-tier (no high rank, cited 2+ times); 58% fuzzy-retrieval noise (one-shot, single platform, no consistent feature pattern).]

Of all 42,250 unique URLs cited in the deep tier (ranked 31 to 100, or indexed but outside Google's top 100):

  • 77% were cited only once. One platform, one query, one event. Never seen again.
  • 92% were cited by only a single platform. Different chatbots picked different long-tail pages with almost no overlap.
  • About 23% were cited 2 or more times. This is the repeatable deep-tier cohort, the population that earns repeat citation despite no high SERP rank.

The 77% one-shot population is what AI does when it cannot find a confident answer in the SERP top 30 and has to fan out into the long tail. There are 43,000+ unique pages in this bucket across 16,000+ domains. There is no consistent page-level feature pattern that distinguishes them. Trying to "earn" a citation in this bucket through page-level optimization is not a meaningful strategy. You would be optimizing for a coin flip.

The 23% that gets cited repeatedly is different. These pages do not rank, but they get picked over and over. That is not noise. That is a signal.

Path 2: the repeatable 17% (where GEO levers live)

If you do the math, the repeatable cohort is about 23% of the 75% deep-tier bucket, which works out to roughly 17% of all AI citations. This is the part of the dataset where distinct GEO levers actually live, and there are two layers to it.

Page-level levers

Compared to the one-shot deep-tier population, repeat-cited deep-tier pages look measurably different on every feature we tracked.

| Feature | One-hit cited | Repeat-cited (5+) | Difference |
|---|---|---|---|
| Word count | 2,516 | 3,211 | +28% |
| Statistics density (per 1k words) | 9.63 | 12.30 | +28% |
| Primary-source score | -4.72 | +0.36 | +5.08 |
| Schema presence (5-type sum) | 1.03 | 1.19 | +0.15 |
| Has canonical tag | 92.6% | 95.0% | +2.4 pp |
| Has meta description | 90.6% | 94.7% | +4.1 pp |

The pattern is consistent: substantive length, dense factual content, original analysis instead of aggregation, schema markup, and clean technical hygiene. None of these are new GEO claims. What is new is the evidence that they specifically separate the repeatable cohort from the random cohort, even when nobody on either side ranks well in Google.

The "primary-source score" is worth a sentence. It rewards pages that produce original statistics and technical depth, and penalizes pages that mostly aggregate other sources via citations and outbound links. Repeat-cited pages score positive on this composite. One-hit pages score deeply negative. Translation: AI rewards pages that are sources, not pages that point to sources.

Domain-level levers (the multiplier)

Page-level work alone is not enough. The bigger pattern in the data is that repeat-cited pages cluster on a small number of domains.

Of 1,352 publisher domains in our corpus with 5 or more cited URLs, 200 of them (15%) had a citation-per-URL ratio of 3.0 or higher. Those 200 domains contributed 41.7% of all non-UGC citation events from only 21% of URLs. Domain-level concentration is the dominant lever in the deep-tier game.
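To make the ratio concrete, here is a minimal sketch of the citations-per-URL calculation on a flat citation log, assuming one row per citation event with domain and url columns (a hypothetical schema, not the study's files).

```python
# Sketch: domain-level citation concentration from a citation-event log.
import pandas as pd

events = pd.read_csv("citation_events.csv")  # one row per citation event (assumed)

per_domain = (
    events.groupby("domain")
          .agg(citations=("url", "size"), unique_urls=("url", "nunique"))
)
per_domain["cit_per_url"] = per_domain["citations"] / per_domain["unique_urls"]

# Restrict to domains with 5+ cited URLs, then flag the high-repeat cohort
eligible = per_domain[per_domain["unique_urls"] >= 5]
high_repeat = eligible[eligible["cit_per_url"] >= 3.0]
share = high_repeat["citations"].sum() / eligible["citations"].sum()
print(f"{len(high_repeat)} high-repeat domains, {share:.1%} of citation events")
```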

Top high-repeat domains fall into two archetypes:

  1. Niche specialist agencies and resources. canopymanagement.com (Amazon advertising agency, 21 URLs, 20.86 cit/URL). igppc.com (PPC agency, 16.90). onely.com (SEO agency, 14.39). These are not big publishers. They are tightly focused operators who own one topic vertical end to end.
  2. Editorial-quality niche publishers. helpguide.org and verywellmind.com on mental health. They cover one topic with editorial depth, not 100 topics with thin content.

The site-wide technical signal on these domains is striking:

| Feature | Low-repeat domains | High-repeat domains |
|---|---|---|
| Has meta description | 88.7% | 97.7% |
| Has canonical | 91.0% | 96.8% |
| Schema presence (avg) | 1.03 | 1.21 |
| Word count (avg) | 2,788 | 3,326 |
| Primary-source score | -7.28 | -1.42 |

97.7% meta description coverage means it is on basically every page, not just the cited ones. Low-repeat domains hover around 88%. The 9-point gap is the difference between "this site is treated as systematic and trustworthy by indexing systems" and "this site is treated as inconsistent." The same goes for canonical tags. These are domain-level signals. They cannot be fixed on one page.

The takeaway: page features matter, but the dominant lever in the repeatable 17% is whether your domain is the kind of place AI fan-out generation reaches consistently for queries in your topic. Page optimization on a non-retrieved domain produces 1x citations. The same page features on a domain AI consistently retrieves produce 10x to 20x.
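If you want to see where your own domain sits on these site-wide signals, a coverage audit over a list of your URLs is straightforward. The sketch below uses requests and BeautifulSoup with placeholder URLs; a real audit would walk your sitemap rather than a hand-typed list.

```python
# Minimal sketch of a site-wide hygiene audit: what share of pages carry a
# meta description and a canonical tag. URLs below are placeholders.
import requests
from bs4 import BeautifulSoup

def audit(urls: list[str]) -> None:
    meta = canon = 0
    for url in urls:
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        if soup.find("meta", attrs={"name": "description"}):
            meta += 1
        if soup.find("link", rel="canonical"):
            canon += 1
    n = len(urls)
    print(f"meta description coverage: {meta / n:.1%}")
    print(f"canonical coverage:        {canon / n:.1%}")

audit(["https://example.com/", "https://example.com/blog/post-1"])  # placeholders
```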

Why schema is the lever in both SEO and GEO paths

Across all seven pre-registered GEO features, schema markup is the strongest single content-level predictor of citation, controlling for Google rank.

| Feature (per 1 SD increase, alone) | Odds ratio |
|---|---|
| Schema presence | 1.31 |
| Primary-source score | 1.12 |
| Answer-first coverage | 1.09 |
| Comparison signals | 1.06 |
| List structure | 1.04 |
| Statistics density | 1.03 |
| Heading density | 0.94 |

When all seven features are in the model together alongside SEO tier and vertical, schema retains an odds ratio of 1.29. No other feature comes close. Schema markup is the lever content teams can actually move to improve citation odds inside the SEO gate, and it is also the lever site owners can move to improve citation odds in the repeatable deep-tier cohort.

The honest caveat: schema markup is correlated with site budget, server-side rendering, technical-SEO investment, and domain authority. Our observational design cannot prove that AI parses JSON-LD directly. It is possible AI is preferring well-resourced publishers, and well-resourced publishers happen to deploy schema. The interventional test, where we add schema to otherwise-equivalent pages and measure downstream citation change, is a follow-up study we have queued.

Until that runs, treat schema as the strongest predictive signal among content features, not a proven causal lever. But also: do not skip it. Even under the most skeptical reading, schema is a reliable proxy for the kind of site AI tends to cite.
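For a quick self-check, here is a sketch of a five-type schema sum for a single page. The specific type set (Article, FAQPage, Organization, Product, Review) mirrors the recommendation later in this post but is an assumption about the study's exact feature definition.

```python
# Sketch: count how many of five schema.org types appear in a page's JSON-LD.
# The five-type set is an assumption, not the study's confirmed definition.
import json
import requests
from bs4 import BeautifulSoup

SCHEMA_TYPES = {"Article", "FAQPage", "Organization", "Product", "Review"}

def schema_sum(url: str) -> int:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    found = set()
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            t = item.get("@type") if isinstance(item, dict) else None
            types = t if isinstance(t, list) else [t]
            found.update(x for x in types if x in SCHEMA_TYPES)
    return len(found)  # 0 to 5: how many of the five types appear on the page
```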

Where do AI platforms diverge in citation behavior?

The four major AI platforms behave differently in the deep tier. The clearest fingerprint is how they treat user-generated content (Reddit, YouTube, Quora, forums) when they reach beyond Google's top 30.

| Platform | UGC share of deep-tier citations |
|---|---|
| Claude | 0.6% |
| ChatGPT | 16.3% |
| Google AI Mode | 21.5% |
| Perplexity | 24.3% |

Claude essentially refuses to cite UGC in the deep tier. Perplexity leans on it heavily. ChatGPT and Google AI Mode sit in the middle. This is an architecture-level divergence, not a content-feature divergence, and it changes what publishers should be optimizing for on each platform. We covered the per-platform retrieval mechanics in How ChatGPT Researches Your Brand.

One more surprise: about 1% of all citations go to pages Google has dropped from its index entirely (we verified the URLs exist on the open web via CommonCrawl). 84% of these de-indexed citations happened in the last 30 days of our corpus. AI is actively retrieving these pages right now from somewhere outside Google's index, not just dredging them from training data. It is a small share, but it is the cleanest evidence we have that AI retrieval is not 100% Google-mediated.

Bylines: a small free win for AI citation

In an earlier internal study, we found a negative association between author bylines and AI citation. Study A's broader comparison pool reverses that finding. Controlled for rank tier, length, GEO composite, and vertical, the byline effect is OR 1.12 (95% CI 1.07 to 1.17). Per platform: ChatGPT 1.40, Claude 1.31, Perplexity 1.11, Google AI Mode 1.09.

The earlier negative result was a denominator artifact. The prior comparison group was Google's top 10, which is byline-heavy because premium publishers use bylines for E-E-A-T. Study A's broader top-100 comparison gives a more realistic baseline. Keep author bylines on substantive content. The effect is small but robust, free to implement, and largest on ChatGPT and Claude.

What to actually do for GEO vs SEO

Two paths, two playbooks, both worth running in parallel.

For the SEO gate (the top-30, multi-platform-consensus citations):

  • Rank in Google's top 30 for the queries you care about. There is no shortcut around this for the 25% of citations that come from this path.
  • Schema markup on the cited page (Article + FAQ + Organization at minimum, Product/Review where applicable).
  • Author bylines and publish/update dates on every substantive page.

For the repeatable deep-tier (the 17% that is structured rather than noise):

  • Substantive length (2,500 to 3,500 words) with high statistics density (10+ per 1,000 words).
  • Original analysis or technical depth (positive primary-source signal), not aggregation of other sources.
  • Site-wide technical hygiene at 95% or higher: canonical tags, meta descriptions, schema, on every page (not just cited pages).
  • Niche specialization. Pick one topic vertical you can credibly own. 15 to 50 substantive pages, not 500 thin ones across every topic.
  • Cross-platform indexation. Submit sitemaps to Bing (for ChatGPT), Google (for Google AI Mode), and verify Claude and Perplexity crawlers are reaching you in your server logs.

For the playbook on becoming a repeat-cited domain specifically, see How to Get Cited Consistently in AI Answers. The short version: domain-level signals build over 6 to 12 months, page-level work compounds inside that.

The 58% of citations that are fuzzy-retrieval noise is not something to optimize for. There is no consistent pattern at the page level. Targeting it directly would mean optimizing for randomness.

Read the full study

The full paper, including methodology, sensitivity tests, the per-platform breakdowns, the per-vertical heterogeneity analysis, and the pre-registration log, is at The SEO Floor: Measuring Google Rank Distribution of AI-Cited Pages. Pre-registered at OSF, code and data on Zenodo.

Two ways to put this to work on your own site. For a fast self-serve check of how your pages score against the citation predictors, run our AI Visibility Quick Check. It scans your URL and flags the page-level signals that separate the repeatable cohort from noise.

For a personal walkthrough, request a Free AI Visibility Video Audit. We pull your actual citation data and the comparison pool for your top queries, and walk you through where your domain sits across the three populations and which fixes have the highest leverage for your specific site.

The "is GEO just SEO" debate is over. SEO is the gate for a quarter of citations. A distinct set of GEO levers, mostly domain-level, drives another 17%. The rest is noise. Plan accordingly.