
The SEO Floor

Measuring Google Rank Distribution of AI-Cited Pages

Author

Anthony Lee — AI+Automation

Status

Preprint — Published April 25, 2026 | Pre-registered (OSF) | Not yet peer-reviewed

Pre-registration

OSF DOI 10.17605/OSF.IO/FMSRD (filed 2026-04-20)


Key Findings

SEO is the gate. Top-3 pages are ~34x more likely to be cited than rank 31-100 pages.

In a mixed-effects logistic regression on 114,034 (URL, query) observations with query random intercepts, a Google top-3 page is 7.82x more likely to be AI-cited than a rank 11-30 page (OR vs Tier 3, 95% CI 7.28-8.39). A rank 31-100 page is 4x less likely (OR=0.23). Tier 1 vs Tier 4 odds ratio: ~34.

Schema markup is the strongest content-feature predictor (OR=1.31).

Per-feature regression of 7 pre-registered GEO levers (each per 1 SD): schema_presence OR=1.31, primary_source_score OR=1.12, answer_first_coverage OR=1.09, comparison_signals OR=1.06, list_structure OR=1.04, stats_density OR=1.03, heading_density OR=0.94. Schema retains OR=1.29 in the multivariate model controlling for the other six features and SEO tier.

75% of citation events go to pages outside Google's top 30, but this is a denominator artifact.

The pool of pages beyond rank 30 is vastly larger than the top-30 pool, so even at 1/10 the per-page citation probability, deep-tier pages contribute the bulk of absolute citation events. Reporting the 75% figure without the per-page odds context is interpretively misleading. Both facts are true: SEO is the dominant per-page lever AND most citation events land deep-tier.
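The denominator arithmetic can be made concrete with toy numbers (illustrative only; these pool sizes and probabilities are not the study's):

```python
# Toy illustration of the denominator artifact (hypothetical numbers):
# a much larger deep-tier pool at far lower per-page odds can still
# supply most absolute citation events.
top30_pages = 60_000        # hypothetical top-30 pool size
deep_pages = 2_400_000      # hypothetical deep-tier pool, 40x larger
p_top30 = 0.010             # per-page citation probability, top 30
p_deep = p_top30 / 10       # 10x lower per page in the deep tier

events_top30 = top30_pages * p_top30   # ~600 expected events
events_deep = deep_pages * p_deep      # ~2,400 expected events

deep_share = events_deep / (events_top30 + events_deep)
print(f"deep-tier share of citation events: {deep_share:.0%}")  # 80%
```

Both statements hold at once: per-page odds strongly favor the top 30, yet most absolute events land deep-tier.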

1% of citations go to pages Google has de-indexed entirely (Tier 5).

Of 100,411 events, 942 (~1%) target URLs for which site:URL returns zero results but which CommonCrawl confirms exist on the open web. 83.7% (95% CI 80.9-86.1%) of those Tier 5 events occurred within the last 30 days of the corpus. AI retrieval is not 100% Google-mediated.

Platforms diverge sharply on UGC tolerance in deep tier.

Share of UGC/social citations within each platform's Tier 4+ deep-tier citations: Claude 0.6%, ChatGPT 16.3%, Google AI Mode 21.5%, Perplexity 24.3%. Claude effectively excludes UGC; Perplexity leans on it heavily. Architecture-level difference, not just a quirk in the aggregate.

Author bylines have a small positive effect (OR=1.12).

Controlled regression on the non-UGC subset: has_author_attribution OR=1.122 (95% CI 1.074-1.173, p<0.0001). Per-platform: ChatGPT 1.40, Claude 1.31, Perplexity 1.11, Google AI Mode 1.09. A prior internal finding (OR=0.632, a negative effect) was a comparison-pool artifact; full reconciliation in Appendix A of the paper.

Abstract

Whether Generative Engine Optimization (GEO) is a discipline distinct from Search Engine Optimization (SEO), or merely SEO repackaged, has been debated since the rise of consumer-facing AI chat platforms. Empirical resolution requires measuring two things: where on Google's search results AI-cited pages actually rank, and whether content features pre-registered as "GEO levers" predict citation independently of Google rank.

We collected 100,411 AI citation events from four production AI platforms (ChatGPT, Perplexity, Claude, Google AI Mode) across 2,000 user queries, and assembled a comparison pool of 165,661 unique URLs from Google top-100 SERPs for the same queries. Combining the citation corpus with the comparison pool yields 114,729 (URL, query) observations for a mixed-effects logistic regression of citation probability on Google rank tier and GEO composite score, with query random intercepts.

Headline finding 1: While 75.4% of citation events go to pages outside Google's top 30, per-page citation odds span a 34x range across rank tiers: a top-3 page is 7.82x more likely to be cited (OR vs Tier 3, 95% CI 7.28-8.39) while a rank 31-100 page is 4x less likely (OR=0.23, 95% CI 0.22-0.24). The aggregate 75% deep-tier figure is a denominator artifact: the internet beyond rank 30 has vastly more pages than the top 30, so even at low per-page citation probability the absolute count of deep-tier citations dominates. The two figures must be reported together.

Headline finding 2: A pre-registered GEO composite of seven content features adds small but real predictive power above Google rank (Z-sum OR=1.06; PCA-1 OR=1.15 per 1 SD), driven primarily by schema markup (OR=1.31). Statistics density and list structure show small positive effects after methodological correction; heading density shows a small negative effect.

Headline finding 3: The 75% Tier-4+ aggregate is overwhelmingly composed of URLs Google ranks beyond #100 (90% of "Tier 4" events) — not the 31-100 band that the H2 regression speaks to — and these deep-tier citations are 77% one-hit-wonders cited by a single AI platform, with sharp platform divergence on user-generated-content tolerance (Claude 0.6% UGC in deep tier; Perplexity 24%). The Lily-Ray-aligned framing ("AI citation is gated by Google ranking") is empirically supported, with schema markup emerging as the strongest single content-feature predictor inside the gate.

Method

Citation corpus

100,411 AI citation events from VPS-captured production scrape sessions and UI-scraped citation data across four target platforms. Temporal scope: 2026-02-04 through 2026-04-17. Study 3 (OpenAI Responses API fan-out) and Gemini-API grounding-redirect data were excluded a priori per the pre-registration log as API-only artifacts. 94,384 events have usable Google-rank tier assignments after the matching pipeline.

Query reformulation (Method A)

User queries are conversational and situation-first (median 18 words, often containing first-person pronouns and personal context). Searching Google with such queries would measure Google's weak handling of conversational phrasing rather than how cited pages rank for the underlying intent. We therefore reformulate every user query into a keyword-intent search version with Gemini 3.1 Flash Lite, run deterministically at temperature=0 for reproducibility. The original user query is retained on every record.

Comparison pool

For each of the 2,000 keyword-reformulated queries we pulled Google's top-100 SERPs via Thordata's SERP API, producing 213,814 SERP rows. Every URL in the Google top-100 becomes an observation in the comparison pool regardless of whether it was cited. URLs cited by AI but not present in our top-100 SERP pull are appended to the pool with their tier assigned by site:URL Thordata lookups (~1,000 lookups) and CommonCrawl verification.
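A minimal sketch of that join, using hypothetical records (the production pipeline additionally performs the site:URL and CommonCrawl tier assignment described above):

```python
# Sketch of the comparison-pool construction (toy records, not the
# production pipeline). Every top-100 SERP URL becomes an observation;
# cited URLs absent from the SERP pull are appended with unknown rank.
serp_rows = [  # (qid, url, rank) rows from the SERP API
    ("q1", "https://a.example", 2),
    ("q1", "https://b.example", 45),
]
citations = [("q1", "https://a.example"), ("q1", "https://c.example")]

pool = {(q, u): {"rank": r, "cited": False} for q, u, r in serp_rows}
for q, u in citations:
    if (q, u) in pool:
        pool[(q, u)]["cited"] = True
    else:
        # Cited but outside our top-100 pull: appended with rank unknown;
        # tier is later resolved via site:URL / CommonCrawl lookups.
        pool[(q, u)] = {"rank": None, "cited": True}
```

Keying on the (qid, url) pair is what makes the uncited SERP URLs usable as the regression's denominator.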

SEO tier classification

Tier    Definition
Tier 1  Google rank 1-3 for the keyword-reformulated query
Tier 2  Google rank 4-10
Tier 3  Google rank 11-30 (regression reference)
Tier 4  Google rank 31-100 (in our SERP pull), or rank-unknown but indexed
Tier 5  Not indexed by Google (site:URL returns zero results), but CommonCrawl confirms the URL exists on the open web
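The tier scheme reduces to a small mapping; a sketch (the indexed=False path corresponds to the site:URL zero-result check above):

```python
def seo_tier(rank, indexed=True):
    """Map a Google rank (or None) to the study's tier scheme.

    rank=None with indexed=True  -> Tier 4 (rank-unknown but indexed)
    indexed=False                -> Tier 5 (de-indexed, open-web only)
    """
    if not indexed:
        return 5
    if rank is None:
        return 4
    if rank <= 3:
        return 1
    if rank <= 10:
        return 2
    if rank <= 30:
        return 3
    return 4
```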

GEO feature extraction

Seven pre-registered content features extracted via Playwright crawl using extractors imported verbatim from the prior research codebase: schema_presence (0-5 sum of has-Article / has-FAQ / has-Product / has-Review / has-Organization JSON-LD), primary_source_score (Princeton-derived composite of stats + technical content minus aggregator signals), answer_first_coverage, comparison_signals, list_structure, stats_density, and heading_density (H2+H3 per 1,000 words). An initial run reimplemented several extractors via ad-hoc regex; the corrected analysis (using the imported Princeton extractors per the protocol) reconciled coefficient signs with prior internal Experiment M findings. The full flawed-vs-corrected reconciliation is documented in Addendum 9 of the pre-registration log.
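Purely to illustrate the 0-5 scoring, a toy version of the schema feature (not the canonical extractor; as noted above, ad-hoc regex reimplementations of these extractors produced the flawed first run, so treat this strictly as a sketch; FAQPage is assumed here as the JSON-LD type behind the has-FAQ flag):

```python
import json
import re

# Assumed type names; the canonical extractor lives in the prior codebase.
SCHEMA_TYPES = ("Article", "FAQPage", "Product", "Review", "Organization")

def schema_presence(html: str) -> int:
    """Toy 0-5 score: how many of the five JSON-LD types appear at all."""
    blocks = re.findall(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
        html, flags=re.S | re.I)
    found = set()
    for raw in blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD contributes nothing
        items = data if isinstance(data, list) else [data]
        for item in items:
            t = item.get("@type", "")
            types = t if isinstance(t, list) else [t]
            for want in SCHEMA_TYPES:
                if want in types:
                    found.add(want)
    return len(found)
```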

H2 specification

Mixed-effects logistic regression on n=114,034 observations after Tier-5 exclusion (de-indexed URLs have no bounded denominator of uncited pages):

cited ~ seo_tier + geo_zsum + vertical + (1 | qid)

Estimated with R 4.5.3, lme4, broom.mixed, optimx. Vertical fixed effects: 13 dummies for the 14 verticals with N≥150 observations. The PCA-1 specification is reported as a sensitivity analysis that re-weights the composite axis toward the components carrying the predictive signal.

Results: H2 Primary Specification

Term                         OR (95% CI)        p
Tier 1 (1-3) vs Tier 3       7.82 (7.28-8.39)   <0.0001
Tier 2 (4-10) vs Tier 3      2.97 (2.81-3.14)   <0.0001
Tier 4 (31-100) vs Tier 3    0.23 (0.22-0.24)   <0.0001
GEO Z-sum (per 1 SD)         1.06 (1.05-1.07)   <0.0001

Tier 1 vs Tier 4 odds ratio = ~34. The GEO Z-sum effect is statistically robust (p<0.0001, 95% CI excludes 1.0) but small in absolute magnitude (6% odds increase per 1 SD). The PCA-1 specification produces a larger composite effect (OR=1.15 per 1 SD) because its first-component axis re-weights the composite toward the components carrying the per-feature signal.
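The ~34x figure is plain arithmetic on the two reported contrasts, since both share Tier 3 as the reference level:

```python
import math

# Both contrasts are expressed against Tier 3, so the reference cancels
# when forming the Tier 1 vs Tier 4 odds ratio.
or_t1_vs_t3 = 7.82
or_t4_vs_t3 = 0.23

or_t1_vs_t4 = or_t1_vs_t3 / or_t4_vs_t3                        # ~34.0
log_odds_gap = math.log(or_t1_vs_t3) - math.log(or_t4_vs_t3)   # ~3.53

print(round(or_t1_vs_t4, 1), round(log_odds_gap, 2))
```

The log-odds gap is the "+3.5" quantity the Discussion compares against the GEO Z-sum coefficient.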

Per-feature regressions

Each feature estimated alone in a model with SEO tier and vertical, to identify which composite components carry the predictive signal:

Feature (per 1 SD, alone)    OR (95% CI)        p
schema_presence              1.31 (1.28-1.33)   <0.0001
primary_source_score         1.12 (1.08-1.16)   <0.0001
answer_first_coverage        1.09 (1.07-1.11)   <0.0001
comparison_signals           1.06 (1.04-1.08)   <0.0001
list_structure               1.04 (1.02-1.06)   0.0003
stats_density                1.03 (1.01-1.05)   0.010
heading_density              0.94 (0.90-0.98)   0.002

Six of seven features show positive associations with citation; schema markup is the strongest by a wide margin. Heading density shows a small negative association: we read high heading density per word as a likely flag for thin, over-chunked, SEO-spam content, since cited pages are also longer (Section 5.2.4 of the paper). When all seven features are included simultaneously alongside SEO tier and vertical, schema_presence retains OR=1.29 and stats_density's marginal effect attenuates to non-significance.

Per-platform consistency

All four platforms show the same dominant SEO-tier pattern. Tier 1 OR ranges from 4.94 (Claude, reduced-n) to 8.61 (Perplexity, steepest gradient). GEO Z-sum effects vary from 1.04 (Google AI Mode) to 1.11 (Claude). No platform exhibits the pre-registered "platform-specific floors" pattern.

Discussion

The "GEO is just SEO" debate, settled

The Lily-Ray-aligned position ("indexation is a gate, ranking is the differentiator") is empirically supported. Rank dominates. The "GEO is its own discipline" position requires more nuance. There is one content feature (schema markup) with an independent effect comparable to a within-tier rank improvement, plus four to five other content features with small but consistent independent effects (3-12% odds increases per 1 SD). And there is no evidence for the strong-form GEO claim that content optimization can substantially compensate for poor SEO rank: the GEO Z-sum coefficient (+0.06 in log-odds per 1 SD) is nearly two orders of magnitude smaller than the Tier 1 vs Tier 4 SEO effect (+3.5 in log-odds).

The honest synthesis is therefore: SEO is the gate. Schema markup is the strongest content-feature predictor inside the gate. Other GEO features add marginal value. Most of the rest is noise.

The schema causal-interpretation problem

Schema markup's OR=1.31 must be read as a predictive association, not a causal effect. Schema presence in a publisher's HTML is correlated with site budget, server-side rendering quality, fast time-to-interactive, high backlink-derived domain authority, and a dedicated technical-SEO function — every one of which independently predicts AI citation in ways our observational design cannot disentangle. From the H2 regression alone we cannot determine whether (a) AI retrieval pipelines consume JSON-LD as structured data and weight pages with rich schema higher, or (b) AI is merely preferring well-resourced publishers, who happen to be the population that systematically deploys schema markup.

Disambiguating these hypotheses requires an interventional design — adding schema to otherwise-equivalent pages and measuring downstream citation change — which is the target of a planned follow-up Study B (greenfield factorial on uncrawled domains). Until that experiment runs, the responsible reading of our schema finding is "schema markup is the strongest predictive signal among the seven pre-registered GEO features," not "schema markup causes citation."

Why the deep-tier sprawl is not a strategy

The 75.4% Tier-4+ aggregate is not evidence that AI is "going deep" into a curated set of authoritative low-rank sources. The deep-tier citations are dominated by: URLs Google ranks beyond #100 (90% of "Tier 4" events); one-hit-wonders cited by a single platform for a single query (77% of deep-tier cited URLs); a long tail of 43,000+ unique URLs across 16,000+ domains; and heavy UGC reliance with sharp platform divergence (Claude 0.6% UGC vs Perplexity 24% UGC in deep tier). Deep-tier citation is not a meaningfully gameable target. The actionable target for publishers seeking AI citation at scale is the top-30 SERP, where per-page citation probabilities are an order of magnitude higher and citations tend to repeat across platforms.

Limitations

  1. Observational, not causal. Although the comparison-pool design eliminates collider bias on the citation outcome, unobserved publisher-side confounders (domain authority, backlink profile, site age, editorial investment) remain.
  2. Method A model dependence. The keyword-reformulation pipeline depends on Gemini 3.1 Flash Lite's mapping from conversational to search intent. Cohen's κ between Method A and Method B varies by platform (0.80 Google AI Mode, 0.42 ChatGPT).
  3. Temporal scope. The corpus spans ~10 weeks (February through mid-April 2026). Findings may not generalize to longer-horizon AI behavior; H3's classical persistence metric is unestimable on this window.
  4. Claude reduced-n. Claude contributes ~5,000 analytic citation events vs ~25-30k for the other three platforms, due to a known web_search cache constraint. Claude-specific findings are reported with expanded CIs.
  5. Google-centric SEO proxy. SEO tier is measured against Google rank. Bing parallel data is auxiliary; ChatGPT's Bing-derived retrieval pipeline is not separately validated.
  6. Pre-registration deviations. Nine addenda are documented at osf.io/w76y8, all disclosed before publication.

What this means if you optimize websites

  • Earn the rank first. No page-level GEO investment substitutes for Google rank. Ranking 1-3 produces ~34x the citation odds of ranking 31-100 for the same query. The page-level levers operate inside the gate, not as a workaround.
  • Deploy schema markup, multiple types per page. The strongest content-feature predictor by a wide margin. Article + FAQ + Product/Review + Organization where each applies. Site-wide, not just on the cited page.
  • Keep author bylines on substantive content. Small but robust positive effect (OR=1.12 overall, 1.40 on ChatGPT). Use Schema.org Person markup with sameAs (e.g., ORCID) where you have it.
  • Don't stuff headings into thin content. Heading density per 1k words is negatively associated with citation. Long substantive pages with proportional structure beat short pages with packed subheadings.
  • Don't try to "earn" deep-tier citations. 77% of deep-tier cited URLs are one-hit-wonders. There is no consistent feature pattern to retro-engineer. The actionable target is the top-30 SERP where citations recur and accumulate trust.
  • Track per-platform citation, not just aggregate. Claude excludes UGC (0.6% deep-tier). Perplexity leans heavily on it (24%). Aggregate "AI visibility" metrics conceal architecture-level platform divergence.
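The schema and byline advice combine naturally in JSON-LD; an illustrative fragment (the headline, name, and ORCID URL are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example article headline",
  "author": {
    "@type": "Person",
    "name": "Jane Author",
    "sameAs": ["https://orcid.org/0000-0000-0000-0000"]
  }
}
```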

Reproducibility

  • Pre-registration: OSF DOI 10.17605/OSF.IO/FMSRD | Project page: osf.io/w76y8
  • Pre-registration log: nine addenda public on the OSF project wiki.
  • Code: tagged at git commit study-a-v1.0.2. All study_a_* Python and R scripts in the reproducibility package.
  • Data package: Zenodo DOI 10.5281/zenodo.19787328. Includes the locked 2,000-query analysis set, citation corpus (analytic version), comparison pool, Playwright crawl features, H2 analytic dataset, regression coefficient tables (corrected), and the flawed-run archive for audit.
  • Replication: any reader can re-run the H2 regression by checking out the tagged commit, downloading the Zenodo data package, installing R 4.5.3 + lme4 + jsonlite + dplyr + broom.mixed + optimx, and executing scripts/run_h2_all.ps1. Total wall time ~7 hours on commodity hardware.

Cite this paper

Lee, A. (2026). The SEO Floor: Measuring Google Rank Distribution of AI-Cited Pages. Zenodo. https://doi.org/10.5281/zenodo.19787654