Query Intent and Google Rank as Joint Predictors of AI Citation: A Multi-Platform Observational Study
Anthony Lee — AI+Automation
Preprint v6 — April 2026 (revised: rank-as-predictor analyses added) | Not yet peer-reviewed
Key Findings
Google Rank Strongly Predicts Page-Level AI Citation
A logistic regression with log(Google position) alone achieves cross-validated AUC = 0.802. Citation rates drop monotonically with rank: 54% at position 1, ~2% at position 100, with a 13–22× ratio between Top-3 and positions 31–100 across every platform (replicated on 94,599 citation events in the SEO Floor study). The literal-query Top-3 URL match remains low (7.8%–29.7%), but rank is highly predictive once analyzed at the per-page level rather than as exact-URL overlap.
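As a rough illustration of what an AUC for a rank-only predictor measures, the sketch below scores each page by -log(position) and computes AUC as the share of (cited, non-cited) pairs in which the cited page scores higher. The positions and citation labels are hypothetical toy data, not values from the study, and the function names are illustrative. A fitted single-predictor logistic model with a negative coefficient ranks pages the same way as this raw score, so the pairwise AUC is a reasonable stand-in for intuition; it does not reproduce the cross-validation procedure.

```python
import math

def auc(scores, labels):
    """Pairwise AUC: P(score of a cited page > score of a non-cited page), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: Google positions and whether each page was cited.
positions = [1, 2, 3, 5, 8, 12, 20, 35, 60, 100]
cited = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

# Higher score = better rank; log compresses the long tail of positions.
scores = [-math.log(p) for p in positions]
rank_auc = auc(scores, cited)
print(f"AUC of rank-only score on toy data: {rank_auc:.3f}")
```

Because AUC depends only on ordering, using -position instead of -log(position) would give the same value here; the log transform matters for the fitted probabilities and for McFadden R², not for the ranking.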
Query Intent Shapes Source-Type Distributions
We classified 19,556 queries across 8 verticals by intent. Intent distributions differ significantly by industry (χ² = 5,195, p < .001, Cramér's V = 0.258). Informational queries surface Wikipedia, .gov, and tutorials; comparison queries surface publisher reviews; review-seeking queries surface YouTube and Reddit. Match content type to query intent first.
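For readers who want to check how the effect size relates to the test statistic, the sketch below applies the standard formula V = sqrt(χ² / (n · min(r − 1, c − 1))). The number of intent categories is not stated in this section; five is assumed here only because, together with the 8 verticals, it reproduces the reported value.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V for an n_rows x n_cols contingency table."""
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi2 / (n * k))

# chi2 and n are the reported values; 8 verticals is reported,
# 5 intent categories is an assumption made for this sketch.
v = cramers_v(chi2=5195, n=19556, n_rows=8, n_cols=5)
print(round(v, 3))
```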
Domain-Level Alignment Is Stronger Than URL-Level Overlap
URL-level overlap with Google Top-3 is weak (7.8%–29.7%), but domain-level alignment is substantial (28.7%–49.6% across the four platforms). AI platforms draw from Google's top-ranked domains while frequently selecting different specific pages within those domains. All four platforms align 4–7× more strongly with Google than with Bing organic results.
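The difference between the two overlap measures can be sketched as follows. The URLs are hypothetical, the function names are illustrative, and the host normalization (stripping a leading `www.`) is a simplifying assumption; the study's own matching rules are not described in this section.

```python
from urllib.parse import urlsplit

def domain(url):
    """Hostname with a leading 'www.' removed (a simplification, not a registrable-domain parser)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def overlap_rates(cited_urls, top3_urls):
    """Return (URL-level, domain-level) share of cited URLs matching the Google Top-3 set."""
    top_urls = set(top3_urls)
    top_domains = {domain(u) for u in top3_urls}
    n = len(cited_urls)
    url_rate = sum(u in top_urls for u in cited_urls) / n
    domain_rate = sum(domain(u) in top_domains for u in cited_urls) / n
    return url_rate, domain_rate

# Hypothetical example: one exact URL match, two further same-domain matches.
top3 = ["https://www.example.com/guide", "https://docs.example.org/a", "https://example.net/x"]
cited = [
    "https://www.example.com/guide",
    "https://example.com/pricing",
    "https://other.example/page",
    "https://docs.example.org/b",
]
rates = overlap_rates(cited, top3)
print(rates)
```

In this toy case only one cited URL matches exactly, but three of the four come from Top-3 domains, which is the pattern the finding describes.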
AI Platforms Have a 2-vs-2 Architectural Split
ChatGPT and Claude perform live page fetches during conversations. Perplexity and Gemini rely exclusively on pre-built search indices. ChatGPT and Claude also diverge on robots.txt compliance behavior. Each architecture requires a different optimization strategy.
The Reddit Paradox: API vs Web UI Divergence
Reddit occupied 38.3% of Google Top-3 positions in our API sample but received zero AI citations via API (binomial p = 3.43 × 10⁻²³ for Perplexity). Web UI responses from the same platforms cited Reddit at 8.9%–15.6%. Reddit's influence on AI flows differently through API and consumer-facing channels.
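One simple way to read a binomial p-value of this kind: if each AI citation independently landed on Reddit at Reddit's Google Top-3 share, the probability of seeing zero Reddit citations in n citations would be (1 − share)^n. The sketch below assumes that reading; the exact test setup and the citation count are not given in this section, so the n used here is hypothetical and the output is not expected to match the reported p-value.

```python
def prob_zero_citations(share, n_citations):
    """P(zero hits in n independent draws) when each draw hits with probability `share`."""
    return (1.0 - share) ** n_citations

# share is the reported Reddit Top-3 share; n_citations = 100 is a hypothetical count.
p_zero = prob_zero_citations(share=0.383, n_citations=100)
print(f"{p_zero:.2e}")
```

Even with a modest number of citations, the probability of zero Reddit hits under this null model becomes vanishingly small, which is why the observed absence is statistically striking.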
Abstract
The rapid integration of AI chatbots into consumer search behavior has spawned a cottage industry of Generative Engine Optimization (GEO) advice, much of it built on untested assumptions about how AI platforms select sources for citation. Industry practitioners widely assert that Google ranking determines AI visibility, that community-consensus platforms like Reddit confer citation advantages, and that AI recommendations are too inconsistent to warrant optimization efforts. We tested these claims empirically across four major AI platforms (ChatGPT, Claude, Perplexity, and Gemini) using a multi-study design that combined large-scale query intent classification (n = 19,556 queries across 8 verticals), Google rank cross-referencing (120 queries via API, plus 100 queries via web UI against both Google and Bing Top-3 results), server-side fetch verification via Vercel middleware logging, and page-level technical analysis of 479 cited and non-cited pages. Our results refine all three prevailing claims. Query intent shaped citation source-type distributions across verticals, but at the individual-page level Google rank dominated all other predictors: a logistic regression with log(Google position) alone achieved cross-validated AUC = 0.802 and McFadden R² = 0.203, far exceeding intent-only (AUC = 0.462) and page-feature-only (AUC = 0.594) baselines. A larger replication on 94,599 citation events (Lee, 2026c; the SEO Floor study) found a steep monotonic position gradient: URLs at Google position 1 were cited 54% of the time, dropping to ~2% at position 100, with every platform showing a 13–22× citation-rate ratio between Top-3 and positions 31–100. Reddit received zero AI citations via API despite occupying 38.3% of Google Top-3 positions, though web UI responses cited Reddit at 8.9%–15.6%. We further document an architectural divide between live-fetching platforms (ChatGPT, Claude) and index-only platforms (Perplexity, Gemini).
These findings suggest that effective GEO strategy requires intent-aware, platform-specific optimization grounded in the real per-page rank gradient, rather than the one-size-fits-all approach currently advocated by industry practitioners.
Keywords
Generative Engine Optimization, GEO, AI citation behavior, AI search, ChatGPT, Claude, Perplexity, Gemini, query intent, Google rank, position curve, brand recommendations, live page fetch, robots.txt compliance
Citation
Lee, A. (2026). Query intent and Google rank as joint predictors of AI citation: A multi-platform observational study. Preprint v6, AI+Automation.