Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior

Lee, Anthony

doi:10.5281/zenodo.18653093

Author

Anthony Lee — AI+Automation

Status

Preprint v3 — March 2026 | Not yet peer-reviewed

Abstract

The rapid integration of AI chatbots into consumer search behavior has spawned a cottage industry of Generative Engine Optimization (GEO) advice, much of it built on untested assumptions about how AI platforms select sources for citation. Industry practitioners widely assert that Google ranking determines AI visibility, that community-consensus platforms like Reddit confer citation advantages, and that AI recommendations are too inconsistent to warrant optimization efforts. We tested these claims empirically across four major AI platforms—ChatGPT, Claude, Perplexity, and Gemini—using a multi-study design that combined large-scale query intent classification (n = 19,556 queries across 8 verticals), Google rank cross-referencing (120 queries via API, plus 100 queries via web UI against both Google and Bing Top-3 results), server-side fetch verification via Vercel middleware logging, and page-level technical analysis of 479 cited and non-cited pages. Our results challenge all three prevailing claims. First, query intent—not Google rank or domain authority—emerged as the strongest predictor of citation source type at the aggregate level, with intent distributions varying significantly by vertical (χ²(28) = 5,195, p < .001, Cramér’s V = 0.258), though formal predictive modeling showed that at the individual page level, technical page features (AUC = 0.594) outperformed intent (AUC = 0.462) for predicting citation status. Second, Google’s Top-3 organic results predicted AI citations poorly at the URL level: ChatGPT matched only 7.8% of URLs via API, and a web UI replication (Study 2b) confirmed this pattern (6.8%). However, domain-level analysis revealed substantially higher alignment (28.7–49.6% across four platforms), indicating that AI platforms draw from Google’s top-ranked domains while selecting different specific pages. 
A dual search engine comparison showed that all platforms aligned 4–7× more strongly with Google than with Bing organic results, even platforms reportedly using Bing as their search backend. Reddit—despite occupying 38.3% of Google Top-3 positions in our API sample—received exactly zero AI citations via API (binomial p = 3.43 × 10⁻²³ for Perplexity), though web UI responses from the same platforms cited Reddit at rates of 8.9–15.6% of total citations. Third, AI brand recommendations showed substantial within-platform consistency (ChatGPT mean Jaccard = 0.619, 95% CI [0.537, 0.701]), though cross-platform agreement was near-random (all-four-platform Jaccard = 0.036). We further discovered a previously unreported architectural divide: ChatGPT and Claude perform live page fetches during conversations, while Perplexity and Gemini rely exclusively on pre-built search indices—with divergent robots.txt compliance behavior between the fetching platforms. These findings suggest that effective GEO strategy requires intent-aware, platform-specific optimization rather than the one-size-fits-all approach currently advocated by industry practitioners.
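As a rough illustration of two statistics reported above, the sketch below recomputes Cramér's V from the reported χ² value and sample size, and shows how a Jaccard index between two recommendation lists can be calculated. The function names `cramers_v` and `jaccard` and the brand sets are hypothetical. The figure of five intent categories is an inference from the reported degrees of freedom (28 = (8 − 1) × (c − 1) with 8 verticals), not something the abstract states. This is not the study's analysis code.

```python
import math


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V effect size for an r x c contingency table."""
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi2 / (n * k))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (taken as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Reported values: chi-square = 5,195 with df = 28, n = 19,556 queries, 8 verticals.
# ASSUMPTION: df = 28 with 8 verticals implies 5 intent categories.
v = cramers_v(chi2=5195, n=19556, n_rows=8, n_cols=5)
print(f"Cramér's V = {v:.3f}")  # Cramér's V = 0.258

# Hypothetical brand lists returned by two runs of the same query.
run_1 = {"BrandA", "BrandB", "BrandC", "BrandD"}
run_2 = {"BrandB", "BrandC", "BrandD", "BrandE"}
print(f"Jaccard = {jaccard(run_1, run_2):.2f}")  # Jaccard = 0.60
```

Under the five-category assumption the recomputed effect size matches the reported V = 0.258. In the Jaccard example, three shared brands out of five distinct brands give 0.60, close to the within-platform mean reported for ChatGPT.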

Keywords

Generative Engine Optimization, GEO, AI citation behavior, AI search, ChatGPT, Claude, Perplexity, Gemini, query intent, brand recommendations, live page fetch, robots.txt compliance

Citation

Lee, A. (2026). Query intent, not Google rank: What best predicts AI citation behavior. Preprint v3, AI+Automation.

DOI: https://doi.org/10.5281/zenodo.18653093

ORCID: 0009-0002-4815-6373

aiXiv: aixiv.260215.000002