I Rank on Page 1 -- What Gets Me Cited by AI? Position-Controlled Analysis of Page-Level and Domain-Level Predictors of AI Search Citation

Lee, Anthony

doi:10.5281/zenodo.19398158

Author

Anthony Lee — AI+Automation

Status

Preprint v1 — April 2026 | Not yet peer-reviewed

Key Findings

Content Matches Domain Within the SERP

When equally ranked pages are compared, content features (AUC = 0.673) predict citation about as well as domain identity (AUC = 0.687). This contrasts sharply with uncontrolled comparisons, where domain identity dominates (AUC = 0.975). Among pages that already rank, what a page says matters about as much as which domain hosts it.
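
AUC can be read as the probability that a randomly chosen cited page receives a higher model score than a randomly chosen uncited page, so 0.5 is chance and 1.0 is perfect separation. The sketch below computes it directly from that pairwise definition (equivalent to the Mann-Whitney U statistic divided by the number of pairs). The function name and the toy scores are illustrative assumptions, not the study's code or data.

```python
from typing import Sequence


def pairwise_auc(cited_scores: Sequence[float], uncited_scores: Sequence[float]) -> float:
    """Share of (cited, uncited) pairs in which the cited page scores higher.

    Ties count as half a win, which matches the Mann-Whitney form of AUC.
    """
    if not cited_scores or not uncited_scores:
        raise ValueError("need at least one cited and one uncited page")
    wins = 0.0
    for c in cited_scores:
        for u in uncited_scores:
            if c > u:
                wins += 1.0
            elif c == u:
                wins += 0.5
    return wins / (len(cited_scores) * len(uncited_scores))


# Hypothetical model scores for three cited and three uncited pages.
print(pairwise_auc([0.9, 0.7, 0.4], [0.6, 0.3, 0.2]))
```

Here 8 of the 9 pairs are ordered correctly, so the AUC is 8/9, about 0.89. The quadratic loop is fine for illustration; for thousands of pages a rank-based implementation is faster.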

Comparison Structure Is the Top Actionable Signal

Pages with "vs" structures, comparison tables, and side-by-side analyses are significantly more likely to be cited (d = 0.43; p < 0.001 in all four position bands). The effect holds across all five intent types, not only comparison queries.
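
The effect size d is presumably Cohen's d, the difference in group means divided by a pooled standard deviation; this page does not say which pooling variant the study used. A minimal sketch assuming the common pooled-sample-SD form, applied to hypothetical comparison-signal counts for cited and uncited pages:

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical per-page counts of comparison signals.
cited = [3, 4, 5, 4, 6]
uncited = [2, 3, 4, 3, 3]
print(round(cohens_d(cited, uncited), 2))
```

By the usual rule of thumb (0.2 small, 0.5 medium, 0.8 large), the reported d = 0.43 is a small-to-medium effect; the toy data above are exaggerated to make the difference obvious.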

Blog Tone Is the Strongest Negative Predictor

First-person density accounts for 8.1% of model importance. Pages written in a blog or opinion voice ("I think", "in my experience") are significantly less likely to be cited, suggesting that AI platforms favor authoritative reference content over personal narrative.
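
The page does not define how first-person density is measured. One plausible reading, sketched here as an assumption, is the share of word tokens that are first-person pronouns; the pronoun list and tokenizer below are hypothetical.

```python
import re

# Hypothetical first-person lexicon; the study's actual word list is not given here.
FIRST_PERSON = {
    "i", "me", "my", "mine", "myself", "i'm", "i've", "i'd", "i'll",
    "we", "us", "our", "ours", "ourselves",
}
# Lowercase word tokens, keeping apostrophes so contractions like "i'm" survive.
WORD_RE = re.compile(r"[a-z']+")


def first_person_density(text: str) -> float:
    """Fraction of word tokens that are first-person pronouns (0.0 for empty text)."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    return sum(w in FIRST_PERSON for w in words) / len(words)


blog = "I think this works. In my experience we saw gains."
reference = "The tool supports three export formats and two APIs."
print(first_person_density(blog), first_person_density(reference))
```

The blog-voice sentence scores 0.3 (3 of 10 tokens) and the reference-style sentence scores 0.0.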

3+ Platform Consensus Pages Have 7x Statistics Density

Pages cited independently by all three platforms (ChatGPT, Perplexity, and Google AI Mode) have 15.3 statistics per 1,000 words versus 2.2 for uncited pages, roughly twice the word count, and positive primary-source scores. Cross-platform consensus is the clearest quality signal in the dataset.
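
Statistics density is reported per 1,000 words. The exact extractor is not described on this page, so the sketch below assumes a deliberately naive definition: every number (optionally with decimal or thousands separators and a trailing %) counts as one statistic. Note that this naive pattern also counts years.

```python
import re

# Hypothetical "statistic" pattern: digits, optional "."/"," digit groups, optional "%".
STAT_RE = re.compile(r"\d+(?:[.,]\d+)*%?")
# Words approximated as whitespace-separated tokens.
WORD_RE = re.compile(r"\S+")


def stats_per_1k_words(text: str) -> float:
    """Number of numeric statistics per 1,000 words (0.0 for empty text)."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    return len(STAT_RE.findall(text)) * 1000 / len(words)


sample = "Adoption rose 42% in 2025, reaching 1,200 firms."
print(stats_per_1k_words(sample))
```

The 8-word sample contains 3 matches (42%, 2025, 1,200), giving 375 per 1,000 words. On the reported figures, 15.3 / 2.2 is about 6.95, which is the 7x in the heading.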

SERP Co-Occurrence Is the Strongest Domain Trust Signal

Domains that rank for 4 or more queries in Google's top 20 have AI citation rates of 87% or higher (rho = 0.341, p = 2.6 x 10^-70). High-presence domains receive 2.04 citations per SERP slot versus 0.665 for single-appearance domains; because this rate is normalized per slot, the effect reflects trust rather than mere exposure.
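
The rho value is presumably Spearman's rank correlation between a domain's SERP co-occurrence count and its AI citation outcome; the exact variables are not spelled out on this page. Spearman's rho is the Pearson correlation of the two variables' ranks, with tied values sharing their average rank. A self-contained sketch (function names and data are hypothetical):

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical domains: queries each ranks for, and its AI citation rate.
serp_queries = [1, 1, 2, 4, 5, 7]
citation_rate = [0.10, 0.30, 0.20, 0.90, 0.85, 0.95]
print(round(spearman_rho(serp_queries, citation_rate), 3))
```

A significance test (the study's p-value) would be layered on top of this statistic and is not shown.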

Page Speed Confirmed Irrelevant

Load time is not significant in any position band (p > 0.39 in all four). This supports Experiment K's finding that speed acted as a domain-level proxy rather than a page-level signal. By contrast, content structure provides the largest marginal lift beyond rank position (+0.021 AUC).

Abstract

Generative Engine Optimization (GEO) aims to improve content visibility in AI-generated search responses. Prior observational studies have failed to isolate page-level signals because domain identity alone predicts AI citation at AUC = 0.975, confounding every between-domain comparison. We introduce a position-band matching design that controls for Google ranking position, asking: among equally-ranked pages, what page-level features predict AI citation? Using 250 queries across a balanced grid (5 intent types, 10 verticals), we collected citations from ChatGPT, Perplexity, and Google AI Mode and crawled 10,293 unique pages with 66 structural, semantic, and content-quality features. Within position bands, content features and domain identity provide comparable predictive power (content AUC = 0.673, domain AUC = 0.687 with enriched representations, combined AUC = 0.697), a convergence that contrasts sharply with the domain dominance observed without position control (AUC = 0.975). The top actionable predictors are comparison structure (d = 0.43, significant across all five intent types), query-term coverage (d = 0.42), subheading depth, statistical data density, and the absence of first-person/blog tone. Content structure provides the largest marginal lift beyond rank position (+0.021 AUC). In a second contribution, five domain-level tests reveal that SERP co-occurrence (topical breadth) is the strongest domain trust predictor (rho = 0.341, p = 2.6 x 10^-70), that cited domains are less lexically unique than their SERP competitors, and that a combined domain model achieves AUC = 0.921, with SERP presence accounting for 63% of importance. Data and code are publicly available.
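
The position-band matching idea can be illustrated by restricting AUC comparisons to cited and uncited pages that sit in the same Google position band, so that rank differences between bands cannot drive the score. The band edges below come from the Position Gradient table; pooling the within-band pairs is one plausible estimator chosen for illustration and is not necessarily the one the paper uses. Function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

# Band edges from the Position Gradient table (Google ranks 1-20).
BANDS = [(1, 3), (4, 7), (8, 12), (13, 20)]


def band_of(rank: int) -> Optional[str]:
    """Label of the position band containing `rank`, or None outside the top 20."""
    for lo, hi in BANDS:
        if lo <= rank <= hi:
            return f"{lo}-{hi}"
    return None


def within_band_auc(pages: Iterable[Tuple[int, float, bool]]) -> float:
    """AUC over (cited, uncited) pairs drawn from the same position band only.

    `pages` holds hypothetical (google_rank, model_score, cited) tuples.
    """
    groups = defaultdict(lambda: ([], []))  # band -> (cited scores, uncited scores)
    for rank, score, cited in pages:
        band = band_of(rank)
        if band is not None:
            groups[band][0 if cited else 1].append(score)
    wins = 0.0
    pairs = 0
    for cited_scores, uncited_scores in groups.values():
        for c in cited_scores:
            for u in uncited_scores:
                wins += 1.0 if c > u else 0.5 if c == u else 0.0
                pairs += 1
    if pairs == 0:
        raise ValueError("no band contains both cited and uncited pages")
    return wins / pairs


toy = [(1, 0.9, True), (2, 0.1, False), (15, 0.2, True), (18, 0.8, False), (25, 0.5, True)]
print(within_band_auc(toy))
```

In this toy set the rank-25 page is dropped, band 1-3 contributes one correctly ordered pair and band 13-20 one misordered pair, so the within-band AUC is 0.5.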

Position Gradient

Position Band	Pages	Cited	Citation Rate
1-3	339	194	57.2%
4-7	625	264	42.2%
8-12	939	258	27.5%
13-20	1,650	223	13.5%
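
The citation rates in the table follow directly from the page and citation counts. A quick check, with counts copied from the table (the dictionary layout is just for illustration):

```python
# (pages, cited) per Google position band, copied from the Position Gradient table.
BAND_COUNTS = {
    "1-3": (339, 194),
    "4-7": (625, 264),
    "8-12": (939, 258),
    "13-20": (1650, 223),
}


def citation_rates(counts):
    """Percentage of pages cited in each band, rounded to one decimal place."""
    return {band: round(100 * cited / pages, 1) for band, (pages, cited) in counts.items()}


rates = citation_rates(BAND_COUNTS)
print(rates)
```

The rate in the 1-3 band is about 4.2 times that in the 13-20 band (57.2% vs 13.5%).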

Position 1: 68.8% citation rate. Position 20: 11.9%. Even at position 1, 31% of pages are not cited.

Top Actionable Features

Feature	Importance (share of model)	Direction
First-person density	8.1%	Less blog tone
Word count	7.1%	Longer (~2,000 words)
Comparison signals	6.4%	"vs" framing, tables
H3 subheadings	5.9%	More (8-10+)
Primary source score	4.1%	Publish original data
Query term position	3.8%	Answer early
Internal link count	3.7%	More links
Query term coverage	3.7%	Cover query terms
Statistics per 1k words	2.5%	More data

Full model AUC = 0.753 (10 downsamples, 5-fold cross-validation). Actionable features account for 69.2% of total importance.
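
The evaluation note "10 downsamples, 5-fold CV" suggests a class-balancing scheme in which the majority class (overall, uncited pages) is randomly downsampled to the size of the minority class, repeated 10 times, with 5-fold cross-validation inside each replicate. The sketch below only generates the train/test index splits for such a scheme; the 1:1 balance ratio, the stratified fold assignment, and the function name are assumptions, and the model and AUC averaging are left out.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def downsampled_folds(
    labels: Sequence[int], n_downsamples: int = 10, k: int = 5, seed: int = 0
) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (replicate, fold, train_idx, test_idx) index splits.

    Each replicate keeps every minority-class row plus an equal-sized random
    sample of the majority class, then deals each class round-robin into k
    folds so every fold has the same class balance.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for r in range(n_downsamples):
        kept_majority = rng.sample(majority, len(minority))  # already in random order
        shuffled_minority = minority[:]
        rng.shuffle(shuffled_minority)
        folds = [shuffled_minority[f::k] + kept_majority[f::k] for f in range(k)]
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield r, f, train, folds[f]


labels = [1] * 10 + [0] * 40  # hypothetical: 10 cited, 40 uncited pages
splits = list(downsampled_folds(labels))
print(len(splits))  # 10 replicates x 5 folds
```

A model would be fit on each `train` split and scored on the matching `test` split; the reported AUC is presumably the mean over these evaluations.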

Keywords

Generative Engine Optimization, GEO, Information Retrieval, AI Search Engines, Large Language Models, Citation Prediction, Search Engine Results Page, position-band matching

Citation

Lee, A. (2026). I rank on page 1 -- what gets me cited by AI? Position-controlled analysis of page-level and domain-level predictors of AI search citation. Preprint v1, AI+Automation.

DOI: https://doi.org/10.5281/zenodo.19398158

ORCID: 0009-0002-4815-6373
aiXiv: aixiv.260403.000002
Zenodo: Replication Data (54.9 MB)