I Rank on Page 1 -- What Gets Me Cited by AI?
Position-Controlled Analysis of Page-Level and Domain-Level Predictors of AI Search Citation
Anthony Lee — AI+Automation
Preprint v1 — April 2026 | Not yet peer-reviewed
Key Findings
Content Matches Domain Within the SERP
When comparing equally-ranked pages, content features (AUC = 0.673) provide comparable predictive power to domain identity (AUC = 0.687). This contrasts sharply with uncontrolled comparisons where domain dominates at AUC = 0.975. Among pages that already rank, what the page says matters as much as what domain it is on.
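The sketch below makes the controlled comparison concrete. It computes a rank-based AUC and a within-band variant. The summary does not state how per-band results are pooled, so this assumes AUC is computed separately in each position band and then averaged with weights proportional to band size. `auc` and `within_band_auc` are hypothetical helpers, not the study's code.

```python
from collections import defaultdict
from typing import Iterable, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based ROC AUC: the probability that a randomly chosen cited page
    outscores a randomly chosen uncited page (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one cited and one uncited page")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def within_band_auc(records: Iterable[Tuple[str, float, int]]) -> float:
    """AUC computed inside each position band, then averaged with weights
    equal to band size (an assumed pooling rule). Bands that lack either
    cited or uncited pages are skipped."""
    bands = defaultdict(lambda: ([], []))  # band label -> (scores, labels)
    for band, score, cited in records:
        bands[band][0].append(score)
        bands[band][1].append(cited)
    weighted, total = 0.0, 0
    for scores, labels in bands.values():
        if 0 < sum(labels) < len(labels):
            weighted += auc(scores, labels) * len(labels)
            total += len(labels)
    if total == 0:
        raise ValueError("no band contains both cited and uncited pages")
    return weighted / total
```

Consider a toy set of seven pages. Every page in band 1-3 scores 0.9 (two of three cited), and every page in band 13-20 scores 0.1 (one of four cited). The pooled AUC is about 0.71, but the within-band AUC is exactly 0.5. A signal that mostly tracks which band a page sits in can therefore look strong when pooled and carry no information once position is held fixed.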
Comparison Structure Is the Top Actionable Signal
Pages with "vs" structures, comparison tables, and side-by-side analyses are significantly more likely to be cited (d = 0.43, p < 0.001 in all four position bands). This effect holds across all five intent types, not just comparison queries.
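The effect sizes here are reported as d. Assume they are Cohen's d with a pooled standard deviation; the summary does not say which variant was used. Under that assumption, d = 0.43 means cited pages score about 0.43 pooled standard deviations higher on the comparison feature than uncited pages in the same band. A minimal sketch of that statistic (`cohens_d` is a hypothetical name):

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(cited: Sequence[float], uncited: Sequence[float]) -> float:
    """Standardized mean difference (cited minus uncited), using the pooled
    sample standard deviation of the two groups."""
    n1, n2 = len(cited), len(uncited)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled_var = ((n1 - 1) * variance(cited) + (n2 - 1) * variance(uncited)) / (n1 + n2 - 2)
    return (mean(cited) - mean(uncited)) / math.sqrt(pooled_var)
```

For instance, groups [2, 4, 6] and [1, 3, 5] differ in mean by 1 and share a pooled standard deviation of 2, which gives d = 0.5.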
Blog Tone Is the Strongest Negative Predictor
First-person density accounts for 8.1% of model importance, the highest share of any feature in the Top Actionable Features table. Pages written in a blog or opinion voice ("I think", "in my experience") are significantly less likely to be cited, which suggests that AI platforms prefer authoritative reference content over personal narrative.
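The feature is described only as first-person density. The sketch below assumes it is the share of word tokens that are first-person singular forms, drawn from a small hypothetical lexicon (`FIRST_PERSON`). The study's actual word list and tokenization are not given here.

```python
import re

# Hypothetical lexicon of first-person singular forms (not the study's list).
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "i'm", "i've", "i'd", "i'll"}
# Lowercase words, optionally with one apostrophe suffix such as "i'm".
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def first_person_density(text: str) -> float:
    """Share of word tokens that are first-person singular forms."""
    normalized = text.lower().replace("\u2019", "'")  # treat curly apostrophes as straight
    tokens = TOKEN.findall(normalized)
    if not tokens:
        return 0.0
    return sum(tok in FIRST_PERSON for tok in tokens) / len(tokens)
```

Under these assumptions, "I think, in my experience, this works." scores 2/7 (about 0.29). A reference-style sentence with no first-person forms scores 0.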
Pages Cited by All Three Platforms Have 7x the Statistics Density
Pages cited independently by ChatGPT, Perplexity, and Google AI Mode have 15.3 statistics per 1,000 words (vs 2.2 for uncited pages), twice the word count, and positive primary-source scores. Cross-platform consensus is the clearest quality signal in the dataset.
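The 7x figure follows from the reported densities: 15.3 / 2.2 ≈ 6.95. The summary does not define what counts as a statistic. The sketch below therefore assumes a regular expression over numbers, percentages, and currency amounts. `STAT` and `stats_per_1k_words` are illustrative names, and the pattern also counts bare numbers such as years, so it likely overcounts.

```python
import re

WORD = re.compile(r"\S+")
# Assumed pattern for a "statistic": optional currency sign, digits with
# optional thousands separators and decimals, optional percent sign.
# The lookbehind skips digits glued to letters or dots (e.g. "v1").
STAT = re.compile(r"(?<![\w.])[$€£]?\d[\d,]*(?:\.\d+)?%?")


def stats_per_1k_words(text: str) -> float:
    """Number of statistic-like tokens per 1,000 whitespace-separated words."""
    words = WORD.findall(text)
    if not words:
        return 0.0
    return 1000 * len(STAT.findall(text)) / len(words)
```

For the sentence "Revenue grew 12% in 2024 to $3.5 million." the pattern finds three matches in eight words, or 375 per 1,000 words.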
SERP Co-Occurrence Is the Strongest Domain Trust Signal
Domains ranking for 4+ queries in Google's top 20 have AI citation rates of 87% or higher (rho = 0.341, p = 2.6 × 10⁻⁷⁰). High-presence domains receive 2.04 citations per SERP slot vs 0.665 for single-appearance domains. Because this rate is normalized per slot, the gap reflects genuine trust rather than mere exposure.
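Per-slot normalization divides a domain's citations by the number of SERP slots it occupies. The ratio 2.04 / 0.665 ≈ 3.1 therefore means high-presence domains earn about three times as many citations per appearance. The sketch below assumes rho denotes Spearman's rank correlation, as is conventional, and computes it with average ranks for ties, since SERP appearance counts tie often. The summary does not spell out which domain-level variables were correlated, and no p-value is computed here. `average_ranks` and `spearman_rho` are hypothetical helpers.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, where tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)
```

With real data, `x` might hold each domain's number of top-20 appearances and `y` its citation outcome. That pairing is an assumption made for illustration.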
Page Speed Confirmed Irrelevant
Load time is not significant in any position band (p > 0.39 in all four). This is consistent with Experiment K's finding that speed acts as a domain-level proxy rather than a page-level signal. By contrast, content structure provides the largest marginal lift beyond rank position (+0.021 AUC).
Abstract
Generative Engine Optimization (GEO) aims to improve content visibility in AI-generated search responses. Prior observational studies have failed to isolate page-level signals because domain identity alone predicts AI citation at AUC = 0.975, confounding every between-domain comparison. We introduce a position-band matching design that controls for Google ranking position, asking: among equally ranked pages, what page-level features predict AI citation? Using 250 queries across a balanced grid (5 intent types, 10 verticals), we collected citations from ChatGPT, Perplexity, and Google AI Mode and crawled 10,293 unique pages with 66 structural, semantic, and content-quality features. Within position bands, content features and domain identity provide comparable predictive power (content AUC = 0.673, domain AUC = 0.687 with enriched representations, combined AUC = 0.697), a convergence that contrasts sharply with the domain dominance observed without position control (AUC = 0.975). The top actionable predictors are comparison structure (d = 0.43, significant across all five intent types), query-term coverage (d = 0.42), subheading depth, statistical data density, and the absence of first-person/blog tone. Content structure provides the largest marginal lift beyond rank position (+0.021 AUC). In a second contribution, five domain-level tests reveal that SERP co-occurrence (topical breadth) is the strongest domain trust predictor (rho = 0.341, p = 2.6 × 10⁻⁷⁰), that cited domains are less lexically unique than their SERP competitors, and that a combined domain model achieves AUC = 0.921, with SERP presence accounting for 63% of importance. Data and code are publicly available.
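Query-term coverage (d = 0.42) and query-term position ("answer early" in the feature table) are not defined in this summary. The sketch below uses one plausible reading. Coverage is the share of distinct non-stopword query terms that appear anywhere on the page. Position is the index of the earliest query-term token divided by page length, with 1.0 meaning no match. The stopword list and the names `query_term_features` and `STOPWORDS` are hypothetical.

```python
import re
from typing import Tuple

TOKEN = re.compile(r"[a-z0-9]+")
# Hypothetical stopword list; the study's preprocessing is not described here.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "or", "is", "what", "how"}


def query_term_features(query: str, page_text: str) -> Tuple[float, float]:
    """Return (coverage, first_hit_position) for a query and a page.

    coverage: share of distinct non-stopword query terms found in the page.
    first_hit_position: index of the earliest query-term token divided by
    the page's token count (0.0 = very start, 1.0 = no query term found).
    """
    # dict.fromkeys de-duplicates while keeping query order
    terms = [t for t in dict.fromkeys(TOKEN.findall(query.lower())) if t not in STOPWORDS]
    tokens = TOKEN.findall(page_text.lower())
    if not terms or not tokens:
        return 0.0, 1.0
    term_set = set(terms)
    coverage = len(term_set & set(tokens)) / len(term_set)
    first_hit = next((i for i, tok in enumerate(tokens) if tok in term_set), None)
    position = first_hit / len(tokens) if first_hit is not None else 1.0
    return coverage, position
```

Take the query "best crm for small business" and the page "A CRM helps any small team. Pricing matters." Two of the four content terms appear, so coverage is 0.5. The first match is the second of eight tokens, so position is 0.125.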
Position Gradient
| Position Band | Pages | Cited | Citation Rate |
|---|---|---|---|
| 1-3 | 339 | 194 | 57.2% |
| 4-7 | 625 | 264 | 42.2% |
| 8-12 | 939 | 258 | 27.5% |
| 13-20 | 1,650 | 223 | 13.5% |
Position 1: 68.8% citation rate. Position 20: 11.9%. Even at position 1, 31% of pages are not cited.
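Each band's citation rate in the table is cited pages divided by pages. For example, 194 / 339 ≈ 57.2% for positions 1-3. The sketch below buckets per-page records into the table's four bands and recomputes those rates. The band boundaries come from the table, while `band_of` and `citation_rates` are hypothetical helper names.

```python
from typing import Dict, Iterable, Tuple

# Band boundaries taken from the Position Gradient table.
BANDS = [(1, 3), (4, 7), (8, 12), (13, 20)]


def band_of(position: int) -> str:
    """Map a Google ranking position (1-20) to its position-band label."""
    for lo, hi in BANDS:
        if lo <= position <= hi:
            return f"{lo}-{hi}"
    raise ValueError(f"position {position} is outside the top 20")


def citation_rates(pages: Iterable[Tuple[int, bool]]) -> Dict[str, Tuple[int, int, float]]:
    """Return {band: (pages, cited, citation_rate)} from (position, cited) pairs."""
    counts = {f"{lo}-{hi}": [0, 0] for lo, hi in BANDS}
    for position, cited in pages:
        tally = counts[band_of(position)]
        tally[0] += 1           # pages in this band
        tally[1] += int(cited)  # cited pages in this band
    return {band: (n, k, k / n if n else 0.0) for band, (n, k) in counts.items()}
```

Feeding in 194 cited and 145 uncited pages at positions 1-3 reproduces the first row of the table: 339 pages, 194 cited, 57.2%.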
Top Actionable Features
| Feature | Importance (% of total) | Direction |
|---|---|---|
| First-person density | 8.1% | Less blog tone |
| Word count | 7.1% | Longer (~2,000) |
| Comparison signals | 6.4% | "vs", tables |
| H3 subheadings | 5.9% | More (8-10+) |
| Primary source score | 4.1% | Produce data |
| Query term position | 3.8% | Answer early |
| Internal link count | 3.7% | More links |
| Query term coverage | 3.7% | Cover terms |
| Statistics per 1k words | 2.5% | More data |
Full model AUC = 0.753 (10 downsamples, 5-fold CV). Actionable features = 69.2% of total importance.
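The evaluation is described only as "10 downsamples, 5-fold CV". Uncited pages outnumber cited ones in the Position Gradient table (2,614 vs 939), so one plausible reading is class balancing. In that reading, each downsample keeps every cited page plus an equal-sized random sample of uncited pages, and then runs 5-fold cross-validation. The sketch below generates only those index splits, under that assumption. Model fitting and the averaging of per-fold AUCs are omitted, and all names are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def downsample_indices(labels: Sequence[int], seed: int) -> List[int]:
    """Keep every minority-class index plus an equal-size random sample of the
    majority class (assumed class balancing), then shuffle."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return kept


def kfold(indices: List[int], k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


def evaluation_splits(labels: Sequence[int], n_downsamples: int = 10,
                      k: int = 5) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (seed, train, test) for every fold of every downsample."""
    for seed in range(n_downsamples):
        for train, test in kfold(downsample_indices(labels, seed), k):
            yield seed, train, test
```

With 10 downsamples and 5 folds, this produces 50 train/test splits. A model would be fit on each train split, and the reported AUC would plausibly be the mean over the 50 test folds.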
Keywords
Generative Engine Optimization, GEO, Information Retrieval, AI Search Engines, Large Language Models, Citation Prediction, Search Engine Results Page, position-band matching
Citation
Lee, A. (2026). I rank on page 1 -- what gets me cited by AI? Position-controlled analysis of page-level and domain-level predictors of AI search citation. Preprint v1, AI+Automation.