We asked three AI search engines the same 250 questions. They cited 10,138 sources. We visited every candidate page, measured 66 things about each one, and then figured out what separated the winners from the losers.
When you ask ChatGPT a question and it shows you a list of sources at the bottom of its answer, have you ever wondered: why those websites? Why not the thousands of other pages about the same topic?
We spent months trying to answer that question. We ran five experiments. Three of them failed. Each failure taught us something important about what we were measuring wrong. The fourth and fifth experiments finally cracked it.
This post explains the entire journey, what we found, what the statistics mean in plain English, and what it means for anyone who wants their website to show up in AI answers.
THE QUESTION WE WERE TRYING TO ANSWER
Here is the question in simple terms:
When AI picks websites to cite in its answer, is it picking them because of the WEBSITE (like, "oh, that's Forbes, I trust Forbes") or because of the PAGE (like, "this specific page has really good information")?
This matters a lot. If AI only cares about the website name, then the only way to get cited is to BE a famous website. There is nothing a small business or new website can do.
But if AI cares about what is ON the page, then anyone can improve their chances by making better pages.
We wanted to find out which one it was. The answer turned out to be: both, but not in the way anyone expected.
EXPERIMENT 1: THE SPEED TRAP (What We Got Wrong First)
Our first big experiment looked at 4,350 web pages. We built a computer model that tried to predict which pages AI would cite based on 66 different measurements.
The model worked really well. It scored 97.5% on a measure called AUC, which goes from 50% for random guessing to 100% for perfect prediction. That sounds amazing, right?
Here is what the model said mattered most: page speed. How fast your website loads. It accounted for 65.8% of the model's decisions. The advice seemed clear: make your website faster and AI will cite you more.
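To make the setup concrete, here is a minimal sketch of this kind of model in Python, using scikit-learn's gradient boosting and its built-in feature importances. The data and feature names are hypothetical stand-ins, not our dataset, and this is not our exact pipeline:

```python
# Minimal sketch: train a citation classifier, then ask which features drove it.
# Toy data, constructed so that "page speed" dominates, echoing what we saw.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))  # hypothetical columns: page_speed, word_count, headings
y = (X[:, 0] + 0.2 * rng.normal(size=n) > 0).astype(int)  # "cited" tracks speed, by construction

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

print("AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
for name, imp in zip(["page_speed", "word_count", "headings"], model.feature_importances_):
    print(f"{name}: {imp:.1%} of the model's decisions")
```

Feature importance percentages like the "65.8%" figure come from exactly this kind of readout: they tell you what the model leaned on, not why.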
But we had a nagging feeling something was wrong.
So we ran a test. Instead of comparing pages from DIFFERENT websites, we compared pages from the SAME website. Like, two different articles on the New York Times. One got cited by AI, one did not. Were the cited ones faster?
No. They were actually SLOWER.
Within the same website, the pages AI cited loaded slower than the ones it skipped. The speed finding was completely backwards when you looked within a single site.
What was actually happening
The model was not really learning "fast pages get cited." It was learning "pages from certain websites get cited, and those websites happen to be fast." Speed was just a way for the computer to figure out which WEBSITE a page came from. Fast loading? Probably a big publisher with good servers. Slow loading? Probably a small blog.
The model was sneaking in website identity through the back door, using speed as a disguise.
What this taught us: You cannot just compare pages from different websites and claim the differences are about the pages. The differences might just be about the websites. This is called a confound, and it ruined our first experiment.
| What we thought | What was actually true |
|---|---|
| Fast pages get cited more | Fast WEBSITES get cited more (and speed is just a proxy for "big publisher") |
| Page speed = 65.8% of the model | Website identity = 97.5% of the model (speed was just how it detected which site a page was from) |
| Make your site faster | Be on a site that gets cited (not helpful advice) |
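Here is a sketch of the within-site check that exposed the trap. It assumes a table with one row per page and columns for the site, load time, and citation status; the column names are ours, for illustration only:

```python
# Sketch: compare load times of cited vs. uncited pages WITHIN each website.
# Assumes a DataFrame with columns: domain, load_time_ms, cited (0 or 1).
import pandas as pd

def within_domain_speed_gap(df: pd.DataFrame) -> float:
    """Average (cited minus uncited) load-time gap across domains that have
    at least one cited and one uncited page. Positive = cited pages are SLOWER."""
    gaps = []
    for _, site in df.groupby("domain"):
        cited = site.loc[site["cited"] == 1, "load_time_ms"]
        uncited = site.loc[site["cited"] == 0, "load_time_ms"]
        if len(cited) and len(uncited):
            gaps.append(cited.mean() - uncited.mean())
    return float(pd.Series(gaps).mean())
```

A cross-site comparison and this within-site comparison can point in opposite directions, and that reversal is exactly what a confound looks like.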
EXPERIMENT 2: THE AUTHORITY TRAP (Wrong Again)
OK, so website identity matters more than page features. But what makes a website "trusted" by AI? We thought we knew: authority. The same things that make a website rank well on Google should make it trusted by AI, right?
We collected authority data for 2,478 websites:
- PageRank (Google's original measure of how important a website is)
- Wikipedia links (how many Wikipedia pages link to the website)
- Domain age (how old the website is)
- Backlinks (how many other websites link to it)
- Reddit mentions (how often the website is mentioned on Reddit)
These are the classic signals that SEO professionals have used for 20 years to measure a website's importance.
Every single one went the WRONG direction.
Websites with higher PageRank were LESS likely to be cited by AI. Websites with more Wikipedia links were LESS likely to be cited. Older websites were LESS likely to be cited.
What was actually happening
The "not cited" group in our data was full of well-established websites that rank well on Google but that AI platforms just do not cite. Think: big corporate websites, government directories, university course pages. These sites have high authority scores. They rank in Google's top 10. But AI does not find them useful for answering questions.
Meanwhile, the websites AI DOES cite tend to be specialized review sites, comparison pages, and niche experts. These sites are newer, have fewer backlinks, and would score lower on traditional authority metrics. But they have exactly the kind of information AI needs: specific answers, data, comparisons.
What this taught us: Traditional website authority (PageRank, backlinks, domain age) does not explain AI citation. The old playbook does not apply here.
EXPERIMENT 3: THE PRINCETON PROBLEM (The Famous Study That Didn't Hold Up)
A research group at Princeton published what became the most famous paper in this field. They built a test system and found that adding statistics, citations, and quotations to content could boost AI visibility by up to 40%. This finding shaped the entire industry's approach to AI optimization.
We tried to replicate their results on real AI platforms (including ChatGPT, Perplexity, and Google AI Mode) instead of the test system they used.
It did not replicate.
We tested their recommendations on 3,205 pages across four real AI platforms. Here is what happened:
- Statistics: This one actually worked. Pages with more numbers, percentages, and data points were more likely to be cited by ChatGPT. Score one for Princeton.
- Citations and quotations: These either did not matter or went the WRONG direction. Pages with more quotes and source citations were actually LESS likely to be cited on some platforms. Score zero for Princeton.
Their test system behaved differently from real AI platforms. The advice the industry built on this paper (add citations! add quotations! add statistics!) was one-third right.
What this taught us: Results from test systems do not automatically transfer to real products. You have to test on the actual platforms people use.
What the statistics told us (in plain English)
We used something called AUC (Area Under the Curve) to measure how well different features predict AI citation. Think of it like a grade:
- 50% = random guessing (flipping a coin)
- 60% = slightly better than guessing (you have a small edge)
- 70% = decent (you can predict meaningfully)
- 80%+ = strong (you can predict reliably)
- 97.5% = almost perfect (what website identity scored)
Princeton's features scored 54.4% on real platforms. That is barely better than flipping a coin. For comparison, just looking at what website a page was on scored 97.5%.
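If you want the mechanics: AUC is the probability that a randomly chosen cited page gets a higher model score than a randomly chosen uncited page. A minimal sketch with toy numbers (not our data):

```python
# Sketch: AUC asks how often a cited page outscores an uncited one.
from sklearn.metrics import roc_auc_score

cited  = [1, 1, 1, 0, 0, 0, 0, 0]                    # 1 = cited by AI, 0 = not
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]   # a model's predicted probabilities

# 14 of the 15 possible (cited, uncited) pairs are ranked correctly.
print(roc_auc_score(cited, scores))  # 0.933...
```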
THE BREAKTHROUGH: COMPARING APPLES TO APPLES
After three failed experiments, we realized the core problem: we kept comparing pages from different websites. Every time we did that, the website's reputation overwhelmed everything about the page itself. It was like trying to judge a chef's cooking by comparing a meal at a five-star restaurant to a meal at a food truck. You cannot separate the chef's skill from the restaurant's reputation.
So we changed the experiment design completely.
The new approach: Only compare pages that Google already ranked for the SAME question at SIMILAR positions.
Here is why this works. If Google ranks both Forbes and NerdWallet on page 1 for "best credit card," Google has already decided both pages are relevant and trustworthy enough to rank. Now the question becomes: among these equally-qualified pages, which ones does AI pick?
This is like a cooking competition where all the chefs are equally famous. Now you can actually judge the food.
We call this position-band matching. We group pages into bands based on their Google rank (positions 1 through 3, 4 through 7, 8 through 12, 13 through 20) and only compare pages within the same band.
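In code, the banding itself is simple. A sketch, assuming a table with one row per page and hypothetical column names:

```python
# Sketch: assign each page to a Google position band, then compute the
# citation rate within each band (assumes columns: google_rank, cited).
import pandas as pd

BANDS = [(1, 3), (4, 7), (8, 12), (13, 20)]

def position_band(rank: int) -> str:
    for lo, hi in BANDS:
        if lo <= rank <= hi:
            return f"{lo}-{hi}"
    return "unranked"

def citation_rate_by_band(df: pd.DataFrame) -> pd.Series:
    banded = df.assign(band=df["google_rank"].map(position_band))
    return banded.groupby("band")["cited"].mean()
```

All the comparisons that follow happen inside one band at a time, so a page is only ever judged against pages Google treated roughly the same.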
THE EXPERIMENT: 250 QUESTIONS, 10,138 CITATIONS, 3 AI PLATFORMS
What we did
1. Wrote 250 questions covering 10 different topics (technology, health, finance, travel, home improvement, education, food, fitness, legal, shopping) and 5 types of questions (looking for information, looking for the best option, checking if something is worth it, comparing two things, looking for reviews).
2. Asked all 250 questions to three AI platforms:
   - ChatGPT (GPT-5.3)
   - Perplexity
   - Google AI Mode
3. Recorded every source they cited in their answers: 10,138 total citations.
4. Ran the same 250 questions through Google and Bing, saving Google's top 20 results for each question (5,000 pages) plus 2,000+ Bing results.
5. Crawled every unique page (10,138 cited pages, 5,000 Google results, and the Bing results, deduplicated to 10,293) and measured 66 things about each one: how long it was, how many headings it had, what kind of content it contained, how it was written, and what technical features it had.
6. Compared the winners to the losers within each position band.
What the position gradient looks like
The first thing we checked: does Google rank predict AI citation at all?
| Google Position | Pages | Cited by AI | Citation Rate |
|---|---|---|---|
| 1 through 3 | 339 | 194 | 57.2% |
| 4 through 7 | 625 | 264 | 42.2% |
| 8 through 12 | 939 | 258 | 27.5% |
| 13 through 20 | 1,650 | 223 | 13.5% |
Yes, ranking higher on Google helps. A lot. Pages in the top 3 get cited 57% of the time. Pages ranked 13 through 20 get cited only 13.5% of the time.
But here is the key detail: even at position 1, 31% of pages are NOT cited. And even at positions 13 through 20, some pages still ARE cited. Google rank is the starting line, not the finish line. Something else determines the winner within each group.
THE SIX PAGE FEATURES THAT PREDICT AI CITATION
Within each position band, we tested all 66 features. Six of them predicted citation in ALL FOUR position bands (meaning the pattern held no matter where the page ranked on Google):
1. Comparison structure (strongest signal, d = 0.43)
Pages that compare things ("Product A vs Product B," side-by-side tables, feature comparisons) get cited more. This was the single strongest content signal in the entire study.
And here is the surprise: it works for ALL types of questions, not just comparison questions. Even when someone asks a simple information question or a "best of" question, pages with comparison structure get cited more. Why? We think it is because comparison tables are packed with specific, extractable facts that AI can easily pull out and cite.
What "d = 0.43" means: This is a measure of how big the difference is (called "Cohen's d"). Think of it like this: if you lined up all the cited pages and all the not-cited pages, a d of 0.43 means there is a clear, visible gap between the two groups. Not huge, but definitely not random chance. In social science, anything above 0.3 is considered a "medium" effect.
2. Query term coverage (d = 0.42)
This one is simpler: does your page contain the words people actually search for? If someone asks about "best password manager," does your page literally contain the words "best," "password," and "manager"?
Cited pages cover 100% of the query's key words. Uncited pages cover about 75%. It sounds obvious, but a surprising number of pages rank for a query without actually containing all the query's words (they get there through synonyms, topic authority, or other SEO signals).
3. Subheading depth (d = 0.19)
Cited pages have roughly twice as many subheadings (specifically H3 subheadings, which are the smaller headers within sections). This makes the page scannable and organized. Think of it like a textbook with clear chapter sections versus a wall of text.
4. Word count (d = 0.20)
Cited pages are longer: median 2,150 words versus 1,415 for not-cited pages. But there is a sweet spot. This is not about writing the longest page on the internet. Around 2,000 words seems to be the target. Short pages (under 1,000 words) lack enough substance. Extremely long pages (5,000+) do not help more.
5. Primary source score (d = 0.27)
This measures whether a page CREATES original information (data, test results, analysis) or just COLLECTS information from other sources (quotes, citations, summaries). Pages that produce their own data get cited more. Pages that mainly aggregate other people's work get cited less.
6. Blog/opinion tone (d = -0.30 to -0.37, NEGATIVE)
This is the strongest NEGATIVE signal. Pages written in first person ("I think," "in my experience," "I tested") get cited LESS. AI platforms prefer pages that read like reference material, not personal blogs.
This does not mean your personal experience is worthless. It means the way you write affects whether AI treats your page as a citable source or as an opinion piece. The same information, written in an objective tone ("testing showed that...") versus a personal tone ("I found that..."), gets treated differently.
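As promised above, here is how a Cohen's d like 0.43 is computed, using query term coverage as the example feature. The formulas are standard; the column of numbers you feed in would come from your own page data:

```python
# Sketch: Cohen's d = difference in group means, in units of pooled spread.
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def query_coverage(page_text: str, query: str) -> float:
    """Fraction of the query's words that literally appear in the page."""
    words = set(query.lower().split())
    return sum(w in page_text.lower() for w in words) / len(words)

# e.g. cohens_d(coverage_of_cited_pages, coverage_of_uncited_pages)
```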
What about the things everyone tells you matter?
We also checked a bunch of features that SEO experts commonly recommend for AI optimization. Several of them showed no effect at all within position bands:
| Feature | What people say | What our data showed |
|---|---|---|
| Page speed | "Make your site faster!" | Not significant in ANY position band (p > 0.39 everywhere) |
| Author bylines | "Add author bios for trust!" | No consistent effect |
| Content-to-HTML ratio | "Clean up your code!" | Marginal at best |
| Readability scores | "Write at an 8th grade level!" | No effect |
| Review schema | "Add Review markup!" | No effect within bands |
| Product schema | "Add Product markup!" | No effect within bands |
Page speed is the most dramatic reversal. It appeared to be the most important factor in our first experiment (65.8% of the model). After we controlled for which website a page was on, it completely disappeared. Zero effect.
HOW WE DOUBLE-CHECKED OURSELVES
Good science means checking your work from multiple angles. We ran four additional tests:
Test: What if we add up all the features into one model?
We built the model in layers, adding one category of features at a time:
| What we added | Model accuracy | Improvement |
|---|---|---|
| Just Google position | 70.4% | Starting point |
| + Whether the page matches the query | 71.7% | +1.3% |
| + Page structure (headings, length) | 73.8% | +2.1% (biggest jump) |
| + Content quality (stats, data) | 73.5% | -0.3% (no help) |
| + Trust signals (schema, author) | 74.0% | +0.5% |
| + Technical (speed, page size) | 73.9% | -0.1% (no help) |
Page structure is the most valuable thing you can add beyond just ranking well on Google. Content quality features (statistics, technical depth) do not add anything beyond what structure already captures. This suggests that well-structured pages tend to also be high-quality pages, so the structure is doing double duty.
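The layered approach can be expressed compactly. A sketch under the assumption that features arrive as columns grouped the way the table above groups them (group names and column indices are illustrative):

```python
# Sketch: add one feature group at a time; record cross-validated accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

FEATURE_GROUPS = {                 # hypothetical column indices per group
    "google_position": [0],
    "query_match":     [1],
    "page_structure":  [2, 3],
    "content_quality": [4],
    "trust_signals":   [5],
    "technical":       [6],
}

def layered_scores(X: np.ndarray, y: np.ndarray) -> dict[str, float]:
    scores, cols = {}, []
    for group, idx in FEATURE_GROUPS.items():
        cols += idx                                   # cumulative feature set
        model = LogisticRegression(max_iter=1000)
        scores[group] = cross_val_score(model, X[:, cols], y, cv=5).mean()
    return scores
```

Reading the successive scores off this dictionary gives the "improvement" column: whatever a new group adds beyond everything added before it.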
Test: Does content beat website identity?
This was the big question. When we compare pages within the same Google results, does page content predict citation better than website name?
| What we measured | Accuracy |
|---|---|
| Page content alone | 67.3% |
| Website identity alone | 68.7% |
| Both combined | 69.7% |
For the first time in our entire research program, page content and website identity are roughly equal. In our first experiment (without controlling for Google rank), website identity scored 97.5% and page content was invisible. Once we compare within the same search results, they converge.
In plain English: If you already rank on Google for a question, what your page says matters just as much as what website it is on.
Test: Do all three AI platforms agree?
We looked at the 51 pages that were cited by ALL three AI platforms (ChatGPT, Perplexity, AND Google AI Mode). These "consensus" pages are the gold standard: three completely different AI systems, with different search engines behind them, all independently chose these pages.
| Measurement | Cited by 0 platforms | Cited by 3+ platforms | Difference |
|---|---|---|---|
| Statistics per 1,000 words | 2.2 | 15.3 | 7x more |
| Primary source score | -10.2 | +10.7 | Completely flipped |
| Word count | 1,219 | 2,506 | 2x longer |
| Query term coverage | 75% | 100% | Every word covered |
The consensus pages have 7 times more statistics than uncited pages. Their primary source score completely flips from negative (aggregating others) to positive (producing original data). This is the strongest evidence that content quality genuinely matters: when three independent AI systems all pick the same page, that page is measurably different.
Test: Matched pairs (the closest thing to a fair fight)
For 1,718 pairs of pages (one cited, one not cited, both ranking for the same question at similar Google positions), every major finding held:
- Cited pages had more subheadings (+3.9 on average)
- Cited pages had more comparison structure (+1.0 signals)
- Cited pages were longer (+482 words)
- Cited pages had less blog tone (-2.0 first-person words per 1,000)
- Cited pages had more statistics (+1.0 per 1,000 words)
Same story from every angle.
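For the curious, a sketch of how such pairs can be built and compared, assuming one row per page with query, position band, citation status, and feature columns (all names ours):

```python
# Sketch: pair cited with uncited pages from the SAME query and position band,
# then average the per-pair feature differences.
import pandas as pd

def matched_pair_deltas(df: pd.DataFrame, features: list[str]) -> pd.Series:
    deltas = []
    for _, group in df.groupby(["query", "band"]):
        cited = group[group["cited"] == 1]
        uncited = group[group["cited"] == 0]
        # zip() pairs them off; surplus pages on either side are dropped.
        for (_, c), (_, u) in zip(cited.iterrows(), uncited.iterrows()):
            deltas.append(c[features] - u[features])
    return pd.DataFrame(deltas).mean()

# e.g. matched_pair_deltas(pages, ["h3_count", "word_count", "stats_per_1k"])
```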
THE WEBSITE-LEVEL FINDINGS: TOPICAL AUTHORITY IS EVERYTHING
The page-level findings tell you what to do with a specific page. But we also discovered the single biggest factor at the website level.
How many Google searches your site ranks for is the best predictor of AI trust
Among 2,542 websites that appeared in Google's top 20 across our 250 questions:
| How many of our questions the site ranked for | Citation rate |
|---|---|
| 1 question | 33.8% |
| 2 to 3 questions | 72.9% |
| 4 to 7 questions | 87.3% |
| 8 to 15 questions | 95.2% |
| 16+ questions | 100% |
Read that again. If your website ranks in Google's top 20 for just one of our 250 questions, you have a 34% chance of being cited by AI. If you rank for 4 to 7 questions, that jumps to 87%. If you rank for 16 or more, it is 100%.
This is the most important finding in the entire study. AI platforms trust websites that show up across many related searches. Not websites that rank #1 for one search. Websites that rank for LOTS of related searches.
We call this SERP co-occurrence (SERP stands for Search Engine Results Page). The correlation is moderate in size but overwhelmingly reliable: rho = 0.341, p = 2.6 x 10^-70.
What "p = 2.6 x 10^-70" means: The "p-value" tells you the probability that a result happened by pure luck. A p-value of 0.05 (5% chance of luck) is the standard threshold for "probably real." A p-value of 2.6 x 10^-70 is 0.00000000000000000000000000000000000000000000000000000000000000000000026. This is not luck. This is one of the most statistically significant findings in this entire research program.
But is it just "more pages = more chances"?
A fair criticism: if you rank for 10 questions, you have 10 chances to get cited. Of course your overall citation rate is higher. That is just math, not trust.
We checked this by computing citations PER appearance:
| How often the site shows up | Citations per appearance |
|---|---|
| 1 time | 0.67 |
| 2 to 3 times | 1.23 |
| 4 to 7 times | 1.64 |
| 8+ times | 2.04 |
Websites that show up more often get cited more PER appearance, not just overall. A website that ranks for 8+ questions gets 3x more citations per slot than a website that ranks for just 1 question. This is not just "more lottery tickets." AI platforms genuinely trust websites with broader topical coverage.
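Both site-level checks are a few lines each. A sketch, assuming a per-site table with hypothetical column names:

```python
# Sketch: the correlation behind rho = 0.341, plus the per-appearance check.
import pandas as pd
from scipy.stats import spearmanr

def site_level_checks(sites: pd.DataFrame) -> None:
    # Assumed columns: serp_appearances (how many of our questions the site
    # ranked for) and total_citations (citations across all AI answers).
    rho, p = spearmanr(sites["serp_appearances"], sites["total_citations"])
    print(f"Spearman rho = {rho:.3f}, p = {p:.1e}")

    # Citations PER appearance, bucketed by how often the site shows up.
    buckets = pd.cut(sites["serp_appearances"],
                     bins=[0, 1, 3, 7, float("inf")],
                     labels=["1", "2-3", "4-7", "8+"])
    rate = sites["total_citations"] / sites["serp_appearances"]
    print(rate.groupby(buckets).mean())
```

If the per-appearance rate were flat across buckets, "more lottery tickets" would be the whole story. It is not flat.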
The content uniqueness surprise
Here is something nobody expected: cited websites are LESS unique than their competitors, not more.
We measured how different each website's content was from the average content on the same search results page. Websites that got cited had a uniqueness score of 0.454. Websites that did NOT get cited had a score of 0.499 (higher = more different from the pack).
This sounds backwards. Should not unique, original content be more valuable?
The answer reveals a two-stage process:
First, match what everyone else covers. If the search results for "best running shoes" all discuss cushioning, weight, price, and terrain type, your page needs to cover those topics too. Pages that go in a completely different direction get skipped.
Then, add depth and structure. Among pages that cover the basics, the ones that add comparison tables, statistics, and deep subheadings get cited. The extra value is in HOW you cover the topic, not in covering a DIFFERENT topic.
The gateway to AI citation is comprehensive coverage. The differentiator is structure and depth.
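One plausible way to compute a uniqueness score like this (we are not claiming it is the paper's exact method): represent every page on a results page as a TF-IDF vector and measure its distance from the average of its competitors.

```python
# Sketch: how "different from the pack" each page is on one results page.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def uniqueness_scores(page_texts: list[str]) -> list[float]:
    """For pages competing on one SERP: 1 minus similarity to the average
    page. Higher = more different from the pack."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(page_texts)
    centroid = np.asarray(tfidf.mean(axis=0))   # the "average" page here
    sims = cosine_similarity(tfidf, centroid).ravel()
    return [float(1.0 - s) for s in sims]
```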
WHAT THIS MEANS FOR YOU
At the website level (the bigger lever)
Build a cluster of pages around your topic. If you sell running shoes, do not just have one page about running shoes. Have pages about:
- Best running shoes for beginners
- Running shoes vs walking shoes
- How to choose running shoes for flat feet
- Running shoe cushioning compared
- Trail running shoes vs road running shoes
Each page targets a different question people ask. Interlink them all. The goal: rank in Google's top 20 for as many related searches as possible. This is what AI platforms use as a trust signal.
Websites that rank for 4 or more related questions have an 87%+ AI citation rate. Websites that rank for just 1 question have a 34% rate.
At the page level (the page-specific levers)
Add comparison sections. Even if your page is not specifically a "vs" comparison, include a section that compares options. "Product A vs Product B" tables, feature comparison grids, pros and cons lists. This was the #1 content signal.
Use the exact words people search for, early in the page. If you are targeting "best password manager," those exact words should appear in your first few paragraphs.
Break content into lots of subsections with clear headings. Aim for twice as many subheadings as you think you need. Use H3 headings (the smaller ones within sections) generously.
Include real numbers and data. Statistics, test results, percentages, sample sizes. Pages that all three AI platforms agreed on had 7x more statistics than average.
Write like a reference guide, not a blog post. Avoid "I think," "in my opinion," "I tried this and." Write as if you are writing a Wikipedia article or a product specification. The information can be the same. The tone matters.
Aim for about 2,000 words. Long enough to be comprehensive. Short enough to stay focused.
Add FAQ schema markup. This is a technical tag you add to your page's code that helps AI platforms understand your Q&A sections. It was significant across all four position bands in our study.
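For reference, FAQ schema is a small JSON-LD block embedded in your page's HTML. Here is a minimal, valid example of the structure, generated with Python for consistency with the other sketches (the question and answer text are placeholders):

```python
# Sketch: build a schema.org FAQPage JSON-LD block for a page's Q&A section.
import json

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is the best password manager?",  # placeholder
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Based on our testing, Product A scored highest...",  # placeholder
            },
        },
    ],
}

# Embed the output in your page as:
# <script type="application/ld+json"> ...this JSON... </script>
print(json.dumps(faq_schema, indent=2))
```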
What NOT to spend time on
Based on our data, these commonly recommended optimizations showed no effect:
- Making your site faster (once you control for website identity, speed does not matter)
- Adding author bylines
- Improving readability scores
- Adding Review or Product schema markup
- Getting more backlinks or improving domain authority
HOW CONFIDENT ARE WE?
Every scientific study has limitations. Here are ours, stated honestly:
This is observation, not a controlled experiment. We measured what pages look like that get cited. We did not change pages and measure whether citations increased. It is possible that some hidden factor we did not measure explains both the content features and the citation probability. The strongest evidence against this: three independent AI platforms with different search backends show the same patterns, which is hard to explain by a single hidden factor.
250 questions is a sample. Different questions might produce different results. We balanced across 10 topics and 5 question types to reduce this risk, but 250 is still a slice of all possible questions.
AI platforms change. Our data reflects ChatGPT (GPT-5.3), Perplexity, and Google AI Mode as of April 2026. Future model updates may shift citation behavior.
This only applies to pages that already rank on Google. If your page is not in Google's top 20, this study does not tell you how to get there. It tells you what to do once you are there.
The effect sizes are modest. The content features we found have Cohen's d values of 0.19 to 0.43. These are "small to medium" effects. They are real and consistent, but they are not magic switches. No single change will guarantee AI citation. These are the factors that tilt the odds in your favor.
THE FULL RESEARCH
This blog post summarizes the findings from our Experiment M research paper: "I Rank on Page 1: What Gets Me Cited by AI? Position-Controlled Analysis of Page-Level and Domain-Level Predictors of AI Search Citation" (Lee, 2026).
The full paper, dataset (10,293 pages, 66 features, 250 queries, citations from 3 AI platforms), and code are available at DOI: 10.5281/zenodo.19398158.
Prior experiments referenced in this post:
- Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior (Lee, 2026a)
- Reddit Doesn't Get Cited (Through the API) (Lee, 2026b)
- GEO: Generative Engine Optimization (Aggarwal et al., 2024, Princeton)
For a free check of your page against these factors: AI Visibility Quick Check
For the full AI SEO audit: AI SEO Audit Service