
AI SEO EXPERIMENTS

We Crawled 479 Pages to Find What AI Platforms Actually Cite - It's Not What the SEO Playbook Says

2026-03-20

[Figure: Bar chart showing URL overlap between Google Top-3 results and AI citations: ChatGPT 7.8%, Claude 11.2%, Perplexity 29.7%, Gemini 32.4%]

There's an entire industry selling "AI SEO" advice right now. Most of it is recycled Google SEO tips with "AI" bolted on: add author bios, speed up your site, remove popups, build backlinks. I wanted to know if any of it was actually true.

So I built a scraper that queries ChatGPT, Claude, Perplexity, and Gemini, collects every citation they return, and cross-references those citations against Google and Bing's top results. I ran 120 product recommendation queries across all four platforms, crawled 479 cited and non-cited pages to measure 26 technical features, and verified platform behavior with server-side Vercel middleware logs.

The results contradicted almost everything the "AI SEO" industry is currently selling.

The overlap between Google rankings and AI citations is shockingly low

The foundational assumption of AI SEO advice is: "Rank on Google, and AI will cite you." I tested this directly.

For 120 product recommendation queries, I pulled Google's Top-3 organic results via API and compared them against citations from ChatGPT, Claude, Perplexity, and Gemini.

URL-level overlap (exact page match):

Platform      Overlap with Google Top-3
ChatGPT       7.8%
Claude        11.2%
Perplexity    29.7%
Gemini        32.4%

ChatGPT cited the same specific page as Google's Top-3 only 7.8% of the time. Even Google's own AI Mode - which is literally built on Google Search - only matched 32.4% of the time.

Domain-level overlap was higher (28.7-49.6%), meaning AI platforms do tend to trust the same domains Google trusts, but they pick different pages from those domains. This is an important distinction: being on a trusted domain helps, but ranking #1 for a keyword does not translate to AI citation.
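For concreteness, here is roughly how URL-level versus domain-level overlap can be computed. The normalization rules below (lowercasing, stripping www., trailing slashes, and query strings) are illustrative assumptions, not the study's exact pipeline:

```python
from urllib.parse import urlparse

def normalize_url(url: str) -> str:
    """Strip scheme, 'www.', trailing slashes, and query strings
    so trivially different URLs compare equal."""
    p = urlparse(url.lower())
    host = p.netloc.removeprefix("www.")
    path = p.path.rstrip("/")
    return f"{host}{path}"

def domain(url: str) -> str:
    return urlparse(url.lower()).netloc.removeprefix("www.")

def overlap_rates(google_top3: list[str], ai_citations: list[str]) -> tuple[float, float]:
    """Fraction of AI citations matching Google's Top-3 at the
    exact-URL level and at the domain level."""
    g_urls = {normalize_url(u) for u in google_top3}
    g_domains = {domain(u) for u in google_top3}
    url_hits = sum(normalize_url(c) in g_urls for c in ai_citations)
    dom_hits = sum(domain(c) in g_domains for c in ai_citations)
    n = len(ai_citations) or 1
    return url_hits / n, dom_hits / n
```

The domain-level rate is always at least the URL-level rate, which is why the two can diverge so widely: same trusted domains, different pages.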

What actually predicts whether a page gets cited

If Google rank doesn't predict AI citation, what does?

I crawled 479 unique pages - 241 that were cited by at least one AI platform and 238 that weren't - and measured 26 technical features on each page using Playwright. After applying Benjamini-Hochberg FDR correction at α = .05 (to control for the multiple comparison problem), 7 features reached statistical significance:

Feature                      Cited pages   Non-cited pages   Effect
Word count                   2,582         1,859             p_adj = .006
Schema markup count          1.0           1.0               p_adj = .006
Self-referencing canonical   84.2%         73.5%             OR = 1.92
Has schema markup            73.9%         62.6%             OR = 1.69
Internal link count          123           96                OR = 2.07
Content-to-HTML ratio        0.086         0.065             p_adj = .047
Total link count             164           134               p_adj = .037

(Medians for count/ratio features; prevalence rates for binary features.)
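For the curious, Benjamini-Hochberg is simple enough to implement directly. A minimal sketch (statsmodels' `multipletests(method="fdr_bh")` does the same thing):

```python
def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Return BH-adjusted p-values, in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    prev = 1.0
    # Step-up procedure: walk from the largest p-value to the smallest,
    # scaling each by m/rank and enforcing monotonicity along the way.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        adj = min(prev, pvals[i] * m / rank)
        adjusted[i] = adj
        prev = adj
    return adjusted
```

A feature "survives FDR correction at α = .05" when its adjusted p-value stays below .05.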

Internal links were the strongest single predictor in the logistic regression model (standardized β = 0.73, OR = 2.07). A one-standard-deviation increase in internal links roughly doubled the odds of being cited.

But here's the catch: total link count had a negative coefficient (β = -0.76, OR = 0.47). This means pages with lots of outbound links but few internal links were less likely to be cited. The signal is clear: internal links good, heavy external linking bad.
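To make those numbers concrete: a "standardized" logistic regression just scales each feature to unit variance before fitting, so exp(β) reads as the odds ratio per one-standard-deviation increase. A sketch on synthetic data (the effect sizes and distributions here are made up for illustration; scikit-learn assumed available):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the crawl data: correlated link counts and a
# citation outcome driven by internal links. Illustrative only.
n = 479
internal_links = rng.poisson(100, n).astype(float)
total_links = internal_links + rng.poisson(40, n)
logits = 0.02 * (internal_links - internal_links.mean())
cited = rng.random(n) < 1 / (1 + np.exp(-logits))

# Standardize features so coefficients are per-1-SD effects
X = StandardScaler().fit_transform(np.column_stack([internal_links, total_links]))
model = LogisticRegression().fit(X, cited)

# exp(beta) = odds ratio per 1-SD increase in that feature
for name, beta in zip(["internal_links", "total_links"], model.coef_[0]):
    print(f"{name}: beta={beta:+.2f}, OR={np.exp(beta):.2f}")
```

This is why a single feature can have a positive raw correlation but a negative coefficient once a correlated feature (here, internal vs. total links) is in the model.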

What does NOT predict AI citation (this is the important part)

This is where the industry advice falls apart. After FDR correction, these commonly recommended "AI SEO" factors showed no statistically significant association with citation:

Factor                 p_adj   What the industry says
Popup/modal elements   .606    "Remove popups for AI visibility"
Author attribution     .522    "Add author bios for AI trust"
Page load time         NS      "Speed up your site for AI crawlers"
Page size (bytes)      NS      "Keep pages lightweight"
Paywall signals        NS      "Remove paywalls for AI access"
Affiliate link count   NS      "Affiliate links hurt AI visibility"
Cookie banners         NS      "Cookie banners confuse AI bots"
Ad network signals     NS      "Remove ads for better AI ranking"

(NS = not significant after FDR correction)

Author attribution was particularly interesting - cited pages actually showed slightly lower author attribution rates (44.0%) than non-cited pages (48.7%). The popular advice to "add author bylines for AI trust" has zero empirical support in our data.

Popups? No difference. Load time? No difference. Page size? No difference.

I want to be careful here: these null results don't prove these factors never matter. They mean that across 479 pages with proper statistical controls, they didn't reach significance. But they do directly challenge specific pieces of advice that people are paying good money for.

AI platforms don't work the way you think

Part of the reason generic advice fails is that AI platforms have fundamentally different architectures. I verified this with server-side Vercel middleware logs - not by reading documentation, but by watching actual HTTP requests hit my server.

The 2-vs-2 split:

Live-fetch platforms (ChatGPT, Claude): These send real HTTP GET requests to your server during conversations. ChatGPT uses ChatGPT-User/1.0 and discovers URLs through Bing's search index. Claude uses Claude-User/1.0 but only fetches when its training data and search snippets are insufficient - it's demand-driven, not proactive.

Index-only platforms (Perplexity, Gemini): These never contact your server during a conversation. Perplexity serves from a pre-built index maintained by PerplexityBot. Gemini grounds its answers through Google Search infrastructure - we saw zero Gemini-specific fetch requests across 14 days of monitoring.
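Spotting these fetchers in your own logs is straightforward. A sketch that buckets requests by user-agent substring (the substrings match the agents named above; exact UA strings vary by version):

```python
import re

# Substrings observed for each platform's fetcher/crawler.
# ChatGPT-User and Claude-User are live, in-conversation fetches;
# PerplexityBot and Googlebot build indexes ahead of time.
AI_AGENTS = {
    "chatgpt_live": re.compile(r"ChatGPT-User", re.I),
    "claude_live": re.compile(r"Claude-User", re.I),
    "perplexity_index": re.compile(r"PerplexityBot", re.I),
    "google_index": re.compile(r"Googlebot", re.I),
}

def classify_agent(user_agent: str) -> str:
    """Label a request's user-agent, or 'other' if no AI fetcher matches."""
    for label, pattern in AI_AGENTS.items():
        if pattern.search(user_agent):
            return label
    return "other"
```

Running every request through this in middleware is how you can tell live fetches (spikes during conversations) from index crawls (steady background traffic).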

This means that for ChatGPT and Claude, your actual page HTML matters in real time. For Perplexity and Gemini, what matters is what their crawlers indexed days or weeks ago.

It also means that if your content is JavaScript-rendered and not server-side rendered, Claude will see an empty shell. We verified this: Claude's web_fetch returns only the initial HTML - it cannot execute JavaScript.
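You can check whether a page survives a no-JavaScript fetch by looking at how much readable text the raw HTML actually carries. A rough heuristic sketch (the 500-character threshold is an arbitrary assumption):

```python
import re

def visible_text_length(html: str) -> int:
    """Crude measure of text a non-JS fetcher can see: strip
    script/style blocks and tags, then count remaining characters."""
    html = re.sub(r"(?is)<(script|style|noscript).*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return len(re.sub(r"\s+", " ", text).strip())

def looks_like_js_shell(html: str, min_chars: int = 500) -> bool:
    """True if the server-rendered HTML carries almost no readable text."""
    return visible_text_length(html) < min_chars

# Usage idea: fetch your page the way a no-JS fetcher would, e.g.
#   html = urllib.request.urlopen("https://example.com/article").read().decode()
#   print(looks_like_js_shell(html))
```

If the raw HTML of your article is just a `<div id="root">` and a script tag, a live fetcher that can't execute JavaScript sees nothing worth citing.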

Query intent matters more than any page-level feature

Here's the finding that surprised me most.

I classified 19,556 real user queries across 8 verticals (B2B SaaS, Finance, Health, Legal, Local Services, E-commerce, Travel, Education) into intent categories. The intent distributions varied dramatically:

  • Agency/Law: 66% Discovery intent ("find me a lawyer for X")
  • SaaS: 87% Informational intent ("how does X work")
  • Health: Heavy Informational, with high institutional source bias
  • Local Services: 60% Brand/Vendor citations
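The classification step can be sketched with keyword rules like the following. These categories and patterns are illustrative assumptions, simplified from whatever the study's actual classifier does:

```python
import re

# Illustrative intent rules only - the study's classifier and
# category set are richer than this.
INTENT_RULES = [
    ("discovery", re.compile(r"\b(best|top|find( me)?|near me|recommend)\b", re.I)),
    ("comparison", re.compile(r"\b(vs\.?|versus|compare|alternative)\b", re.I)),
    ("informational", re.compile(r"\b(how|what|why|does|guide)\b", re.I)),
]

def classify_intent(query: str) -> str:
    """Return the first intent category whose pattern matches the query."""
    for label, pattern in INTENT_RULES:
        if pattern.search(query):
            return label
    return "other"
```

Bucketing a sample of your own customers' queries this way is enough to see which intent profile dominates your vertical.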

At the aggregate query level, intent is the strongest predictor of what kinds of sources get cited (Cramér's V = 0.258, χ²(28) = 5,195, p < .001). But at the individual page level, adding intent to the technical feature model provided zero additional predictive power (likelihood ratio test p = .78).
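Cramér's V is derived from the chi-square statistic of the intent-by-source-type contingency table. A self-contained numpy sketch, with toy counts rather than the study's actual table:

```python
import numpy as np

def cramers_v(table) -> float:
    """Cramér's V for an r x c contingency table of counts."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence of rows and columns
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(table.shape) - 1))))

# Toy table: rows = intent categories, cols = cited source types
toy = np.array([[120, 30, 10],
                [40, 90, 20],
                [15, 25, 80]])
print(round(cramers_v(toy), 3))
```

V runs from 0 (no association) to 1 (perfect association), so 0.258 is a moderate but real effect at the query level.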

The practical meaning: there are two levels to the game. First, your content must match the intent profile AI platforms are looking for in your vertical. A comparison page will never get cited for an informational query, no matter how technically optimized it is. Second, once you're in the right intent pool, the 7 technical features determine which specific page gets selected.

Most "AI SEO" advice only addresses the second level and ignores the first entirely.

Platform citation behavior is wildly different

I also measured how consistently each platform cites sources, and how much they differ from each other:

Platform        Citation rate   Avg citations/query   Top-1 consistency   Reddit rate (web UI)
Google AI Mode  98%             17.4                  48%                 48%
Perplexity      97%             9.8                   40%                 64%
ChatGPT         56%             3.5                   70%                 27%
Claude          39%             5.5                   38%                 0%

Some things that jumped out:

  • ChatGPT is the most consistent (70% Top-1 consistency) but cites the fewest sources (3.5 per query) and only cites at all 56% of the time.
  • Google AI Mode cites everything - 17.4 sources per query, 98% citation rate. It's the volume leader by far.
  • Claude never cited Reddit - zero citations in our data. And across all four platforms, Reddit citations drop to zero when queries go through the API instead of the web UI. This is covered in depth in our companion paper.
  • Perplexity has the strongest freshness bias - its cited content averages 32 days old for medium-velocity topics, versus 108 days for Google's top results. This creates a "Lazy Gap" where fresh content can earn AI citations before it would rank on Google.

What I'd actually recommend

Based on the data, not vibes:

  1. Check whether your content matches the intent profile for your vertical. Run queries your customers would ask through ChatGPT and Perplexity. Look at what type of content gets cited (product pages? comparison guides? forums?). If you're producing the wrong content type, no amount of technical optimization will help.
  2. Fix the technical basics that actually matter. Internal linking (strong positive signal), self-referencing canonicals, schema markup, and content-to-HTML ratio. These are cheap to implement and have real effect sizes.
  3. Stop spending money on things that don't matter for AI. Author bios, popup removal, and page speed optimizations are fine for Google SEO and user experience. But if someone is selling them as "AI SEO" specifically, the data doesn't support that.
  4. Optimize for each platform separately. If ChatGPT is important to you, make sure Bing has indexed your content (ChatGPT uses Bing for URL discovery). If Perplexity is important, keep your content fresh - the Lazy Gap is real and exploitable. If Google AI Mode matters, standard Google SEO is sufficient since it's built on Google Search.
  5. Server-side render everything. Claude can't execute JavaScript. If your content is behind a client-side framework, Claude sees nothing. ChatGPT handles some JS, but you're gambling.

Limitations

I want to be upfront about what this study can and can't tell you:

  • Sample size of 479 pages is decent for identifying medium-to-large effects but may miss smaller ones. The null results for author bios, popups, etc. should be read as "no detectable effect at this sample size," not "definitively zero effect."
  • The data is from January-March 2026. AI platforms change their behavior frequently. Findings may not hold in 6 months.
  • This was commercial/product query focused. Citation behavior for academic, news, or purely informational queries may differ.
  • Logistic regression AUC was 0.594. Significantly above chance, but not a crystal ball. There are clearly factors we didn't measure (domain authority, content semantic quality, etc.) that matter.
  • I run a business in this space. I built tools for AI visibility monitoring. I've tried to let the data speak and be transparent about methodology, but you should factor that in.

The paper and data

The full methodology, statistical tables, and supplementary analysis are in the research paper:

"Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior"
Lee, A. (2026). Preprint v3.
DOI: 10.5281/zenodo.18653093
PDF: aiplusautomation.com/research/query-intent-ai-citation.pdf

The companion paper on Reddit's role in AI recommendations:
"Reddit Doesn't Get Cited (Through the API)"
DOI: 10.5281/zenodo.18679003

Happy to answer questions about methodology, findings, or the tools I built to collect this data.

Want to Apply This to Your Client's Site?

We help agencies turn AI search research into actionable optimization. Start with a free AI visibility check or tell us about your client.
