
Why Aren't Your Client's Pages Getting Cited by AI?

We evaluate pages against confirmed citation predictors from 3,251 real websites (UGC excluded), tested across logistic regression, random forest, and gradient boosting models. Platform-specific checks for ChatGPT, Claude, Perplexity, and Gemini.

6.8% ChatGPT / Google overlap
3,251 Real websites analyzed (UGC excluded)
8 Confirmed citation predictors

Why Traditional SEO Audits Are Not Enough

Ranking #1 on Google does not guarantee AI citation. Our published research found just 6.8% URL overlap between Google's top results and ChatGPT citations.

AI platforms evaluate content extractability, query-intent matching, and platform-specific retrieval architectures, not backlinks or click-through rates. An AI SEO audit evaluates the specific page-level features our research identified as statistically significant predictors of citation.

Without this data, your agency is optimizing for signals AI platforms do not use, while the pages that do get cited pull further ahead.

8 Confirmed Citation Predictors (Clean Data)

Our corrected analysis tested 3,251 real websites (1,457 cited vs 1,794 not-cited) from ChatGPT and Google AI Mode across three independent models: logistic regression, random forest, and gradient boosting. UGC pages (Reddit, YouTube, Facebook) were excluded from the comparison pool after a methodology review found they contaminated 41% of not-cited samples in the original dataset.

Methodology Correction (March 2026): Our initial expanded dataset (4,658 pages) had a comparison pool contamination issue: 41% of not-cited pages were UGC (Reddit threads, YouTube pages, Facebook posts), not real business websites. Many findings collapsed when comparing real websites to real websites only. The 8 predictors below are the ones that held on clean data. We publish this correction because honest methodology matters more than impressive-sounding numbers. Full details in the research paper.
1. Load Time -- Strongest Signal (r=0.264)

The strongest confirmed predictor on clean data. Cited pages load at a median of 1,628ms vs. 2,302ms for not-cited pages. The relationship is non-linear: moderate variation does not matter, but pages loading above 4-5 seconds face a penalty cliff. Keep load times under 3 seconds for the best citation odds.
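Because fetching platforms do not render JavaScript, load time for AI bots can be approximated with a raw HTML fetch. A minimal Python sketch (the timing helper and User-Agent string are illustrative, not our audit's actual tooling; a full browser-level measurement needs something like Lighthouse):

```python
import time
import urllib.request

def timed_fetch(fetch):
    """Run a zero-argument fetch callable, return (result, elapsed_ms)."""
    start = time.perf_counter()
    result = fetch()
    return result, (time.perf_counter() - start) * 1000

def fetch_html(url, timeout=10.0):
    """Fetch raw HTML the way a non-rendering AI bot would (no JavaScript)."""
    req = urllib.request.Request(url, headers={"User-Agent": "audit-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()

# Example (requires network):
# html, ms = timed_fetch(lambda: fetch_html("https://example.com/"))
# Flag the page if ms exceeds 3000, per the 3-second target above.
```

This measures server response and download only, which is the part of load time a live-fetching bot actually experiences.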

2. Content-to-HTML Ratio -- Confirmed (r=0.132)

Leaner code wins. Pages where more of the HTML is actual content (rather than framework code, scripts, and markup overhead) perform better. Strip unnecessary JavaScript, tracking scripts, and heavy frameworks. The goal is high content density in a small page footprint.
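Content-to-HTML ratio can be estimated by comparing visible text length against total page size. A rough standard-library sketch (what counts as "content" here is a simplifying assumption; the research's exact definition may differ):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>, <style>, and <noscript>."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside skipped tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)

def content_ratio(html: str) -> float:
    """Visible text length divided by total HTML length (0.0 to 1.0)."""
    parser = TextExtractor()
    parser.feed(html)
    text = "".join(parser.chunks).strip()
    return len(text) / max(len(html), 1)
```

A lean page scores noticeably higher than the same content wrapped in framework and tracking scripts, which is exactly the signal this predictor captures.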

3. In-Content Link Density -- Confirmed (r=0.127)

Fewer in-content links correlate with citation. Cited pages have a median of 9 in-content links vs. 16 for not-cited pages. Focused pages that stay on topic outperform pages that scatter attention across dozens of links. Quality and focus over link volume.

4. Word Count -- Confirmed (r=0.107), Direction Reversed

On clean data, shorter focused pages are cited more: 1,799 words (cited) vs. 2,114 words (not-cited). This reverses the original finding from the contaminated dataset. The signal favors focused, complete answers over sprawling comprehensive guides. Write enough to fully address the topic, then stop.

5. External Link Count -- Confirmed (r=0.097)

Fewer outbound links correlate with citation. Cited pages have a median of 16 external links vs. 21 for not-cited pages. Pages that keep users on-site rather than sending them elsewhere signal authority and self-sufficiency.
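Both link-count predictors can be approximated by parsing the page and counting anchors. The sketch below treats <main> and <article> as the content area and any off-host href as external; both are heuristic assumptions, and real pages may need site-specific selectors:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCounter(HTMLParser):
    """Count in-content links and outbound links in one pass."""
    CONTENT_TAGS = {"main", "article"}  # assumed content wrappers

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.content_depth = 0
        self.in_content = 0   # anchors inside <main>/<article>
        self.external = 0     # anchors pointing off-host, anywhere on the page

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTENT_TAGS:
            self.content_depth += 1
        if tag == "a":
            host = urlparse(dict(attrs).get("href", "")).netloc
            if self.content_depth > 0:
                self.in_content += 1
            if host and host != self.site_host:
                self.external += 1

    def handle_endtag(self, tag):
        if tag in self.CONTENT_TAGS and self.content_depth:
            self.content_depth -= 1
```

Comparing the resulting counts against the medians above (9 in-content, 16 external for cited pages) gives a quick focus check.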

6. Popups and Modals -- Confirmed (r=0.077)

Fewer popups and modals correlate with citation. Interstitials that obstruct content extraction hurt citation odds. AI bots that fetch live HTML may fail to parse content hidden behind modal overlays. Clean, unobstructed content performs better.

7. Schema TYPE (Not Presence) -- Confirmed (Product OR=3.09)

Generic schema presence is not a predictor (p=0.78 on clean data). But schema type matters: Product schema (OR=3.09), Review schema (OR=2.24), and FAQ schema (OR=1.39) all help. Article schema actually hurts (OR=0.76). The takeaway: use the right schema type for your content, but do not expect adding any schema to help by itself.
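One way to audit schema type rather than mere presence is to parse the page's JSON-LD blocks and collect their @type values. A sketch (JSON-LD only; microdata and RDFa variants are ignored here):

```python
import json
from html.parser import HTMLParser

class JsonLdCollector(HTMLParser):
    """Gather raw payloads from <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            payload = "".join(self.buffer).strip()
            if payload:
                self.payloads.append(payload)

def schema_types(html: str) -> set:
    """Return the set of @type values declared on a page."""
    collector = JsonLdCollector()
    collector.feed(html)
    types = set()
    for payload in collector.payloads:
        try:
            data = json.loads(payload)
        except json.JSONDecodeError:
            continue
        for item in data if isinstance(data, list) else [data]:
            t = item.get("@type")
            if isinstance(t, str):
                types.add(t)
            elif isinstance(t, list):
                types.update(t)
    return types
```

A page returning an empty set has no machine-readable schema at all; one returning only Article-type markup carries the negative signal described above.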

8. Author Attribution (Negative) -- Confirmed (OR=0.81)

Pages with visible author bylines are cited less often. AI platforms appear to deprioritize opinion-style content in favor of institutional and product pages. This held on clean data with a reduced but still negative effect (OR=0.81 vs. the original 0.632 from contaminated data).

Source: Lee, A. (2026). "Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior." Corrected dataset: 3,251 real websites (UGC excluded) across ChatGPT and Google AI Mode, 3 multivariate models (March 2026). Full research paper.

The Three Pillars of AI Citability

The 8 confirmed predictors cluster into three underlying pillars. Understanding this structure tells you where to focus: optimize for the pillar, and every feature within it improves simultaneously.

Pillar 1: Page Performance

Load Time + Content-to-HTML Ratio + No Popups

Load time is the strongest confirmed predictor (r=0.264). Cited pages load at 1,628ms median vs. 2,302ms for not-cited. Content-to-HTML ratio (r=0.132) reinforces this: lean pages with high content density outperform bloated pages stuffed with JavaScript frameworks and tracking scripts. Popups and modals (r=0.077) add another drag on citability by obstructing content extraction.

Practical target: Load time under 3 seconds. Strip unnecessary JavaScript, remove heavy tracking scripts, use modern image formats (WebP, AVIF), minimize render-blocking resources, eliminate interstitials, and leverage a CDN. The winning profile is a lean, fast, unobstructed page.

Pillar 2: Content Focus

Word Count (Focused) + Fewer In-Content Links + Fewer External Links

The clean data reversed the original word count finding. Shorter, focused pages are cited more: 1,799 words (cited) vs. 2,114 (not-cited). In-content links also favor focus: 9 links (cited) vs. 16 (not-cited). External link count follows the same pattern: 16 outbound (cited) vs. 21 (not-cited). The consistent signal is focus over comprehensiveness.

AI platforms appear to favor pages that fully address a specific topic without diluting attention across tangential links and padding content. Write enough to completely answer the query, then stop. Keep in-content links relevant and limited.

Practical target: Write focused content (1,500-2,000 words is a reasonable range). Limit in-content links to the most relevant. Minimize outbound links. The winning page is a focused, complete answer rather than an encyclopedic overview.

Pillar 3: Schema Type

Product / FAQ / Review Help | Article Hurts | Generic Presence Irrelevant

Generic schema presence is not a predictor on clean data (p=0.78, OR=1.02). But schema type matters significantly. Product schema (OR=3.09) has the strongest effect, followed by Review schema (OR=2.24) and FAQ schema (OR=1.39). Article schema actually hurts citation odds (OR=0.76), consistent with the negative author attribution signal.

Author attribution is a separate confirmed negative signal (OR=0.81). Together with Article schema, the pattern suggests AI platforms deprioritize editorial/opinion-style content in favor of institutional, product, and transactional pages.

Practical target: Use Product, FAQ, or Review schema where appropriate. Avoid relying on Article schema for pages where you want citation. Do not expect adding any schema type to help by itself. Match the schema type to the page content.

The pillar framework simplifies the audit. Instead of optimizing 8 features independently, you focus on three strategies: make pages fast and lean (Pillar 1), write focused content with limited links (Pillar 2), and use the right schema types (Pillar 3). Every confirmed predictor improves as a natural consequence of getting these three things right.
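As a toy illustration, the three-pillar check reduces to a few threshold tests. The cutoffs come from the practical targets above; the field names and schema type strings (schema.org spells FAQ markup "FAQPage") are illustrative, and this is not our audit's actual scoring model:

```python
def pillar_scores(page: dict) -> dict:
    """Pass/fail a page on the three pillars using illustrative thresholds."""
    helpful_schema = {"Product", "FAQPage", "Review"}  # assumed type names
    return {
        # Pillar 1: fast and unobstructed
        "performance": page["load_ms"] < 3000 and not page["has_popup"],
        # Pillar 2: focused length, limited in-content links
        "focus": (1500 <= page["word_count"] <= 2000
                  and page["in_content_links"] < 10),
        # Pillar 3: a helpful schema type is present
        "schema": bool(helpful_schema & set(page["schema_types"])),
    }
```

Feeding in measured values for a page immediately shows which of the three pillars needs work first.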

The Anti-Pattern vs. The Winning Pattern

Our research reveals two clear page profiles at opposite ends of the citation spectrum. Understanding these patterns makes it easy to diagnose where any given page falls and what needs to change.

The Anti-Pattern: Low Citability

  • Slow load time above 3 seconds due to unoptimized images, excessive scripts, or poor hosting
  • Bloated HTML with low content-to-code ratio, heavy frameworks, and tracking scripts
  • Sprawling 3,000+ word content that tries to cover everything rather than answering a specific query
  • 20+ in-content links scattering attention across tangential topics
  • Popups and modals that obstruct content extraction by AI bots
  • Article schema with author byline, signaling editorial/opinion content that AI platforms deprioritize

This profile fails across all three pillars: slow and bloated (Pillar 1 fails), unfocused content with excessive links (Pillar 2 fails), and wrong schema signals (Pillar 3 fails).

The Winning Pattern: High Citability

  • Fast load time under 2 seconds with optimized images, minimal render-blocking resources, and CDN delivery
  • Lean HTML with high content-to-code ratio, stripped of unnecessary JavaScript and tracking
  • Focused 1,500-2,000 word content that completely addresses a specific topic, then stops
  • Limited in-content links (under 10), each one relevant and purposeful
  • No popups or modals obstructing content extraction
  • Product, FAQ, or Review schema matching the content type, giving AI structured context

This is the profile of a focused, well-optimized page. Fast and lean (Pillar 1 satisfied), focused content with limited links (Pillar 2 satisfied), and the right schema type (Pillar 3 satisfied).

The gap between these two profiles is roughly 20 percentage points in citation rate. That is the difference between being cited in roughly 4 out of 10 relevant AI queries versus 6 out of 10. For agencies managing client visibility, that gap represents a significant competitive advantage.

Most pages fall somewhere between these extremes. Our audit scores each page against both profiles, identifies which pillars are strong and which are weak, and delivers a prioritized action plan to move each page from the anti-pattern toward the winning pattern. The highest-impact changes come first, so agencies can show measurable progress within the first implementation cycle.

The 2-vs-2 Divide

AI platforms split into two architectural groups, each requiring different optimization strategies.

Fetching Platforms

ChatGPT + Claude

  • Live page fetch during conversations
  • Full HTML content matters directly
  • ChatGPT discovers via Bing
  • Claude discovers via its own search index
  • No JavaScript rendering

Index-Only Platforms

Perplexity + Gemini

  • No live fetch during conversations
  • Search snippets and metadata matter
  • Perplexity uses proprietary index
  • Gemini uses Google's internal search
  • Traditional SEO strength drives visibility

Our audit includes checks most SEO tools miss: Bing Webmaster Tools indexation (for ChatGPT), cross-index visibility verification (for Claude), and snippet quality assessment (for Perplexity/Gemini). For fetching platforms, we test actual bot-received HTML, content position (header content: 96-100% AI visibility; footer: 0-12%), and robots.txt compliance per user agent.
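The robots.txt portion of that check can be sketched with Python's standard library. The user-agent list below reflects commonly published AI crawler names and is illustrative, not exhaustive:

```python
from urllib.robotparser import RobotFileParser

# Commonly published AI crawler user agents (illustrative list).
AI_AGENTS = ["GPTBot", "ChatGPT-User", "ClaudeBot",
             "PerplexityBot", "Google-Extended"]

def ai_access(robots_txt: str, page_path: str = "/") -> dict:
    """Map each AI user agent to whether robots.txt allows it to fetch page_path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, page_path) for agent in AI_AGENTS}
```

Running this against a client's live robots.txt quickly surfaces accidental blocks, such as a blanket Disallow aimed at one bot that silently removes the site from that platform's citations.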

What's Included

  • Page-level citation readiness scores -- each page scored against the confirmed citation predictors
  • Platform-specific indexation check -- verification across Google, Bing, and other search indexes
  • Schema markup audit -- current assessment and GEO-optimized recommendations
  • Content structure analysis -- heading hierarchy, word count, content position, content-to-HTML ratio
  • Internal linking evaluation -- link density, anchor text quality, hub-and-spoke structure
  • robots.txt and crawl access review -- access verified for all major AI user agents
  • Prioritized action plan -- recommendations ranked by expected impact with implementation guidance

The Technology Behind It

Every recommendation is cross-referenced against our GEO Knowledge Base -- a structured repository of empirically validated optimization strategies, continuously updated with new research. We use a multi-platform scraper to test how pages actually appear to AI bots, verifying server-side rendering and comparing bot-received HTML against human-visible content. Learn more on the toolkit page.

Limitations

Probabilistic, not deterministic. Optimizing for all confirmed predictors does not guarantee citation. AI source selection involves proprietary trust scores, real-time query context, and diversity algorithms we cannot fully model.
Correlational research. Features were identified through observational analysis with FDR-corrected p-values, not randomized experiments. Some features may be proxies for unmeasured variables.
Algorithms change. AI platforms update content selection strategies regularly. Our methodology reflects best practices as of March 2026 and is updated as new data arrives.
Scope varies by site size. For 1,000+ page sites, we audit a representative sample of priority pages. Coverage is defined during engagement scoping.

Frequently Asked Questions

What are the citation predictors?

After a methodology correction that excluded UGC from the comparison pool, our clean-data analysis of 3,251 real websites confirms 8 predictors: load time (r=0.264, strongest signal), content-to-HTML ratio (r=0.132), in-content link density (r=0.127, fewer = cited), word count (r=0.107, reversed: shorter focused pages win), external link count (r=0.097, fewer = cited), popups/modals (r=0.077, fewer = cited), schema TYPE (Product OR=3.09, FAQ OR=1.39, Review OR=2.24; Article OR=0.76 hurts; generic presence not a predictor), and author attribution (negative, OR=0.81).

How is an AI SEO audit different from a traditional SEO audit?

An AI SEO audit evaluates page-level features that predict AI citation, not backlinks or keyword rankings. It includes platform-specific checks for fetching architectures (ChatGPT, Claude) versus index-only architectures (Perplexity, Gemini), along with bot-received HTML testing and cross-index visibility verification.

How long does an AI SEO audit take?

The timeline depends on site size. Typically, an initial audit takes 1 to 2 weeks and includes page-level scoring against all confirmed citation predictors, platform-specific checks, and a prioritized action plan with implementation guidance.

Find Out What's Holding Your Client Back

Start with a free AI check to see the basics, or share your client's site and we'll scope a full audit against all confirmed citation predictors.

Free AI Check | Start a Conversation