Services / AI SEO Audits

Why Aren't Your Client's Pages Getting Cited by AI?

We evaluate pages against confirmed citation predictors from 3,251 real websites (UGC excluded), tested across logistic regression, random forest, and gradient boosting models. Platform-specific checks for ChatGPT, Claude, Perplexity, and Gemini.

6.8% ChatGPT / Google overlap
3,251 Real websites analyzed (UGC excluded)
8 Confirmed citation predictors

Why Traditional SEO Audits Are Not Enough

Ranking #1 on Google does not guarantee AI citation. Our published research found just 6.8% URL overlap between Google's top results and ChatGPT citations.

AI platforms evaluate content extractability, query-intent matching, and platform-specific retrieval architectures, not backlinks or click-through rates. An AI SEO audit evaluates the specific page-level features our research identified as statistically significant predictors of citation.

Without this data, your agency is optimizing for signals AI platforms do not use, while the pages that do get cited pull further ahead.

8 Confirmed Citation Predictors (Clean Data)

Our corrected analysis tested 3,251 real websites (1,457 cited vs 1,794 not-cited) from ChatGPT and Google AI Mode across three independent models: logistic regression, random forest, and gradient boosting. UGC pages (Reddit, YouTube, Facebook) were excluded from the comparison pool after a methodology review found they contaminated 41% of not-cited samples in the original dataset.

Methodology Correction (March 2026): Our initial expanded dataset (4,658 pages) had a comparison pool contamination issue: 41% of not-cited pages were UGC (Reddit threads, YouTube pages, Facebook posts), not real business websites. Many findings collapsed when comparing real websites to real websites only. The 8 predictors below are the ones that held on clean data. We publish this correction because honest methodology matters more than impressive-sounding numbers. Full details in the research paper.
1. Load Time

Corrected: Floor Effect Only

Our initial analysis showed r=0.264, but our follow-up Experiment K (n=4,350 pages, 2,510 domains) proved this was a domain identity proxy: the model was fingerprinting publisher domains by their CDN speed signatures. Within the same domain, cited pages are actually slower (r=-0.221, p<0.000001). The only valid speed advice: pages loading above 5 seconds see citation rates drop from ~60% to 35%. Below that threshold, speed has no measurable effect on AI citation.

2. Content-to-HTML Ratio

Confirmed r=0.132

Leaner code wins. Pages where more of the HTML is actual content (rather than framework code, scripts, and markup overhead) perform better. Strip unnecessary JavaScript, tracking scripts, and heavy frameworks. The goal is high content density in a small page footprint.
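The ratio itself is easy to approximate with nothing but the standard library. A minimal sketch (not our audit tooling; it counts raw visible-text characters against total HTML size and ignores rendering):

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style contents."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def content_to_html_ratio(html: str) -> float:
    """Visible-text characters divided by total HTML characters (0..1)."""
    p = _TextExtractor()
    p.feed(html)
    text = "".join(p.parts).strip()
    return len(text) / max(len(html), 1)

lean = "<html><body><p>" + "word " * 50 + "</p></body></html>"
bloated = "<html><script>" + "x" * 5000 + "</script><body><p>hi</p></body></html>"
print(round(content_to_html_ratio(lean), 2))     # high ratio: mostly content
print(round(content_to_html_ratio(bloated), 4))  # near zero: mostly script
```

A page dominated by framework code and tracking scripts scores near zero; a content-dense page scores well above 0.5.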

3. Internal Link Structure

Updated (Exp. M) 3.7% importance

Experiment M (position-band matched): internal links show a positive effect (3.7% importance, rank #9, significant in 3 of 4 position bands). More internal links help AI platforms understand your content within site context. The prior "fewer = better" finding was confounded by domain identity.

4. Content Depth (~2,000 Words)

Updated (Exp. M) 7.1% importance

Experiment M (position-band matched): within the SERP, cited pages are consistently longer -- median ~2,000 vs ~1,400 words (+42-52%). Target approximately 2,000 words of substantive content. Prior "shorter is better" finding (1,799 vs 2,114) was confounded by domain identity. This is the 3rd most important actionable feature.

5. External Link Count

Weak Signal 2.5% importance

Experiment M: external links show a slightly positive effect (2.5% importance, rank #14). The prior "fewer = better" finding (r=0.097) was confounded by domain identity. External link count is not a strong signal either way. Do not prioritize reducing outbound links -- focus on content structure instead.

6. Popups and Modals

Confirmed r=0.077

Fewer popups and modals correlates with citation. Interstitials that obstruct content extraction hurt citation odds. AI bots that fetch live HTML may fail to parse content hidden behind modal overlays. Clean, unobstructed content performs better.

7. Schema TYPE (Not Presence)

Confirmed Product OR=3.09

Generic schema presence is not a predictor (p=0.78 on clean data). But schema type matters: Product schema (OR=3.09), Review schema (OR=2.24), and FAQ schema (OR=1.39) all help. Article schema actually hurts (OR=0.76). The takeaway: use the right schema type for your content, but do not expect adding any schema to help by itself.
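Checking which schema types a page actually declares can be scripted by pulling out JSON-LD blocks and reading their `@type` values. A rough sketch (the regex extraction is deliberately naive, and mapping "FAQ schema" to the schema.org type name `FAQPage` is our assumption, not the study's):

```python
import json
import re

# Schema types and odds ratios reported in the research above.
SCHEMA_ODDS = {"Product": 3.09, "Review": 2.24, "FAQPage": 1.39, "Article": 0.76}

def jsonld_types(html: str) -> list[str]:
    """Extract @type values from JSON-LD script blocks (naive regex pass)."""
    types = []
    blocks = re.findall(
        r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
        html, flags=re.S | re.I,
    )
    for block in blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD rather than failing the audit
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict):
                t = item.get("@type")
                if isinstance(t, list):
                    types.extend(t)
                elif t:
                    types.append(t)
    return types

page = ('<script type="application/ld+json">'
        '{"@context":"https://schema.org","@type":"Product","name":"Widget"}'
        '</script>')
print(jsonld_types(page))  # ['Product']
```

Flagging an `Article` type here, or no type at all where `Product`/`Review`/`FAQPage` would fit, is a quick win per the odds ratios above.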

8. Author Attribution (Negative)

Confirmed OR=0.81

Pages with visible author bylines are cited less often. AI platforms appear to deprioritize opinion-style content in favor of institutional and product pages. This held on clean data with a reduced but still negative effect (OR=0.81 vs. the original 0.632 from contaminated data).

Source: Lee, A. (2026). "Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior." Corrected dataset: 3,251 real websites (UGC excluded) across ChatGPT and Google AI Mode, 3 multivariate models (March 2026). Full research paper.

The Three Pillars of AI Citability

Experiment M (n=10,293, position-band matched) reorganizes optimization around three pillars ranked by marginal model lift. Content structure provides the largest lift; technical factors such as load time show no significant effect within position bands.

Pillar 1: Content Structure

Largest marginal lift (+0.021 AUC)

Content structure provides the single largest lift beyond Google position. H3 subheading depth (5.9% importance, d=0.19-0.40) is the 5th most important actionable feature -- cited pages have 2x the H3 count (median 9-10 vs 4-5). Word count (7.1% importance) shows cited pages are ~2,000 words vs ~1,400 within position bands (+42-52%). Prior "shorter is better" was confounded. Comparison structure (6.4% importance, d=0.43) is the #1 actionable content signal -- pages with "vs", comparison tables, and side-by-side analyses are substantially more likely to be cited.

Practical target: Target ~2,000 words with 8-10 H3 subheadings. Add comparison sections where relevant ("vs" framing, comparison tables). Structure content so each H3 is a distinct, citable extraction point.
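These Pillar 1 targets can be turned into a quick heuristic pass. A sketch (thresholds are the targets quoted above; the comparison check is a crude keyword/table test, not the study's feature extraction):

```python
import re

def structure_check(html: str, text: str) -> dict:
    """Rough Pillar 1 checks: word count, H3 depth, comparison elements.
    `text` is the page's extracted visible text."""
    words = len(text.split())
    h3s = len(re.findall(r"<h3\b", html, flags=re.I))
    has_comparison = ("<table" in html.lower()
                      or bool(re.search(r"\bvs\.?\b", text, flags=re.I)))
    return {
        "word_count_ok": 1500 <= words <= 2500,  # target ~2,000 words
        "h3_count_ok": 8 <= h3s <= 12,           # target 8-10 H3s
        "has_comparison": has_comparison,        # "vs" framing or tables
    }

html = ("<h2>Overview</h2>" + "<h3>Point</h3><p>...</p>" * 9
        + "<table><tr><td>A</td><td>B</td></tr></table>")
text = ("word " * 2000) + "Widget A vs Widget B"
print(structure_check(html, text))  # all three checks pass
```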

Pillar 2: Content Quality & Tone

Primary Source Score + No Blog Tone + Statistics

First-person density is the strongest negative signal (8.1% importance). Blog/opinion tone ("I", "my", "we") reduces citation probability. AI platforms preferentially cite pages that read as authoritative reference material. Primary source score (4.1%) measures whether a page produces original data or aggregates others' content -- pages cited by 3+ platforms average +10.7, uncited average -10.2. Statistics density (2.5%) shows pages cited by 3+ platforms have 7x the data points.

Practical target: Write in third person as authoritative reference content. Include original statistics, data points, and test results. Minimize citing/quoting others in favor of producing your own findings. FAQ schema is the only schema type consistently significant across all 4 position bands.
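First-person density is straightforward to measure. A toy version (the pronoun list and tokenizer are our simplification, not the study's definition):

```python
import re

FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}

def first_person_density(text: str) -> float:
    """Share of word tokens that are first-person pronouns (0..1)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in FIRST_PERSON for t in tokens) / len(tokens)

blog = "I tested this myself and my results surprised me, as we expected."
reference = "The benchmark recorded 4,350 pages across 2,510 domains."
print(first_person_density(blog) > first_person_density(reference))  # True
```

A high score flags blog/opinion tone; reference-style copy like the second example scores at or near zero.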

Pillar 3: Semantic Relevance

Query Term Coverage + Answer Position

Query term coverage (d=0.42, 3.7% importance): cited pages contain 100% of the query's content words (median) vs 83-87% for not-cited. Query term first position (3.8% importance): pages where query terms appear in the first 1-2% are more cited. AI platforms preferentially cite pages that directly and immediately address the query.
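Both signals can be approximated with a few lines. A sketch (the stopword list is a hypothetical placeholder; the study's content-word definition may differ):

```python
import re

# Hypothetical stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "for", "to", "in", "is", "what", "best", "how"}

def query_coverage(query: str, text: str):
    """Return (fraction of query content words present in text,
    relative position 0..1 of the earliest match, or None)."""
    words = [w for w in re.findall(r"\w+", query.lower()) if w not in STOPWORDS]
    low = text.lower()
    hits = [w for w in words if w in low]
    coverage = len(hits) / len(words) if words else 0.0
    first = min((low.find(w) for w in hits), default=-1)
    position = first / max(len(low), 1) if first >= 0 else None
    return coverage, position

text = ("Standing desks compared: our standing desk review covers "
        "ergonomics and price.")
cov, pos = query_coverage("best standing desk review", text)
print(cov, pos)  # full coverage, match at the very start of the text
```

Target: coverage at 100% of content words, with the first match inside the opening 1-2% of the page.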

Note: load time is confirmed NOT significant in any position band (p > 0.39 in all 4 bands, Experiment M). Author attribution is inconsistent (0.2% importance). Content-to-HTML ratio is a modest signal (3.4%). Do not prioritize these over the content signals above.

Practical target: Cover every content word of the target query, ideally in the title and opening paragraph, and answer the query directly within the first 1-2% of the page. Do not bury the direct answer beneath preamble.

The pillar framework simplifies the audit. Instead of optimizing features independently, you focus on three strategies: structure content with deep headings and comparison elements (Pillar 1), write as an authoritative primary source with data (Pillar 2), and directly address the query early (Pillar 3). For the first time, content features (AUC=0.631) beat domain identity (AUC=0.583) when comparing equally-ranked pages.

The Anti-Pattern vs. The Winning Pattern

Our research reveals two clear page profiles at opposite ends of the citation spectrum. Understanding these patterns makes it easy to diagnose where any given page falls and what needs to change.

The Anti-Pattern: Low Citability

  • Extremely slow load time above 5 seconds due to unoptimized images, excessive scripts, or poor hosting (the only speed threshold that affects citation)
  • Bloated HTML with low content-to-code ratio, heavy frameworks, and tracking scripts
  • Thin, shallow content well under the ~2,000-word median of cited pages, with few H3 subheadings and no comparison elements
  • First-person blog tone ("I", "my", "we") with no original data or statistics
  • Popups and modals that obstruct content extraction by AI bots
  • Article schema with author byline, signaling editorial/opinion content that AI platforms deprioritize

This profile fails across all three pillars: shallow structure with no comparison elements (Pillar 1), opinion tone without original data (Pillar 2), and content that never directly addresses the query (Pillar 3), compounded by extraction obstacles like bloat, popups, and slow load.

The Winning Pattern: High Citability

  • Accessible load time under 5 seconds (speed below this has no measurable effect on citation, but pages above 5s see a significant drop)
  • Lean HTML with high content-to-code ratio, stripped of unnecessary JavaScript and tracking
  • Roughly 2,000 words of substantive content with 8-10 H3 subheadings, each a distinct, citable extraction point
  • Third-person, reference-style writing with original statistics, comparison elements ("vs" framing, comparison tables), and data points
  • No popups or modals obstructing content extraction
  • Product, FAQ, or Review schema matching the content type, giving AI structured context

This is the profile of a focused, well-optimized page: deep structure with comparison elements (Pillar 1 satisfied), authoritative reference tone with original data (Pillar 2 satisfied), and content that directly addresses the query with the right schema type (Pillar 3 satisfied).

The gap between these two profiles is roughly 20 percentage points in citation rate. That is the difference between being cited in roughly 4 out of 10 relevant AI queries versus 6 out of 10. For agencies managing client visibility, that gap represents a significant competitive advantage.

Most pages fall somewhere between these extremes. Our audit scores each page against both profiles, identifies which pillars are strong and which are weak, and delivers a prioritized action plan to move each page from the anti-pattern toward the winning pattern. The highest-impact changes come first, so agencies can show measurable progress within the first implementation cycle.

The 2-vs-2 Divide

AI platforms split into two architectural groups, each requiring different optimization strategies.

Fetching Platforms

ChatGPT + Claude

  • Live page fetch during conversations
  • Full HTML content matters directly
  • ChatGPT discovers via Bing
  • Claude discovers via its own search index
  • No JavaScript rendering

Index-Only Platforms

Perplexity + Gemini

  • No live fetch during conversations
  • Search snippets and metadata matter
  • Perplexity uses proprietary index
  • Gemini uses Google's internal search
  • Traditional SEO strength drives visibility

Our audit includes checks most SEO tools miss: Bing Webmaster Tools indexation (for ChatGPT), cross-index visibility verification (for Claude), and snippet quality assessment (for Perplexity/Gemini). For fetching platforms, we test actual bot-received HTML, content position (header content: 96-100% AI visibility; footer: 0-12%), and robots.txt compliance per user agent.
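The per-user-agent robots.txt check can be scripted with the standard library alone. A sketch using commonly published AI crawler names (verify the current names against each platform's own documentation):

```python
from urllib.robotparser import RobotFileParser

# Commonly published AI crawler user agents (names change; verify per platform).
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

robots_txt = """\
User-agent: GPTBot
Disallow: /private/

User-agent: PerplexityBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

for bot in AI_BOTS:
    # Bots without a matching group fall through to "allowed" here,
    # since this robots.txt has no wildcard (*) group.
    print(bot, rp.can_fetch(bot, "https://example.com/guide"))
```

In this example, PerplexityBot is blocked sitewide while GPTBot is blocked only from /private/, the kind of partial block that silently removes a client from one platform's index but not another's.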

What's Included

  • Page-level citation readiness scores -- each page scored against the confirmed citation predictors
  • Platform-specific indexation check -- verification across Google, Bing, and other search indexes
  • Schema markup audit -- current assessment and GEO-optimized recommendations
  • Content structure analysis -- heading hierarchy, word count, content position, content-to-HTML ratio
  • Internal linking evaluation -- link density, anchor text quality, hub-and-spoke structure
  • robots.txt and crawl access review -- access verified for all major AI user agents
  • Prioritized action plan -- recommendations ranked by expected impact with implementation guidance

The Technology Behind It

Every recommendation is cross-referenced against our GEO Knowledge Base -- a structured repository of empirically validated optimization strategies, continuously updated with new research. We use a multi-platform scraper to test how pages actually appear to AI bots, verifying server-side rendering and comparing bot-received HTML against human-visible content. Learn more on the toolkit page.

Limitations

  • Probabilistic, not deterministic. Optimizing for all consensus features does not guarantee citation. AI source selection involves proprietary trust scores, real-time query context, and diversity algorithms we cannot fully model.
  • Correlational research. Features were identified through observational analysis with FDR-corrected p-values, not randomized experiments. Some features may be proxies for unmeasured variables.
  • Algorithms change. AI platforms update content selection strategies regularly. Our methodology reflects best practices as of March 2026 and is updated as new data arrives.
  • Scope varies by site size. For 1,000+ page sites, we audit a representative sample of priority pages. Coverage is defined during engagement scoping.

Frequently Asked Questions

What are the citation predictors?

Experiment M (n=10,293 pages, position-band matched) is our most rigorous study. Top actionable signals by importance: first-person density (8.1%, avoid blog tone), word count (7.1%, target ~2,000), comparison structure (6.4%, add vs/comparison tables), H3 depth (5.9%, target 8-10 H3s), primary source score (4.1%), query term coverage (3.7%), internal links (3.7%, positive direction), content-to-HTML ratio (3.4%), statistics density (2.5%). FAQ is the only schema type consistently significant across all 4 position bands. Load time is NOT significant (p > 0.39 all bands). Content features beat domain identity within the SERP for the first time (AUC 0.631 vs 0.583).
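Purely for illustration, the quoted importances can be folded into a toy readiness score. This is not our scoring formula; the weights are simply the Experiment M importances listed above, and the boolean checks are placeholders for real feature extraction:

```python
# Reported Experiment M importances for actionable signals (from the answer above).
WEIGHTS = {
    "low_first_person": 0.081,   # negative signal inverted: reward its absence
    "word_count": 0.071,         # ~2,000 words
    "comparison": 0.064,         # "vs" framing, comparison tables
    "h3_depth": 0.059,           # 8-10 H3s
    "primary_source": 0.041,
    "query_coverage": 0.037,
    "internal_links": 0.037,
    "content_html_ratio": 0.034,
    "statistics_density": 0.025,
}

def readiness_score(checks: dict[str, bool]) -> float:
    """Weighted share of passed checks, scaled to 0..100."""
    total = sum(WEIGHTS.values())
    passed = sum(w for name, w in WEIGHTS.items() if checks.get(name))
    return round(100 * passed / total, 1)

page = dict.fromkeys(WEIGHTS, True) | {"comparison": False,
                                       "statistics_density": False}
print(readiness_score(page))  # roughly 80: missing comparisons and stats
```

Because the weights differ, a page missing comparison structure loses more ground than one missing statistics density, which mirrors the prioritization in the action plan.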

How is an AI SEO audit different from a traditional SEO audit?

An AI SEO audit evaluates page-level features that predict AI citation, not backlinks or keyword rankings. It includes platform-specific checks for fetching architectures (ChatGPT, Claude) versus index-only architectures (Perplexity, Gemini), along with bot-received HTML testing and cross-index visibility verification.

How long does an AI SEO audit take?

The timeline depends on site size. Typically, an initial audit takes 1 to 2 weeks and includes page-level scoring against all confirmed citation predictors, platform-specific checks, and a prioritized action plan with implementation guidance.

Find Out What's Holding Your Client Back

Start with a free AI check to see the basics, or share your client's site and we'll scope a full audit against all confirmed citation predictors.

Free AI Check | Get a Free Video Audit