Why Aren't Your Client's Pages Getting Cited by AI?
We evaluate pages against confirmed citation predictors derived from an analysis of 3,251 real websites (UGC excluded), tested across logistic regression, random forest, and gradient boosting models. The audit also includes platform-specific checks for ChatGPT, Claude, Perplexity, and Gemini.
The Problem
Why Traditional SEO Audits Are Not Enough
AI platforms evaluate content extractability, query-intent matching, and platform-specific retrieval architectures, not backlinks or click-through rates. An AI SEO audit evaluates the specific page-level features our research identified as statistically significant predictors of citation.
Without this data, your agency is optimizing for signals AI platforms do not use, while the pages that do get cited pull further ahead.
Research-Backed Framework
8 Confirmed Citation Predictors (Clean Data)
Our corrected analysis tested 3,251 real websites (1,457 cited vs 1,794 not-cited) from ChatGPT and Google AI Mode across three independent models: logistic regression, random forest, and gradient boosting. UGC pages (Reddit, YouTube, Facebook) were excluded from the comparison pool after a methodology review found they contaminated 41% of not-cited samples in the original dataset.
Load Time
Strongest Signal | r=0.264
The strongest confirmed predictor on clean data. Cited pages load at a median of 1,628ms vs. 2,302ms for not-cited pages. The relationship is non-linear: moderate variation does not matter, but pages loading above 4-5 seconds face a penalty cliff. Keep load times under 3 seconds for the best citation odds.
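A minimal sketch of how the load-time guidance could be turned into a check. The bucket names, the 4,000ms cutoff (the lower edge of the 4-5 second range), and both function names are assumptions for illustration; the research summary does not state how load time was measured, so the timing helper simply wraps whatever fetch routine you supply.

```python
import time
from typing import Callable


def measure_ms(fetch: Callable[[], object]) -> float:
    """Time an arbitrary fetch callable and return elapsed milliseconds."""
    start = time.perf_counter()
    fetch()
    return (time.perf_counter() - start) * 1000.0


def classify_load_time(ms: float) -> str:
    """Bucket a load time using the thresholds quoted in the text.

    Assumed buckets: under 3 s is the stated target, and 4 s is taken
    as the start of the "penalty cliff" range (4-5 s in the text).
    """
    if ms < 3000:
        return "within target"
    if ms < 4000:
        return "above target"
    return "penalty-cliff risk"


if __name__ == "__main__":
    # Medians reported for cited and not-cited pages, plus a slow page.
    for sample in (1628, 2302, 4500):
        print(sample, classify_load_time(sample))
```

Both reported medians fall inside the target bucket, which matches the point that moderate variation matters less than avoiding the slow tail.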
Content-to-HTML Ratio
Confirmed | r=0.132
Leaner code wins. Pages where more of the HTML is actual content (rather than framework code, scripts, and markup overhead) perform better. Strip unnecessary JavaScript, tracking scripts, and heavy frameworks. The goal is high content density in a small page footprint.
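One way to approximate this ratio is to divide the length of the visible text by the length of the raw HTML. The sketch below uses only the Python standard library; the exact formula used in the research is not given here, so treat this definition (characters of visible text over characters of markup) and the function name as assumptions.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring script/style/noscript/template bodies."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def content_to_html_ratio(html: str) -> float:
    """Return visible-text characters divided by total HTML characters."""
    if not html:
        return 0.0
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so indentation does not inflate the text length.
    text = " ".join(" ".join(parser.chunks).split())
    return len(text) / len(html)


if __name__ == "__main__":
    lean = "<html><body><p>Hello world</p></body></html>"
    bloated = (
        "<html><head><script>var a = 1; var b = 2; var c = 3;</script></head>"
        "<body><p>Hello world</p></body></html>"
    )
    print(content_to_html_ratio(lean) > content_to_html_ratio(bloated))
```

The same text wrapped in more script overhead yields a lower ratio, which is the direction the predictor penalizes.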
In-Content Link Density
Confirmed | r=0.127
Fewer in-content links = cited. Cited pages have a median of 9 in-content links vs. 16 for not-cited pages. Focused pages that stay on topic outperform pages that scatter attention across dozens of links. Quality and focus over link volume.
Word Count (Reversed)
Confirmed | r=0.107 | Direction reversed
On clean data, shorter focused pages are cited more: 1,799 words (cited) vs. 2,114 words (not-cited). This reverses the original finding from the contaminated dataset. The signal favors focused, complete answers over sprawling comprehensive guides. Write enough to fully address the topic, then stop.
External Link Count
Confirmed | r=0.097
Fewer outbound links = cited. Cited pages have a median of 16 external links vs. 21 for not-cited pages. Pages that keep users on-site rather than sending them elsewhere signal authority and self-sufficiency.
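Both link signals can be counted from raw HTML. The sketch below is illustrative only: it assumes "in-content" means links inside a main or article element and "external" means any http(s) link whose host differs from the page's host, counted page-wide. The research may define both differently, and the function name is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCounter(HTMLParser):
    """Count in-content links and external links on a page."""

    CONTENT_TAGS = {"main", "article"}  # assumed content containers

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).netloc.lower()
        self._content_depth = 0
        self.in_content = 0
        self.external = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTENT_TAGS:
            self._content_depth += 1
            return
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        target = urlparse(urljoin(self.page_url, href))
        if target.scheme not in ("http", "https"):
            return  # skip mailto:, tel:, javascript:, and similar
        if self._content_depth > 0:
            self.in_content += 1
        if target.netloc.lower() != self.page_host:
            self.external += 1

    def handle_endtag(self, tag):
        if tag in self.CONTENT_TAGS and self._content_depth > 0:
            self._content_depth -= 1


def count_links(html: str, page_url: str) -> dict:
    """Return {'in_content': n, 'external': m} for the given HTML."""
    counter = _LinkCounter(page_url)
    counter.feed(html)
    counter.close()
    return {"in_content": counter.in_content, "external": counter.external}


if __name__ == "__main__":
    sample = (
        '<nav><a href="/home">Home</a></nav>'
        '<main><a href="/pricing">Pricing</a>'
        '<a href="https://other.example.org/x">Ref</a></main>'
    )
    print(count_links(sample, "https://example.com/post"))
```

Comparing the two counts against the reported medians (9 vs. 16 in-content, 16 vs. 21 external) gives a quick read on whether a page leans toward the cited or not-cited profile.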
Popups and Modals
Confirmed | r=0.077
Fewer popups and modals = cited. Interstitials that obstruct content extraction hurt citation odds. AI bots that fetch live HTML may fail to parse content hidden behind modal overlays. Clean, unobstructed content performs better.
Schema TYPE (Not Presence)
Confirmed | Product OR=3.09
Generic schema presence is not a predictor (p=0.78 on clean data). But schema type matters: Product schema (OR=3.09), Review schema (OR=2.24), and FAQ schema (OR=1.39) all help. Article schema actually hurts (OR=0.76). The takeaway: use the right schema type for your content, but do not expect adding any schema to help by itself.
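Because the signal is the schema type rather than its presence, a useful first check is simply listing which types a page declares. The sketch below reads JSON-LD blocks with the standard library and reports every @type it finds, including nested ones. It assumes the markup is JSON-LD (not Microdata or RDFa) and that "FAQ schema" corresponds to the schema.org type FAQPage; the function name is hypothetical.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the raw text of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def schema_types(html: str) -> set:
    """Return the set of @type values declared in a page's JSON-LD."""
    found: set = set()

    def walk(node) -> None:
        if isinstance(node, dict):
            declared = node.get("@type")
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            walk(json.loads(block))
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is skipped, not fatal
    return found


if __name__ == "__main__":
    page = (
        '<script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Product", '
        '"name": "Widget", "review": {"@type": "Review"}}'
        "</script>"
    )
    print(sorted(schema_types(page)))
```

A page that reports only Article would be a candidate for review under this predictor, while one reporting Product, Review, or FAQPage already carries a type associated with higher citation odds.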
Author Attribution (Negative)
Confirmed | OR=0.81
Pages with visible author bylines are cited less often. AI platforms appear to deprioritize opinion-style content in favor of institutional and product pages. This held on clean data with a reduced but still negative effect (OR=0.81 vs. the original 0.632 from contaminated data).
Source: Lee, A. (2026). "Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior." Corrected dataset: 3,251 real websites (UGC excluded) across ChatGPT and Google AI Mode, 3 multivariate models (March 2026). Full research paper.
Deeper Analysis
The Three Pillars of AI Citability
The 8 confirmed predictors cluster into three underlying pillars. Understanding this structure tells you where to focus: optimize for the pillar, and every feature within it improves simultaneously.
Pillar 1: Page Performance
Load Time + Content-to-HTML Ratio + No Popups
Load time is the strongest confirmed predictor (r=0.264). Cited pages load at 1,628ms median vs. 2,302ms for not-cited. Content-to-HTML ratio (r=0.132) reinforces this: lean pages with high content density outperform bloated pages stuffed with JavaScript frameworks and tracking scripts. Popups and modals (r=0.077) add another drag on citability by obstructing content extraction.
Pillar 2: Content Focus
Word Count (Focused) + Fewer In-Content Links + Fewer External Links
The clean data reversed the original word count finding. Shorter, focused pages are cited more: 1,799 words (cited) vs. 2,114 (not-cited). In-content links also favor focus: 9 links (cited) vs. 16 (not-cited). External link count follows the same pattern: 16 outbound (cited) vs. 21 (not-cited). The consistent signal is focus over comprehensiveness.
AI platforms appear to favor pages that fully address a specific topic without diluting attention across tangential links and padding content. Write enough to completely answer the query, then stop. Keep in-content links relevant and limited.
Pillar 3: Schema Type
Product / FAQ / Review Help | Article Hurts | Generic Presence Irrelevant
Generic schema presence is not a predictor on clean data (p=0.78, OR=1.02). But schema type matters significantly. Product schema (OR=3.09) has the strongest effect, followed by Review schema (OR=2.24) and FAQ schema (OR=1.39). Article schema actually hurts citation odds (OR=0.76), consistent with the negative author attribution signal.
Author attribution is a separate confirmed negative signal (OR=0.81). Together with Article schema, the pattern suggests AI platforms deprioritize editorial/opinion-style content in favor of institutional, product, and transactional pages.
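Odds ratios multiply odds, not probabilities, so an OR of 3.09 does not mean "three times as likely to be cited." As a rough illustration, the sketch below converts an odds ratio into a probability. It assumes, purely for illustration, that the baseline is the dataset's overall cited share (1,457 of 3,251 pages); the study's model-adjusted baseline may differ, and the function name is hypothetical.

```python
def apply_odds_ratio(baseline_p: float, odds_ratio: float) -> float:
    """Shift a baseline probability by an odds ratio and return the new probability."""
    if not 0.0 < baseline_p < 1.0:
        raise ValueError("baseline_p must be strictly between 0 and 1")
    odds = baseline_p / (1.0 - baseline_p)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # Illustrative baseline: overall cited share in the corrected dataset.
    baseline = 1457 / 3251
    for label, odds_ratio in [
        ("Product schema", 3.09),
        ("Review schema", 2.24),
        ("FAQ schema", 1.39),
        ("Author attribution", 0.81),
        ("Article schema", 0.76),
    ]:
        print(f"{label}: {apply_odds_ratio(baseline, odds_ratio):.3f}")
```

Under that assumed baseline of about 0.448, Product schema corresponds to roughly 0.715 and Article schema to roughly 0.382. These figures only show how to read the ratios and are not results from the study.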
The pillar framework simplifies the audit. Instead of optimizing 8 features independently, you focus on three strategies: make pages fast and lean (Pillar 1), write focused content with limited links (Pillar 2), and use the right schema types (Pillar 3). Every confirmed predictor improves as a natural consequence of getting these three things right.
Patterns in Practice
The Anti-Pattern vs. The Winning Pattern
Our research reveals two clear page profiles at opposite ends of the citation spectrum. Understanding these patterns makes it easy to diagnose where any given page falls and what needs to change.
The Anti-Pattern: Low Citability
- Slow load time above 3 seconds due to unoptimized images, excessive scripts, or poor hosting
- Bloated HTML with low content-to-code ratio, heavy frameworks, and tracking scripts
- Sprawling 3,000+ word content that tries to cover everything rather than answering a specific query
- 20+ in-content links scattering attention across tangential topics
- Popups and modals that obstruct content extraction by AI bots
- Article schema with author byline, signaling editorial/opinion content that AI platforms deprioritize
This profile fails across all three pillars: slow and bloated (Pillar 1 fails), unfocused content with excessive links (Pillar 2 fails), and wrong schema signals (Pillar 3 fails).
The Winning Pattern: High Citability
- Fast load time under 2 seconds with optimized images, minimal render-blocking resources, and CDN delivery
- Lean HTML with high content-to-code ratio, stripped of unnecessary JavaScript and tracking
- Focused 1,500-2,000 word content that completely addresses a specific topic, then stops
- Limited in-content links (under 10), each one relevant and purposeful
- No popups or modals obstructing content extraction
- Product, FAQ, or Review schema matching the content type, giving AI structured context
This is the profile of a focused, well-optimized page. Fast and lean (Pillar 1 satisfied), focused content with limited links (Pillar 2 satisfied), and the right schema type (Pillar 3 satisfied).
The gap between these two profiles is roughly 20 percentage points in citation rate: being cited in about 4 out of 10 relevant AI queries versus 6 out of 10. For agencies managing client visibility, that gap represents a significant competitive advantage.
Most pages fall somewhere between these extremes. Our audit scores each page against both profiles, identifies which pillars are strong and which are weak, and delivers a prioritized action plan to move each page from the anti-pattern toward the winning pattern. The highest-impact changes come first, so agencies can show measurable progress within the first implementation cycle.
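To make the diagnosis concrete, the sketch below checks a page against the winning-pattern thresholds listed above and groups the misses by pillar. It is an illustrative checklist, not the audit's actual scoring method: the data class, the function name, and the mapping of "FAQ" to the schema.org type FAQPage are all assumptions, and content-to-HTML ratio is omitted because no threshold is given for it.

```python
from dataclasses import dataclass, field


@dataclass
class PageFeatures:
    """Hypothetical container for measured page features."""

    load_ms: float
    word_count: int
    in_content_links: int
    has_popups: bool
    schema_types: set = field(default_factory=set)


HELPFUL_SCHEMA = {"Product", "FAQPage", "Review"}


def pillar_report(page: PageFeatures) -> dict:
    """Return, per pillar, the winning-pattern checks the page fails."""
    checks = {
        "Pillar 1: Page Performance": [
            ("load time under 2 seconds", page.load_ms < 2000),
            ("no popups or modals", not page.has_popups),
        ],
        "Pillar 2: Content Focus": [
            ("1,500-2,000 words", 1500 <= page.word_count <= 2000),
            ("under 10 in-content links", page.in_content_links < 10),
        ],
        "Pillar 3: Schema Type": [
            (
                "Product, FAQ, or Review schema",
                bool(page.schema_types & HELPFUL_SCHEMA),
            ),
        ],
    }
    return {
        pillar: [label for label, passed in items if not passed]
        for pillar, items in checks.items()
    }


if __name__ == "__main__":
    anti_pattern = PageFeatures(3400, 3200, 24, True, {"Article"})
    for pillar, failures in pillar_report(anti_pattern).items():
        print(pillar, "->", failures or "ok")
```

A page matching the anti-pattern fails every check, while a page matching the winning pattern returns an empty list for each pillar. Most pages will land in between, and the failed checks indicate which pillar to address first.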
Platform Architecture
The 2-vs-2 Divide
AI platforms split into two architectural groups, each requiring different optimization strategies.
Fetching Platforms
ChatGPT + Claude
- Live page fetch during conversations
- Full HTML content matters directly
- ChatGPT discovers via Bing
- Claude discovers via its own search index
- No JavaScript rendering
Index-Only Platforms
Perplexity + Gemini
- No live fetch during conversations
- Search snippets and metadata matter
- Perplexity uses proprietary index
- Gemini uses Google's internal search
- Traditional SEO strength drives visibility
Our audit includes checks most SEO tools miss: Bing Webmaster Tools indexation (for ChatGPT), cross-index visibility verification (for Claude), and snippet quality assessment (for Perplexity/Gemini). For fetching platforms, we test actual bot-received HTML, content position (header content: 96-100% AI visibility; footer: 0-12%), and robots.txt compliance per user agent.
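The per-user-agent robots.txt check can be approximated with Python's standard urllib.robotparser module. This is a sketch under stated assumptions: the user-agent tokens listed are ones the vendors have published at the time of writing and should be verified against each vendor's current documentation, and the function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed AI crawler tokens; confirm against each vendor's documentation.
AI_USER_AGENTS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")


def crawl_access(robots_txt: str, url: str, user_agents=AI_USER_AGENTS) -> dict:
    """Return {user_agent: allowed} for a URL given robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


if __name__ == "__main__":
    robots = "\n".join(
        [
            "User-agent: GPTBot",
            "Disallow: /",
            "",
            "User-agent: *",
            "Allow: /",
        ]
    )
    for agent, allowed in crawl_access(robots, "https://example.com/pricing").items():
        print(agent, "allowed" if allowed else "blocked")
```

In this sample file, GPTBot is blocked while the other agents fall through to the wildcard rule and are allowed, which is the kind of per-agent difference a general crawl report can miss.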
Deliverables
What's Included
- Page-level citation readiness scores -- each page scored against the confirmed citation predictors
- Platform-specific indexation check -- verification across Google, Bing, and other search indexes
- Schema markup audit -- current assessment and recommendations optimized for GEO (generative engine optimization)
- Content structure analysis -- heading hierarchy, word count, content position, content-to-HTML ratio
- Internal linking evaluation -- link density, anchor text quality, hub-and-spoke structure
- robots.txt and crawl access review -- access verified for all major AI user agents
- Prioritized action plan -- recommendations ranked by expected impact with implementation guidance
How It Works
The Technology Behind It
Every recommendation is cross-referenced against our GEO Knowledge Base -- a structured repository of empirically validated optimization strategies, continuously updated with new research. We use a multi-platform scraper to test how pages actually appear to AI bots, verifying server-side rendering and comparing bot-received HTML against human-visible content. Learn more on the toolkit page.
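The comparison between bot-received HTML and human-visible content can be illustrated with a simple coverage measure: the share of words a human sees in the rendered page that also appear in the raw HTML a non-rendering bot receives. This is a simplified sketch, not the scraper described above. Both function names are hypothetical, and it assumes you have already captured the two HTML strings (for example, a plain HTTP response and a headless-browser DOM snapshot).

```python
from html.parser import HTMLParser


class _WordCollector(HTMLParser):
    """Collect lowercase words from text nodes outside script/style."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(data.lower().split())


def visible_words(html: str) -> list:
    """Return the visible words of an HTML string."""
    collector = _WordCollector()
    collector.feed(html)
    collector.close()
    return collector.words


def ssr_coverage(bot_html: str, rendered_html: str) -> float:
    """Fraction of distinct rendered words that also appear in the bot-received HTML."""
    rendered = set(visible_words(rendered_html))
    if not rendered:
        return 1.0  # nothing visible to lose
    received = set(visible_words(bot_html))
    return len(rendered & received) / len(rendered)


if __name__ == "__main__":
    client_rendered = '<div id="root"></div><script>render()</script>'
    rendered_dom = '<div id="root"><h1>Pricing plans</h1><p>Three tiers</p></div>'
    print(ssr_coverage(client_rendered, rendered_dom))
    print(ssr_coverage(rendered_dom, rendered_dom))
```

A coverage near zero suggests the content exists only after JavaScript runs, so a platform that fetches without rendering JavaScript would receive an essentially empty page. A coverage near one suggests the content is server-rendered.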
Transparency
Limitations
FAQ
Frequently Asked Questions
What are the citation predictors?
After a methodology correction that excluded UGC from the comparison pool, our clean-data analysis of 3,251 real websites confirms 8 predictors:
- Load time (r=0.264, strongest signal)
- Content-to-HTML ratio (r=0.132)
- In-content link density (r=0.127, fewer = cited)
- Word count (r=0.107, reversed: shorter focused pages win)
- External link count (r=0.097, fewer = cited)
- Popups/modals (r=0.077, fewer = cited)
- Schema TYPE (Product OR=3.09, Review OR=2.24, FAQ OR=1.39; Article OR=0.76 hurts; generic presence is not a predictor)
- Author attribution (negative, OR=0.81)
How is an AI SEO audit different from a traditional SEO audit?
An AI SEO audit evaluates page-level features that predict AI citation, not backlinks or keyword rankings. It includes platform-specific checks for fetching architectures (ChatGPT, Claude) versus index-only architectures (Perplexity, Gemini), along with bot-received HTML testing and cross-index visibility verification.
How long does an AI SEO audit take?
The timeline depends on site size. Typically, an initial audit takes 1 to 2 weeks and includes page-level scoring against all confirmed citation predictors, platform-specific checks, and a prioritized action plan with implementation guidance.
Find Out What's Holding Your Client Back
Start with a free AI check to see the basics, or share your client's site and we'll scope a full audit against all confirmed citation predictors.