Your competitors may be invisible on Google and dominant in AI search. Or the reverse. The only way to know is to audit every platform separately, because only 1.4% of cited URLs overlap across ChatGPT, Perplexity, Claude, and Gemini for the same query (Lee, 2026). An AI competitive audit is not optional. It is the foundation of any AI visibility strategy.
Most competitive intelligence still stops at Google. But research on 19,556 queries found that Google rank has near-zero correlation with AI citation (rho = -0.02 to 0.11 across platforms; Lee, 2026). A competitor ranking on page two of Google could be the top-recommended brand in ChatGPT. A competitor dominating Google's top three could be invisible in Perplexity. Without an AI-specific competitive audit, you are flying blind in the channel reshaping how buyers discover and choose vendors.
This guide gives you six steps, four platforms, a reusable scraping framework, and a template audit report. Every claim is backed by published research. For the data behind this methodology, see our AI citation research roundup and cross-platform citation comparison.
🔑 THE KEY INSIGHT (FRONT-LOADED)
Here is the single most important thing to understand before you begin: AI competitive intelligence requires multi-platform measurement because each platform retrieves from a fundamentally different source pool.
Lee (2026) tested 19,556 queries across ChatGPT, Claude, Perplexity, and Gemini. The Jaccard similarity of cited URLs across platforms was 0.014. That means 98.6% of the URLs cited by one platform were not cited by any other platform for the same query.
| Metric | Value | Implication for Competitive Audits |
|---|---|---|
| Cross-platform URL overlap | 1.4% | Must test every platform separately |
| Within-platform consistency (ChatGPT) | 61.9% | Need multiple sessions per query per platform |
| Google rank correlation with AI citation | rho = -0.02 to 0.11 | Google rankings do not predict AI visibility |
| Reddit shadow corpus correlation | rho = 0.554 | Reddit sentiment shapes brand recommendations invisibly |
The Bottom Line: A "competitive audit" that only checks one AI platform captures less than 2% of the full picture. A competitive audit that only checks Google captures almost none of the AI picture.
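If you want to verify this overlap on your own data, the metric is straightforward to compute. Here is a minimal Python sketch with hypothetical cited-URL sets; swap in the URLs you actually collect per platform:

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |intersection| / |union| of two URL sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical cited-URL sets per platform for one query.
cited = {
    "chatgpt":    {"example.com/guide", "review-site.com/top10", "vendor-a.com"},
    "perplexity": {"review-site.com/top10", "reddit.com/r/tools/abc"},
    "claude":     {"wikipedia.org/wiki/Topic", "vendor-a.com/docs"},
    "gemini":     {"youtube.com/watch?v=xyz", "vendor-c.com"},
}

for p1, p2 in combinations(cited, 2):
    print(f"{p1} vs {p2}: Jaccard = {jaccard(cited[p1], cited[p2]):.3f}")
```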
📋 STEP 1: IDENTIFY YOUR COMPETITOR SET
The first step is defining who you are auditing. This sounds obvious, but AI search surfaces a different competitive set than Google search.
In traditional SEO, your competitors are the domains ranking for your target keywords. In AI search, your competitors are the brands that AI platforms recommend or cite. These two sets often diverge. A brand with minimal SEO presence can dominate AI recommendations if its content matches the right query intent, Reddit communities consistently upvote it (rho = 0.554; Lee, 2026), or its pages have the structural features that predict citation.
Start with three tiers:
| Tier | Definition | How to Identify |
|---|---|---|
| Known competitors | Brands you already compete against in your market | Internal knowledge, sales team input |
| SEO competitors | Domains ranking for your target keywords | Traditional keyword tracking tools |
| AI-emergent competitors | Brands appearing in AI responses to your target queries | Run 10 to 15 discovery queries across platforms and record every brand mentioned |
The third tier is the one most teams miss. Run broad queries like "best [your category] for [use case]" across all four platforms and record every brand mentioned. Some will surprise you.
The Bottom Line: Your AI competitor set is not the same as your Google competitor set. Discover it empirically by querying each platform before you formalize the audit.
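Once you have recorded the brands from your discovery queries, deduping them against the first two tiers takes a few lines. A minimal sketch with hypothetical brand names:

```python
from collections import Counter

# Brands recorded from 10 to 15 discovery queries (hypothetical names).
mentions_per_session = [
    ["BrandA", "BrandB", "BrandX"],
    ["BrandA", "BrandX", "BrandY"],
    ["BrandB", "BrandX"],
]
known = {"BrandA"}  # tier 1: known competitors
seo = {"BrandB"}    # tier 2: SEO competitors

counts = Counter(b for session in mentions_per_session for b in session)
emergent = {b: n for b, n in counts.items() if b not in known | seo}
print("AI-emergent competitors:", emergent)  # e.g. {'BrandX': 3, 'BrandY': 1}
```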
🎯 STEP 2: MAP TARGET QUERIES BY INTENT
Not all queries produce the same citation behavior. The research shows that query intent is the strongest aggregate predictor of which sources get cited (Lee, 2026). Your audit queries need to span every intent type your audience uses.
The Five Intent Categories
Lee (2026) identified five intent categories with distinct citation patterns:
| Intent Type | Share of Queries | Typical Citation Sources | Example Query |
|---|---|---|---|
| Informational | 61.3% of real-world autocomplete queries (citation experiments used a balanced 20% per intent design) | Wikipedia, .gov/.edu, tutorials | "how does [technology] work" |
| Discovery | 31.2% of autocomplete queries | Review aggregators, YouTube, listicles | "best [category] for [use case]" |
| Validation | 3.2% | Brand sites, Reddit (web UI) | "is [brand] worth it" |
| Comparison | 2.3% | Publisher/media, review sites | "[brand A] vs [brand B]" |
| Review-seeking | 2.0% | YouTube, tech review sites, Reddit | "[brand] reviews 2026" |
Building Your Query Matrix
For each competitor, create queries across all five intent types. Here is a template:
| Intent | Query Template | Competitive Signal |
|---|---|---|
| Discovery | "best [category] for [use case]" | Who gets recommended first? |
| Comparison | "[your brand] vs [competitor]" | How does positioning differ? |
| Validation | "is [competitor] good for [use case]" | What sentiment does AI express? |
| Informational | "how to [solve problem competitor addresses]" | Who gets cited as an authority? |
| Review-seeking | "[competitor] reviews" | Which sources does AI pull from? |
Aim for 8 to 12 queries per competitor, distributed across intent types. Weight discovery and comparison queries more heavily, as these are where buying decisions happen.
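A small script keeps the matrix consistent as your competitor list grows. This sketch expands the intent templates above; the category, use case, and brand names are placeholders:

```python
# Minimal sketch: expand intent templates into a per-competitor query matrix.
TEMPLATES = {
    "discovery":      "best {category} for {use_case}",
    "comparison":     "{brand} vs {competitor}",
    "validation":     "is {competitor} good for {use_case}",
    "informational":  "how to {problem}",
    "review_seeking": "{competitor} reviews",
}

def build_matrix(brand, competitors, category, use_case, problem):
    queries = []
    for competitor in competitors:
        for intent, tpl in TEMPLATES.items():
            queries.append({
                "intent": intent,
                "competitor": competitor,
                "query": tpl.format(brand=brand, competitor=competitor,
                                    category=category, use_case=use_case,
                                    problem=problem),
            })
    return queries

for row in build_matrix("YourBrand", ["CompetitorA", "CompetitorB"],
                        "project management software", "small teams",
                        "track tasks across distributed teams")[:5]:
    print(row["intent"], "->", row["query"])
```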
For a deeper dive into how query intent drives AI citation behavior, see ChatGPT vs Perplexity: How Citation Behavior Differs.
🔄 STEP 3: RUN QUERIES ACROSS 4 PLATFORMS (40 SESSIONS PER QUERY)
This is the execution step, and it requires more sessions than most teams expect. AI responses are non-deterministic. You cannot run a query once and treat the result as ground truth.
Lee (2026) found that within-platform citation consistency for ChatGPT was 61.9% (Jaccard = 0.619). Roughly 38% of cited sources change between sessions for the same query. A single session gives you a noisy snapshot. Running 10 sessions per platform produces a citation frequency distribution, letting you calculate how often Competitor X appears, how stable their position is, and which of their pages get cited. Across 4 platforms, that is 40 sessions per query. For 10 queries, that is 400 total sessions.
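Here is what the analysis of those repeated sessions looks like in practice. This sketch computes the citation frequency distribution and the mean pairwise Jaccard (the within-platform consistency figure above) from hypothetical session data:

```python
from collections import Counter
from itertools import combinations

# Cited URL sets from repeated sessions of one query on one platform
# (hypothetical; a real run would have 10 sets).
sessions = [
    {"vendor-a.com", "review-site.com/top10"},
    {"vendor-a.com", "reddit.com/r/tools/abc"},
    {"vendor-a.com", "review-site.com/top10", "vendor-b.com"},
]

# Citation frequency distribution: share of sessions citing each URL.
freq = Counter(url for s in sessions for url in s)
for url, n in freq.most_common():
    print(f"{url}: {n}/{len(sessions)} sessions")

# Mean pairwise Jaccard across sessions = within-platform consistency.
pairs = list(combinations(sessions, 2))
consistency = sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
print(f"within-platform consistency: {consistency:.3f}")
```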
The Multi-Platform Scraper Concept (CitationScraper)
Manual execution of 400+ sessions is impractical. The solution is a browser automation pipeline (CitationScraper) that takes a query list and platform list as input, spawns isolated browser sessions, extracts cited URLs and brand mentions, tags recommendation sentiment, and outputs a structured citation dataset per query, platform, and session.
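To make the concept concrete, here is a minimal sketch of that pipeline using Playwright. The selectors, URLs, and wait times are illustrative assumptions, not a working integration: production chat UIs require authentication and change their markup frequently, and the sentiment-tagging step is omitted here.

```python
# Minimal sketch of the CitationScraper concept using Playwright.
# Selectors, URLs, and wait times below are assumptions for illustration.
import json
from playwright.sync_api import sync_playwright

PLATFORMS = {
    "perplexity": {"url": "https://www.perplexity.ai",
                   "input": "textarea", "citation": "a[href^='http']"},
    # ... one entry per platform, each with its own selectors
}

def run_session(page, cfg, query):
    page.goto(cfg["url"])
    page.fill(cfg["input"], query)
    page.keyboard.press("Enter")
    page.wait_for_timeout(15_000)  # crude wait for the answer to stream in
    links = page.query_selector_all(cfg["citation"])
    return sorted({a.get_attribute("href") for a in links
                   if a.get_attribute("href")})

def scrape(queries, sessions_per_query=10):
    results = []
    with sync_playwright() as p:
        for name, cfg in PLATFORMS.items():
            for query in queries:
                for i in range(sessions_per_query):
                    browser = p.chromium.launch()  # isolated session
                    page = browser.new_page()
                    urls = run_session(page, cfg, query)
                    results.append({"platform": name, "query": query,
                                    "session": i, "cited_urls": urls})
                    browser.close()
    return results

if __name__ == "__main__":
    data = scrape(["best project management software for small teams"],
                  sessions_per_query=1)
    print(json.dumps(data, indent=2))
```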
Critical detail: Use the web UI, not the API. Lee (2026) documented significant API vs. web UI divergence. Reddit appears in 17% to 44% of web UI responses but 0% of API responses. An API-only scraper produces systematically different results from what real users see.
| Platform | Web UI Citation Behavior | API Citation Behavior | Divergence |
|---|---|---|---|
| ChatGPT | Reddit: 17%, broad source mix | Reddit: 0%, narrower sources | Significant |
| Perplexity | Reddit: 20%, inline numbered refs | Reddit: 0%, different source pool | Significant |
| Google AI Mode | Reddit: 44%, Google-grounded | N/A (no public API) | N/A |
| Claude | Reddit: 0%, conservative citing | Reddit: 0%, conservative citing | Minimal |
For understanding the full scope of platform differences, see our cross-platform citation behavior comparison.
The Bottom Line: 10 sessions per platform per query is the minimum for reliable data. Use browser automation on web UIs, not API calls, to capture what real users actually see.
📊 STEP 4: TRACK CITATION FREQUENCY, RECOMMENDATION SENTIMENT, AND POSITIONING
Once you have collected raw session data, the analysis phase extracts three dimensions of competitive intelligence.
For each competitor, across each platform, build a citation frequency table:
| Competitor | ChatGPT Citation Rate | Perplexity Citation Rate | Claude Citation Rate | Google AI Mode Citation Rate |
|---|---|---|---|---|
| Competitor A | 70% (7/10 sessions) | 40% (4/10) | 20% (2/10) | 80% (8/10) |
| Competitor B | 30% (3/10) | 60% (6/10) | 50% (5/10) | 20% (2/10) |
| Your Brand | 10% (1/10) | 30% (3/10) | 10% (1/10) | 40% (4/10) |
This table immediately reveals platform-specific strengths and weaknesses. But citation frequency alone is not enough. You also need to track recommendation sentiment (positive, neutral, or negative framing) and positioning (mention order, framing role, and feature associations).
The Bottom Line: A competitor cited 80% of the time with negative sentiment is in a worse position than one cited 40% of the time with consistently positive recommendations. Track all three dimensions.
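To produce the citation-rate table above from raw scraper output, a minimal aggregation sketch (the brand-to-domain mapping and session records are hypothetical):

```python
from collections import defaultdict

# results: scraper output, one record per session.
results = [
    {"platform": "chatgpt", "query": "best crm for small teams",
     "cited_urls": ["competitor-a.com/crm-guide"]},
    {"platform": "chatgpt", "query": "best crm for small teams",
     "cited_urls": ["review-site.com/top10"]},
]
BRAND_DOMAINS = {"Competitor A": "competitor-a.com",
                 "Your Brand": "yourbrand.com"}

hits, totals = defaultdict(int), defaultdict(int)
for r in results:
    for brand, domain in BRAND_DOMAINS.items():
        totals[(brand, r["platform"])] += 1
        if any(domain in url for url in r["cited_urls"]):
            hits[(brand, r["platform"])] += 1

for (brand, platform), total in totals.items():
    n = hits[(brand, platform)]
    print(f"{brand} on {platform}: {n}/{total} sessions ({100 * n / total:.0f}%)")
```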
🔍 STEP 5: ANALYZE WHY COMPETITORS GET CITED (THE 7 PREDICTORS)
This is where the audit shifts from observation to actionable diagnosis. For every competitor that outperforms you in citation frequency, you need to understand why their pages win.
Lee (2026) identified 7 page-level features that statistically predict AI citation. Use these as your diagnostic checklist when analyzing competitor pages.
The 7 Predictors Applied to Competitive Analysis
| # | Predictor | What to Check on Competitor Pages | Your Page vs. Theirs |
|---|---|---|---|
| 1 | Internal link count (r = 0.127; cited pages have fewer navigation links) | How robust is their navigation architecture? Menus, breadcrumbs, sidebars, footer links. | Count navigation links on their cited pages vs. yours |
| 2 | Self-referencing canonical (OR = 1.92) | Do they have a clean canonical tag pointing to the page itself? | Check your own canonical implementation |
| 3 | Schema markup presence (generic presence non-significant, p = 0.78) | What schema types do they use? Product (OR = 3.09), Review (OR = 2.24), and FAQPage (OR = 1.39) all boost citation probability. | Compare schema type and completeness |
| 4 | Word count (cited median = 1,799) | How long are their cited pages? Cited pages average 39% more words than uncited. | Measure word count gap |
| 5 | Content-to-HTML ratio (0.086 vs. 0.065) | How dense is their actual content vs. boilerplate HTML? | Run a content-to-HTML ratio check |
| 6 | Schema attribute completeness (OR = 1.21) | How many attributes does their schema include? More attributes = better. | Compare attribute counts |
| 7 | Link ratio (OR = 0.47 when external-heavy) | What is their internal-to-external link ratio? Pages with 70%+ internal links achieve a 59.7% citation rate vs. 21.4% for external-dominant. | Audit link ratios on both sides |
Aggarwal et al. (2024) found that targeted optimization strategies can boost generative engine visibility by up to 40%. (Note: this Princeton lab result has not replicated on production AI platforms in our testing; see our replication analysis.) The most effective strategies were "citing sources" and "adding statistics," both of which align with the structural features in the predictor model. If your competitors are implementing these strategies and you are not, that gap explains the citation difference.
For each competitor page that gets cited, fetch the HTML, extract these 7 values, compare against your equivalent page, and prioritize fixes by odds ratio. For a complete 47-point version, see our AI SEO audit checklist. For a detailed breakdown of each predictor, see 7 Page Features That Predict AI Citation.
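Several of the seven predictors can be extracted automatically from a page's HTML. This sketch covers word count, content-to-HTML ratio, link counts and ratio, the canonical tag, and schema types using requests and BeautifulSoup. Note it reads raw HTML rather than the rendered DOM, so treat the numbers as approximations:

```python
import json
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

def page_diagnostics(url: str) -> dict:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text(" ", strip=True)
    host = urlparse(url).netloc

    links = [a["href"] for a in soup.find_all("a", href=True)]
    internal = [h for h in links if h.startswith("/") or host in h]
    canonical = soup.find("link", rel="canonical")

    schema_types = []
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            schema_types.append(json.loads(tag.string or "").get("@type"))
        except (json.JSONDecodeError, AttributeError):
            pass  # malformed or list-valued JSON-LD; skip in this sketch

    return {
        "word_count": len(text.split()),                          # predictor 4
        "content_to_html_ratio": len(text) / max(len(html), 1),   # predictor 5
        "internal_link_count": len(internal),                     # predictor 1
        "internal_link_ratio": len(internal) / max(len(links), 1),  # predictor 7
        "self_canonical": bool(canonical and canonical.get("href") == url),  # 2
        "schema_types": schema_types,                             # predictors 3, 6
    }

# Compare a competitor's cited page with your equivalent page, e.g.:
# print(page_diagnostics("https://competitor-a.com/crm-guide"))
```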
The Bottom Line: When a competitor gets cited and you do not, the answer is usually in the page structure, not in the content quality. The 7 predictors give you a measurable, auditable framework for diagnosing exactly where you fall short.
🧩 STEP 6: IDENTIFY GAPS AND OPPORTUNITIES
The final step synthesizes your findings into a prioritized action plan. You now have citation frequency data, sentiment data, positioning data, and page-level diagnostic data for every competitor across every platform. Here is how to turn that into strategy.
The Reddit Shadow Corpus Effect
Before prioritizing content changes, account for the shadow corpus. Lee (2026) found that Reddit brand consensus correlates with AI recommendations at rho = 0.554 across 12 consumer categories (8/12 survived Bonferroni correction). Your competitors' AI recommendation strength may be partially driven by Reddit sentiment that never appears in citation data, because it was absorbed during pre-training rather than retrieved at query time.
| Reddit Signal | AI Impact | Audit Action |
|---|---|---|
| Consistent positive mentions in subreddit threads | Higher recommendation probability across all AI platforms | Monitor competitor Reddit sentiment in relevant subreddits |
| Upvote-weighted brand consensus | rho = 0.554 correlation with AI brand ranking | Compare your Reddit presence vs. competitors using upvote-weighted scoring |
| Category-specific subreddit dominance | Strongest effect in Office/Workspace (rho = 0.746) and Outdoor/Camping (rho = 0.674) | Identify which subreddits matter for your industry |
For the full breakdown of how Reddit shapes AI recommendations, see Reddit's Invisible Influence on AI Search.
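To quantify your own Reddit gap, score upvote-weighted consensus per brand and correlate it with your measured AI recommendation rates. A minimal sketch with hypothetical mention data (sentiment coded -1/0/+1):

```python
from scipy.stats import spearmanr

# Hypothetical mentions from relevant subreddits:
# (brand, upvotes on the mentioning post/comment, sentiment -1/0/+1).
reddit_mentions = [
    ("BrandA", 240, 1), ("BrandA", 55, 1), ("BrandB", 310, -1),
    ("BrandB", 12, 1), ("BrandC", 90, 1),
]

def consensus(mentions):
    scores = {}
    for brand, upvotes, sentiment in mentions:
        scores[brand] = scores.get(brand, 0) + upvotes * sentiment
    return scores

reddit_scores = consensus(reddit_mentions)
# AI recommendation rate per brand from your session data (hypothetical).
ai_rates = {"BrandA": 0.70, "BrandB": 0.20, "BrandC": 0.40}

brands = sorted(set(reddit_scores) & set(ai_rates))
rho, p = spearmanr([reddit_scores[b] for b in brands],
                   [ai_rates[b] for b in brands])
print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")
```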
Gap Analysis Framework
Organize your findings into four quadrants:
| Quadrant | Your Brand | Competitor | Priority |
|---|---|---|---|
| Citation gap | Low citation rate | High citation rate | High: structural fixes needed |
| Sentiment gap | Neutral/negative positioning | Positive recommendation | High: content and reputation work needed |
| Platform gap | Strong on one platform | Strong on a different platform | Medium: platform-specific optimization |
| Reddit gap | Low Reddit presence | Strong Reddit consensus | Medium-high: community engagement strategy |
The audit will reveal three opportunity types: (1) unclaimed queries where no competitor is consistently cited, (2) platform-specific weaknesses where a competitor dominates one platform but is absent from another, and (3) structural vulnerabilities where competitor pages score poorly on the 7 predictors despite current high citation rates.
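A small classifier keeps the quadrant assignments consistent across competitors. This sketch uses illustrative thresholds; the 30-point citation gap and 2x Reddit multiple are assumptions, not research-derived cutoffs:

```python
def classify_gaps(yours: dict, theirs: dict) -> list:
    """Assign quadrants from the gap analysis framework above.
    Inputs: per-brand dicts with citation_rate (0-1), sentiment
    ('positive'/'neutral'/'negative'), and reddit_score.
    Thresholds below are illustrative assumptions."""
    gaps = []
    if theirs["citation_rate"] - yours["citation_rate"] >= 0.3:
        gaps.append(("citation gap", "high"))
    if theirs["sentiment"] == "positive" and yours["sentiment"] != "positive":
        gaps.append(("sentiment gap", "high"))
    if theirs["reddit_score"] > yours["reddit_score"] * 2:
        gaps.append(("reddit gap", "medium-high"))
    return gaps

print(classify_gaps(
    {"citation_rate": 0.10, "sentiment": "neutral", "reddit_score": 40},
    {"citation_rate": 0.70, "sentiment": "positive", "reddit_score": 300},
))
```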
The Bottom Line: The competitive audit is not a one-time report. AI responses shift as models update, content changes, and Reddit sentiment evolves. Build the audit as a repeatable process and run it quarterly at minimum.
📝 TEMPLATE: AGENCY AUDIT REPORT STRUCTURE
For agencies delivering AI competitive audits to clients, here is a standardized seven-section report:
| Section | Contents |
|---|---|
| 1. Executive Summary | Client AI citation share vs. top 3 competitors (one table, four platforms); biggest gap; top 3 actions |
| 2. Platform Citation Analysis | Citation frequency, sentiment breakdown, and positioning per competitor per platform |
| 3. Competitor Page Diagnostics | 7-predictor comparison table for competitor pages vs. client equivalents; schema type comparison |
| 4. Reddit Shadow Corpus Assessment | Competitor vs. client Reddit mention frequency; upvote-weighted scoring; correlation with AI recommendations |
| 5. Query Intent Mapping | Full query matrix with intent classification; queries where client has zero presence |
| 6. Prioritized Action Plan | Structural fixes ranked by odds ratio; content gaps ranked by volume; platform-specific recommendations |
| 7. Monitoring Cadence | Re-audit schedule (quarterly minimum); leading indicators (bot crawl activity, content index status) |
For a single client with 5 competitors and 10 queries, expect 400 total sessions (10 queries x 4 platforms x 10 sessions), 15 competitor page audits, and 5 to 10 subreddits to monitor. The ultimate deliverable is an AI share-of-voice metric: what percentage of AI recommendations in your category mention your brand vs. competitors.
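Computing the share-of-voice metric itself is trivial once the session data exists. A minimal sketch with hypothetical mention counts:

```python
# AI share of voice: each brand's percentage of all brand mentions
# pooled across queries, platforms, and sessions (hypothetical counts).
mentions = {"Your Brand": 52, "Competitor A": 180, "Competitor B": 96}
total = sum(mentions.values())

for brand, n in sorted(mentions.items(), key=lambda kv: -kv[1]):
    print(f"{brand}: {100 * n / total:.1f}% AI share of voice")
```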
For help implementing this audit, see our competitive intelligence service or start with a free AI visibility quick check. For a deeper exploration of share of voice as a metric, see AI Share of Voice. For our full audit service, see AI SEO Audits.
❓ FREQUENTLY ASKED QUESTIONS
How often should I run an AI competitive audit?
Quarterly at minimum. The initial audit establishes a baseline; subsequent audits track movement. Between full audits, monitor leading indicators like AI bot crawl frequency. The 61.9% within-platform consistency rate (Lee, 2026) means month-to-month variation is normal, so do not overreact to single-session changes.
Can I use the API instead of the web UI for data collection?
You can, but your data will differ from what real users see. Reddit citations are 0% via API vs. 17% to 44% via web UI (Lee, 2026). The API is acceptable for scaled tracking if you document the limitation.
How do I check what ChatGPT says about my competitors right now without building automation?
Open ChatGPT with web search enabled, type 5 to 10 category queries, and record which competitors are mentioned, in what order, and with what sentiment. Repeat on Perplexity, Claude, and Google AI Mode. This gives a directional read in under an hour. For a structured manual approach, see our guide on how to check if AI cites your website.
What if my industry is too niche for AI platforms to have strong opinions?
Niche industries are easier to audit because the competitive set is smaller. Aggarwal et al. (2024) found that optimization effectiveness varies by domain, and niche factual domains respond especially well to structural optimization. The opportunity to differentiate through page structure is often larger in niche categories.
How does the Reddit shadow corpus affect B2B competitive audits?
The rho = 0.554 correlation was measured across consumer categories (Lee, 2026). B2B categories may show a weaker effect, but many B2B verticals (SaaS, marketing tools, dev tools) have active subreddit communities. Check whether relevant subreddits exist before assuming Reddit does not apply.
📚 REFERENCES
- Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). "GEO: Generative Engine Optimization." KDD 2024. DOI
- Lee, A. (2026). "Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior." Preprint v5. DOI
- Lee, A. (2026). "Reddit Doesn't Get Cited (Through the API): Training Data Influence, Access-Channel Divergence, and the Shadow Corpus in AI Brand Recommendations." Preprint. DOI
- Sellm (2025). "ChatGPT Citation Analysis." Industry report (400K+ pages analyzed).
- Tian, Y. et al. (2025). "Diagnosing and Repairing Citation Failures in Generative Engine Optimization." Preprint.