AI SEO EXPERIMENTS
Stop Tracking AI Search Queries. Track Search Types Instead.
When ChatGPT searches the web for you, it generates internal search queries called fan-out queries. These are the actual strings sent to Bing. They determine which pages get fetched, which enter the AI's context window, and which get cited in the response.
Several companies now sell tools that capture these fan-out queries. The pitch: "See exactly what ChatGPT searches for, then optimize for those exact strings."
We tested whether that approach works. Across three separate replicate studies covering Gemini, Perplexity, and ChatGPT (both API and browser UI), we measured how stable fan-out queries are between identical runs of the same prompt.
The answer is more nuanced than we first thought. Depending on the platform, 45 to 73 percent of fan-out strings are completely different between any two runs. But over many runs, a small canonical core of about 5 strings emerges, covering 59 to 76 percent of all fan-out events.
What stays stable across runs, regardless of platform? The structural type of search. When ChatGPT decides to inject brand names on one run, it injects brand names the next run too. When Perplexity compresses to keywords, it compresses again. The specific words change, but the strategy does not.
The actionable target is the type, not any single string. Aggregating many runs to find the canonical strings is a viable secondary strategy on platforms with strong cores, but primary optimization should target the fan-out type your intent triggers.
What We Found
This comes from our research paper (Study 3, v1.3). The main corpus: 1,323 fan-out queries across ChatGPT, Gemini, and Perplexity, with 540 parent queries across 10 commercial verticals and 5 intent types. The replicate analysis expanded to three separate studies totaling thousands of additional comparisons.
The Two Layers of AI Search
AI search is not one decision. It is two.
Layer 1: Does the AI search the web at all? This decision is highly stable across all platforms. Gemini produced the same search/no-search decision 98.9% of the time across replicate runs -- the most deterministic platform we measured. ChatGPT gpt-5.4-mini was close behind at 91.7%. If the AI searched the web for "best CRM software" on run one, it searched again on run two. If it answered "how does photosynthesis work" from memory, it answered from memory every time.
This layer is driven by intent. Discovery queries ("best X for Y") trigger web search 98% of the time on ChatGPT. Informational queries ("how does X work") trigger search only 12% of the time. The AI is confident it knows how things work. It is less confident it knows what to recommend.
Layer 2: What does the AI search for? When it decides to search, the AI generates fan-out queries. Between any two runs, the specific strings differ substantially. But over many runs, a canonical core emerges -- and the structural type of search stays stable across all platforms.
This two-layer structure means there are two distinct optimization problems. Layer 1 is about training-data presence -- if the AI never searches, your web content does not exist. Layer 2 is about matching your content to the right type of search.
The Eight Types of AI Search
We classified every fan-out query into one of eight structural types based on what the AI is doing when it generates the search.
| Type | % of All Fan-Outs | What the AI Is Doing |
|---|---|---|
| Tangential | 17.2% | Searching for an adjacent topic the user did not ask about |
| Expansion | 16.0% | Looking up background context ("marketing budget," "skincare ingredients") |
| Evidence Seeking | 15.6% | Searching for reviews, studies, benchmarks, or proof |
| Entity Injection | 15.6% | Adding brand names from training data the user never mentioned |
| Compression | 14.5% | Stripping the conversational question down to keywords |
| Narrowing | 9.6% | Adding specifics: a year, a price range, an audience segment |
| Reformulation | 7.0% | Rephrasing the same question in different words |
| Price/Availability | 4.5% | Checking prices or purchase options |
No single type dominates. The distribution is more even than in our earlier Study 2, where compression dominated at 37.9%. The broader query set in this study -- 10 verticals and 5 intent types instead of 3 client verticals -- produces a more diverse retrieval pattern.
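For illustration, the classification step can be sketched as a short priority-ordered rule chain. This is not the study's actual classifier (which is rule-based at ~88% accuracy with a larger taxonomy); every pattern and threshold below is an assumption, and expansion vs. tangential is lumped into one bucket because separating them needs topic modelling:

```python
import re

def classify_fanout(parent: str, fanout: str, known_brands: set) -> str:
    """Illustrative priority-ordered rules: first match wins.
    All patterns and thresholds here are assumptions, not the study's rules."""
    p, f = parent.lower(), fanout.lower()
    p_words, f_words = set(p.split()), set(f.split())

    # Entity injection: a known brand appears that the user never typed
    if any(b.lower() in f and b.lower() not in p for b in known_brands):
        return "entity_injection"
    # Price/availability: the fan-out adds commercial terms
    if re.search(r"\b(price|pricing|cost|buy|availability)\b", f) and \
            not re.search(r"\b(price|pricing|cost|buy)\b", p):
        return "price_availability"
    # Evidence seeking: reviews, studies, benchmarks, head-to-heads
    if re.search(r"\b(review|reviews|study|benchmark|vs)\b", f):
        return "evidence_seeking"
    # Narrowing: adds a year, price cap, or audience qualifier
    if re.search(r"\b20\d\d\b|under \$?\d+|for (beginners|small business|enterprise)", f) \
            and not re.search(r"\b20\d\d\b", p):
        return "narrowing"
    # Compression: much shorter than the parent, still on the parent's topic
    if len(f_words) <= max(6, len(p_words) // 3) and f_words & p_words:
        return "compression"
    # Reformulation: mostly shared vocabulary at similar length
    if len(f_words & p_words) >= len(f_words) // 2:
        return "reformulation"
    # Expansion vs. tangential requires topic modelling; lumped here
    return "expansion_or_tangential"
```

For example, `classify_fanout("best crm software", "Salesforce review 2026", {"Salesforce"})` returns `"entity_injection"` because a brand the user never mentioned appears in the fan-out.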
Intent Determines the Type
The type of search the AI runs depends heavily on what the user is trying to do (χ² = 299.6, p < 0.001).
When someone is shopping (discovery intent), the AI's dominant behavior is entity injection. It decides which brands matter and searches for them by name. Entity injection rates for purchase-intent queries: 19.8%. For knowledge-intent queries: 6.0%. That is a 3.3x difference.
When someone is learning (informational intent), the AI compresses the question to keywords and searches for explanatory content. It does not inject brand names.
When someone wants opinions (review-seeking intent), the AI searches for evidence -- reviews, studies, proof.
When someone is comparing (comparison intent), the AI checks prices and availability at the highest rate (4.8% price/availability, vs near-zero for other intents).
This pattern was statistically significant (p<0.001) and is one of the strongest findings in the study.
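The test behind this claim is a standard chi-squared test of independence on an intent-by-type contingency table, with Cramér's V as the effect size. A minimal sketch with toy counts (the numbers below are illustrative, not the study's data, which reported χ² = 299.6):

```python
def chi2_stat(table):
    """Pearson chi-squared statistic for an r x c contingency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2

def cramers_v(table):
    """Cramér's V effect size: sqrt(chi2 / (n * (min(r, c) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return (chi2_stat(table) / (n * k)) ** 0.5

# Toy intent-by-type counts (rows: discovery, informational;
# cols: entity injection, compression, other) -- illustrative only
table = [
    [40, 10, 25],   # discovery: heavy entity injection
    [ 5, 45, 20],   # informational: heavy compression
]
print(chi2_stat(table), cramers_v(table))
```

A large χ² with a V around 0.4 or above signals that intent and fan-out type are strongly associated, which is the shape of the study's result.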
How to Optimize for Each Type
Here is the practical part. For each fan-out type, what content strategy matches it?
Entity Injection (15.6% of fan-outs) -- This Is PR Work
When ChatGPT decides to search for brand names, it pulls those names from its training data. Not from your website. Not from your schema markup. From the web content it was trained on.
If your brand appears in enough third-party sources -- review sites, industry roundups, directory listings, press coverage, Reddit threads -- the AI associates your brand name with your product category. Then when someone asks "best [your category]," the AI injects your brand into its search queries and goes looking for pages about you.
If your brand is not in that training-data category map, the AI never searches for you. Your pages could rank #1 on Google for every relevant keyword, and ChatGPT would never see them, because it generates a fan-out query for "Bitwarden review 2026" instead of "YourBrand review 2026."
The strategy: Get mentioned on other people's sites. Review sites, comparison articles, industry lists, Reddit communities, YouTube reviews. This is reputation and PR work, not SEO. Our brand citation research found that 93.4% of brand-query AI citations come from third-party sources.
ChatGPT is the most aggressive entity injector. 32% of its fan-outs inject brands (composite across model tiers). The smaller model tiers inject even more: gpt-5.4-nano injects entities on 44.8% of fan-outs. If your customers use ChatGPT, training-data presence is your top priority.
Compression (14.5% of fan-outs) -- This Is Classic SEO
The AI takes "how do I know if my gut bacteria are out of balance and what should I do about it" and turns it into "gut microbiome imbalance symptoms." A 19-word conversational prompt becomes a 4-word keyword search.
Your existing keyword research mostly covers this. The key is to make sure you are targeting the compressed form, not the conversational phrasing. If your page title is "Everything You Need to Know About Gut Health and How to Improve It," the AI may be searching for "gut microbiome imbalance symptoms" -- and your page does not match.
The strategy: For every target topic, identify the 4-6 word compressed keyword form. Put it in your title, H1, and first paragraph. This is traditional SEO with one adjustment: you are optimizing for what the AI compresses to, not what the human types.
Compression has the highest citation yield. In our Perplexity citation data, compression fan-outs produced 148% citation yield (1.48 citations per fan-out). They map directly to keyword-ranked pages. This is where traditional SEO directly translates to AI citation.
Evidence Seeking (15.6% of fan-outs) -- This Is Content Marketing
The AI searches for "[your product] review," "[your category] case study," "[your product] vs [competitor]." It literally uses these phrases.
The strategy: Create content that matches evidence-related search terms. The content needs to actually exist, rank, and contain real evidence -- not just marketing copy with "review" in the title. A page titled "Why Our Product Is the Best" will not match an evidence-seeking fan-out. A page titled "CRM Software Comparison: 6 Platforms Tested Over 90 Days" will.
Perplexity and ChatGPT lead in evidence seeking (21% and 23% respectively). If your customers use these platforms for research before buying, evidence pages are high priority.
Evidence-seeking fan-outs have the second-highest citation yield at 128%. When the AI searches for evidence, it cites what it finds.
Narrowing (9.6% of fan-outs) -- This Is On-Page Optimization
The AI adds "2026," "for small business," "under $50," "for beginners," "enterprise." These qualifiers come from the AI's interpretation of the user's context, not from the user's actual words.
The strategy: Make sure your pages include the qualifiers AI platforms add. Year mentions in titles and headings. Audience segments ("for small business," "for enterprise"). Price tiers. Experience levels. If the AI narrows to "best CRM for small business 2026" and your page title is just "Best CRM Software," you might not match the narrowed query.
Narrowing fan-outs produce 129% citation yield, the second-highest after compression. When the AI narrows, it is getting specific, and specific queries produce direct citations.
Expansion (16.0% of fan-outs) -- You Cannot Optimize for This
Expansion fan-outs are background context lookups. The AI searches for "marketing budget" or "skincare ingredients" or "ecommerce supply chain" -- short, broad terms that build the AI's understanding of the topic before it answers.
These produce the lowest citation yield at 48%. The AI reads this content to inform its thinking, but it does not cite it. The content shapes the answer without appearing in the sources.
The strategy: There is no direct optimization for expansion fan-outs. They are the AI's homework. Your content may be read as background without ever being cited. This is useful to know because it explains why some high-traffic, high-ranking pages never appear in AI citations -- they serve as expansion context, not citable sources.
Tangential (17.2% of fan-outs) -- Cover Adjacent Topics
The AI wanders. It searches for topics related to but not directly about the user's question. Someone asks about CRM software and the AI searches for "sales pipeline management" or "customer retention strategy."
The strategy: Build content clusters that cover adjacent topics, not just your core product. If you sell CRM software, also cover sales methodology, customer success metrics, pipeline management frameworks. When the AI tangentially searches for these topics and finds your content, you enter its context window -- and you may appear as a citation for the main query even though the fan-out was about an adjacent topic.
Tangential fan-outs produce 67% citation yield. Not as high as compression or evidence, but not negligible. They are a path to citation through topical authority rather than direct keyword matching.
Price/Availability (4.5% of fan-outs) -- Almost Never Happens
Although our entire query set was commercial, the AI rarely checks prices or availability. When it does, it is most common on comparison queries (4.8% of comparison fan-outs).
The strategy: If you sell products online, make sure your pricing is visible in the HTML (not hidden behind JavaScript or "contact for pricing"). When the AI does search for pricing, it needs to find it on the page. But do not invest heavily in optimizing for price fan-outs -- they account for less than 5% of retrieval behavior.
Perplexity never generated a price fan-out in our data (0.0%). ChatGPT and Gemini accounted for all of them. If your customers compare prices on Perplexity, the AI is not checking your prices for them.
Each Platform Searches Differently
The three platforms have distinct retrieval personalities (χ² = 386.9, p < 0.001, Cramér's V = 0.38 -- a large effect).
| Type | ChatGPT | Gemini | Perplexity |
|---|---|---|---|
| Entity Injection | 32.0% | 4.0% | 9.8% |
| Evidence Seeking | 23.1% | 5.7% | 20.5% |
| Expansion | 4.9% | 27.2% | 13.7% |
| Compression | 7.5% | 18.7% | 18.9% |
| Tangential | 12.4% | 21.3% | 16.1% |
| Narrowing | 9.5% | 7.3% | 13.1% |
| Price/Availability | 4.9% | 7.9% | 0.0% |
ChatGPT injects brands and seeks evidence. If your customers use ChatGPT, your priority is training-data presence (get mentioned on third-party sites) and evidence content (reviews, case studies, comparisons).
Gemini explores broadly. It casts a wide net with expansion and tangential searches. If your customers use Gemini, topical authority and content breadth matter more than any single page.
Perplexity compresses and seeks evidence. It is the most keyword-oriented platform. Traditional SEO (ranking for compressed keyword queries) and evidence content are both high priority for Perplexity.
The Model Tier That Changes Everything
An unplanned finding: ChatGPT's decision to search the web at all depends on which model is running.
| Model | Queries | Searched the Web | Search Rate |
|---|---|---|---|
| gpt-5.4-nano | 44 | 44 | 100% |
| gpt-5.4-mini | 46 | 46 | 100% |
| gpt-5.4 (flagship) | 90 | 26 | 29% |
The flagship model (gpt-5.4) only searched the web 29% of the time. The smaller models searched every time. The bigger the model, the more confident it is in answering from training data alone.
This means your web content may be invisible to the most capable version of ChatGPT -- not because it cannot find you, but because it never tries. It answers from memory.
The intent effect makes this worse. Informational queries trigger web search only 12% of the time on ChatGPT (across all model tiers). If someone asks the flagship model an informational question, the probability of any web search happening is close to zero.
The practical implication: For informational content targeting ChatGPT, training-data presence is more important than web ranking. For discovery and comparison content, web ranking still matters because those intents trigger search 72-98% of the time.
The Real Story on Fan-Out String Stability
When we first ran this study (v1.2), we measured fan-out consistency on ChatGPT gpt-5.4-mini through the OpenAI API and found that 98% of fan-out strings were completely different between identical runs. We published that finding and it became the central claim of the earlier version of this article.
Then we ran the study across more platforms and capture methods, and the picture got more interesting.
Cross-Platform Replicate Data (v1.3)
Our v1.3 revision includes three separate replicate analyses:
| Platform (Capture) | Mean Jaccard | % Zero Overlap | Top-5 Canonical Coverage |
|---|---|---|---|
| Gemini (API) | 0.17 | 45% | 69% |
| Perplexity (Browser) | 0.16 | 59% | 59% |
| ChatGPT UI (Browser) | 0.21 | 73% | 76% |
| gpt-5.4-mini (API) | 0.012 | 98% | (outlier) |
Three things became clear.
First, the 98% figure was an outlier, not a general property. It was specific to gpt-5.4-mini captured through the OpenAI API. Across Gemini, Perplexity, and ChatGPT's browser interface, zero-overlap rates are much lower (45-73%).
Second, there is a canonical core. Over many runs, a small set of about 5 fan-out strings recurs consistently. On ChatGPT's browser UI, these top-5 canonical strings cover 76% of all fan-out events. On Gemini, 69%. On Perplexity, 59%. The strings that appear between any two specific runs are largely different, but the underlying "deck" the AI shuffles from is small.
Third, the structural type stays stable across all platforms. Even when the strings differ, the type of search (compression, entity injection, evidence seeking, etc.) matches 55-65% of the time depending on platform.
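The per-pair metrics in the table above are straightforward to compute from captured fan-out sets: Jaccard similarity between two runs' string sets, and the fraction of run pairs with no strings in common. A minimal sketch (string normalization is left out and would matter in practice):

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two fan-out string sets (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def replicate_stats(runs):
    """Mean pairwise Jaccard and fraction of run pairs with zero overlap,
    given a list of fan-out string sets (one set per replicate run)."""
    pairs = list(combinations(runs, 2))
    sims = [jaccard(a, b) for a, b in pairs]
    zero_rate = sum(s == 0.0 for s in sims) / len(pairs)
    return sum(sims) / len(sims), zero_rate
```

Feeding this the fan-out sets from replicate runs of one parent query yields the "Mean Jaccard" and "% Zero Overlap" columns in the table.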
What This Means for Optimization
A GEO tool that captures fan-out queries in a single session gives you one random sample from a distribution. On ChatGPT browser, that sample likely includes some canonical strings and some one-off variations. On Perplexity, it is closer to a random draw.
The optimization target shifts based on platform:
- ChatGPT UI (76% canonical coverage): Aggregating fan-outs across many sessions reveals a stable core of roughly 5 strings that dominate. Both type-level AND string-level targeting are viable -- but you need many samples, not one.
- Gemini (69% canonical coverage): Moderate canonical core. Type-level targeting is primary, aggregated string targeting is a viable secondary strategy.
- Perplexity (59% canonical coverage): Weaker canonical core. Type-level targeting dominates. String-level aggregation is less reliable here.
In all cases, type-level optimization is the primary strategy. Aggregated canonical-string targeting is a secondary strategy that only works on platforms with strong cores and only when you have enough samples to see the pattern.
Why Different Strings Still Produce Consistent Citations
Here is the most practically important finding in the replicate analysis. Even when fan-out strings are completely different between two runs, the cited domains tend to be the same.
We looked at 81 query pairs on ChatGPT where the fan-out strings had ZERO overlap between runs. In 78% of those pairs, at least one cited domain was shared. Mean citation domain Jaccard for zero-overlap pairs: 0.419.
Example (Trello vs Monday.com query):
- Run 1 fan-outs: "Trello pricing features official site 2026" and "Monday.com pricing features official site 2026"
- Run 2 fan-outs: "Trello pricing features official site monday.com pricing features official site"
- Completely different strings. Both cite trello.com and monday.com. Citation overlap: 100%.
Why? Different keyword-style queries within the same topic hit the same search index and surface the same authoritative pages. The top-ranking domains for "Trello pricing" and "Trello pricing features 2026" are nearly identical because Google's ranking is stable for category-dominant sites.
This is the strongest argument for building domain authority in your category. You do not need to rank for any specific fan-out string the AI generates. You need to be the authoritative domain in the category that ranks for ANY reasonable query in that category. The stochastic fan-out layer washes out at the domain level because dominant domains absorb all the variation.
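The Trello example can be checked mechanically: the fan-out string sets have zero Jaccard overlap while the cited-domain sets overlap completely. The sets below are transcribed from the example above:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared items over total distinct items."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

run1_fanouts = {"Trello pricing features official site 2026",
                "Monday.com pricing features official site 2026"}
run2_fanouts = {"Trello pricing features official site monday.com pricing features official site"}

run1_domains = {"trello.com", "monday.com"}
run2_domains = {"trello.com", "monday.com"}

print(jaccard(run1_fanouts, run2_fanouts))  # 0.0 -- no shared strings
print(jaccard(run1_domains, run2_domains))  # 1.0 -- identical citations
```

String-level variation, domain-level stability: exactly the washout effect described above.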
What This Means for Your Strategy
Step 1: Identify which intent types your customers use
Are they shopping (discovery)? Learning (informational)? Comparing? Seeking reviews? Each intent triggers a different fan-out type distribution.
- Discovery triggers entity injection. Invest in third-party brand presence.
- Informational triggers compression. Invest in keyword-optimized educational content.
- Review-seeking triggers evidence searching. Invest in reviews, case studies, and comparison content.
- Comparison triggers price checking and narrowing. Invest in comparison tables and pricing transparency.
Step 2: Match your content to the fan-out types
Do not try to optimize for a single captured fan-out string. That string is one sample from a distribution. But do optimize for the fan-out type your intent triggers:
- For entity injection: Brand mentions across third-party sites (PR, not SEO)
- For compression: Pages that rank for 4-6 word keyword queries (SEO)
- For evidence seeking: Review pages, case studies, comparison tests (content marketing)
- For narrowing: Pages with year, audience, and price qualifiers in titles and headings (on-page optimization)
Step 3: Build domain authority in your category
This is the most reliable strategy because it survives string-level variation. When the AI generates different fan-out strings between sessions, both versions hit the same search index. Domains that rank across the surface area of a category absorb all the variation. You do not need to rank for any specific string -- you need to be the authoritative domain that ranks for all of them.
Step 4 (Advanced): Aggregate fan-out samples on high-core platforms
On ChatGPT browser UI and Gemini, the top 5 canonical strings cover 69-76% of all fan-out events for a given query. If you have the ability to aggregate fan-outs across many sessions, you can identify this canonical core and target it directly as a secondary optimization. This requires capturing dozens of sessions per query to see through the noise.
On Perplexity, the canonical core is weaker (59%) and this approach is less reliable.
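The aggregation step is essentially a frequency count over captured sessions. A minimal sketch, assuming lowercasing is enough normalization (a real pipeline would need fuzzier dedup):

```python
from collections import Counter

def canonical_coverage(sessions, k=5):
    """Given a list of sessions (each a list of captured fan-out strings),
    return the top-k most frequent strings and the share of all
    fan-out events they account for."""
    counts = Counter(q.strip().lower() for session in sessions for q in session)
    total = sum(counts.values())
    top = counts.most_common(k)
    coverage = sum(c for _, c in top) / total
    return [q for q, _ in top], coverage
```

On a platform like ChatGPT's browser UI, a top-5 coverage approaching 0.76 over dozens of sessions would indicate a canonical core strong enough to target directly.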
Step 5: Know which platforms your customers use
ChatGPT is an entity injector. Gemini is an explorer. Perplexity is an evidence seeker. Optimizing for ChatGPT (brand presence + evidence) is a different strategy than optimizing for Perplexity (keyword ranking + evidence) or Gemini (topical breadth).
The short version:
| Fan-Out Type | Strategy | Discipline |
|---|---|---|
| Entity Injection | Get mentioned on third-party sites | PR |
| Compression | Rank for 4-6 word keyword queries | SEO |
| Evidence Seeking | Create reviews, case studies, comparisons | Content Marketing |
| Narrowing | Add year, audience, price qualifiers to pages | On-Page SEO |
| Tangential | Build content clusters on adjacent topics | Content Strategy |
| Expansion | Cannot directly optimize (background context) | N/A |
| Price/Availability | Make pricing visible in HTML | Product Page UX |
Limitations
Replicate data comes from three separate studies. The 98.9% Gemini trigger agreement and the Mean Jaccard figures come from three distinct analyses with different scopes: Gemini API controlled replicates (3 x 180 queries), ChatGPT gpt-5.4-mini API controlled replicates (3 x 180 queries), and production browser-captured data for Perplexity and ChatGPT UI (n=1,665 non-echo fan-outs). Different data sources, different scopes, but consistent directional findings.
The 98% zero overlap figure is an outlier. It applied to gpt-5.4-mini through the OpenAI API only. Cross-platform data shows 45-73% zero overlap with canonical cores covering 59-76% of events. The original claim has been deprecated in favor of platform-specific numbers.
Parent-echo scraper artifact. 44.4% of ChatGPT browser "fan-outs" captured by automated scrapers are literal echoes of the parent query -- a scraper-side artifact, not real decomposition. All consistency numbers in this article exclude these echoes. If a GEO tool shows you fan-out queries that look identical to the parent query, that is the same artifact.
Rule-based classification at ~88% accuracy. Approximately 12% of fan-outs may be misclassified, primarily at the compression/tangential boundary.
Mixed ChatGPT model tiers. The ChatGPT fan-out distribution is a composite of three model tiers (gpt-5.4, gpt-5.4-mini, gpt-5.4-nano). Per-model breakdowns show entity injection ranges from 17% to 45% depending on model size.
Citation yield is Perplexity-only. The citation-to-fan-out linkage data comes exclusively from Perplexity. ChatGPT and Gemini yield rates may differ.
Single time period. All data from April 2026. Platform behavior changes with model updates.
Frequently Asked Questions
What is a fan-out query?
When you ask an AI platform a question, it does not search the web with your exact words. It generates multiple internal search queries -- called fan-out queries -- that are the actual strings sent to web search engines. These determine which pages get fetched and can be cited in the response.
Why can't I just optimize for the fan-out strings tools capture?
Because a single captured sample is one draw from a distribution. Between any two runs, 45-73% of fan-out strings are completely different (Gemini 45%, Perplexity 59%, ChatGPT UI 73%). The AI generates variations each time. A GEO tool showing you "the 4 queries ChatGPT searched" captured one sample -- by the next session, most of those strings will be different. However, over many runs, a small canonical core of roughly 5 strings dominates, covering 59-76% of all fan-out events depending on platform. Aggregating many sessions can reveal this core.
What should I optimize for instead?
The structural type of search, not any specific string. When the AI decides to inject brand names, it injects different brands each time but it is still doing entity injection. When it compresses to keywords, it compresses to different keywords but it is still compressing. Match your content strategy to the type: PR for entity injection, SEO for compression, content marketing for evidence seeking, on-page optimization for narrowing. And build domain authority in your category -- different fan-out strings still hit the same search index, so dominant domains absorb all the string-level variation.
Does ChatGPT always search the web?
No. The flagship model (gpt-5.4) only searches 29% of the time. Smaller models (gpt-5.4-mini, gpt-5.4-nano) search on every query. Informational questions trigger search only 12% of the time. If your content targets informational queries on the flagship model, the AI may never search for it at all.
Do different AI platforms search differently?
Yes. ChatGPT injects brands from training data (32% of fan-outs). Gemini explores broadly with expansion and tangential searches (48% combined). Perplexity compresses to keywords and seeks evidence (39% combined). Each platform needs a different optimization approach. Gemini is also the most deterministic platform at the search decision level (98.9% trigger agreement across replicates).
How do I know if a fan-out tool is showing me real data?
Check whether the fan-outs look suspiciously similar to the parent query you submitted. On ChatGPT browser captures, 44.4% of fan-outs from automated scrapers are literal echoes of the parent query -- scraper artifacts, not real decomposition. Real fan-outs should be transformations of your query (shorter, more keyword-like, with added entities or qualifiers), not copies of it.
Methodology
Main corpus:
- 540 parent queries (180 queries x 3 platforms: ChatGPT, Gemini, Perplexity)
- 180 queries: 10 verticals x 5 intent types x 3 queries per cell, plus 30 situation-first reformulations
- 1,323 classified fan-out queries after cleaning
- ChatGPT captured via OpenAI Responses API (3 model tiers: gpt-5.4, gpt-5.4-mini, gpt-5.4-nano)
- Gemini captured via Google GenAI API with Google Search grounding
- Perplexity captured via browser-level SSE interception (Pro account)
Replicate analysis (three separate studies):
- Analysis A: ChatGPT gpt-5.4-mini via OpenAI Responses API, 3 replicates x 180 queries
- Analysis B: Gemini via Google GenAI API, 3 replicates x 180 queries
- Analysis C: Production VPS browser-captured data, n=1,665 non-echo fan-outs across 126 multi-run groups (parent-echoes excluded as scraper artifacts)
Classification: rule-based taxonomy at ~88% accuracy (9 types, priority-ordered)
Statistical tests: chi-squared with Cramér's V, Kruskal-Wallis, Mann-Whitney U, log-linear models
References
- Lee, A. (2026). "How AI Platforms Search: Fan-Out Query Behavior Across Intent Types, Verticals, and Platforms" v1.3. Research Page | DOI
- Lee, A. (2026). "The Hidden Search Queries AI Runs Before It Answers You." Blog
- Lee, A. (2026). "I Rank on Page 1 -- What Gets Me Cited by AI?" aiXiv
- Semrush (2026). "ChatGPT traffic analysis: Insights from 17 months of clickstream data." Blog
- AIVO Evidentia (2026). "Decision-Path Analysis across AI Recommendation Systems." Zenodo