
AI SEO EXPERIMENTS

The Hidden Search Queries AI Runs Before It Answers You

2026-04-11


When you ask ChatGPT a question, it does not search the web with your words.

It rewrites your question first. It strips out the conversational parts. It compresses a 10-word sentence into a 5-word keyword search. And sometimes, it adds brand names you never mentioned.

Then it searches the web with its version of your question. Not yours.

This means every piece of advice about "writing content for how people talk to AI" is solving the wrong problem. The user talks to AI in natural language. But the AI talks to the web in keywords. The web never sees the user's words. It only sees the AI's translation.

We captured 2,158 of these hidden search queries across ChatGPT, Perplexity, and Claude. We classified every one. This is what we found.

What Are Fan-Out Queries?

When an AI platform receives your question, it does not just look up the answer in its memory. If the question needs current information, the AI searches the web first. But it does not copy your question and paste it into a search engine.

Instead, it breaks your question apart and creates several smaller, more focused searches. These smaller searches are called fan-out queries. They are the actual search strings the AI sends to web search engines like Bing.

Here is a real example. A user asks:

"What should I look for when hiring an AI SEO agency?"

The AI does not search for that exact sentence. Instead, it generates searches like:

  • "AI SEO agency evaluation criteria 2026"
  • "best AI SEO agencies"
  • "generative engine optimization agency"
  • "AI search optimization services"

Each search returns different web pages. The AI reads all of them, picks the most useful parts, and combines everything into one answer. The pages that show up in these internal searches are the only pages that can be cited in the AI's response.

If your page does not match one of these fan-out queries, the AI never sees it. It does not matter how good your content is.

Why This Matters Right Now

Semrush analyzed 17 months of ChatGPT data (October 2024 to February 2026, covering more than 1 billion sessions) and found that 65-85% of user prompts do not match any keyword in their 27-billion-keyword database. The way people talk to AI is completely different from the way people type into Google.

This raised a question that nobody had answered with data: if users are not searching in keywords, what is the AI searching in?

We now have the answer. The AI is converting those conversational prompts back into keyword-style searches. The user speaks naturally. The retrieval system speaks SEO.

What We Did

We captured fan-out queries from three AI platforms using network-level interception. Each platform's scraper listens to the internal communication between the AI model and its web search tool, capturing the exact search strings before they are sent to the web.

  • ChatGPT: Captured via SSE (Server-Sent Events) interception of its web search and internal API engines
  • Perplexity: Captured via SSE and WebSocket interception across its web, memory, workflow, and shopping engines
  • Claude: Captured via SSE interception of its web search engine
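To make the capture setup concrete, here is a minimal sketch of the parsing step: pulling search strings out of a raw SSE transcript. The `data:` framing is standard Server-Sent Events, but the `search_call`/`query` payload shape is a hypothetical stand-in for illustration, not a documented platform format.

```python
import json

def extract_search_queries(sse_text: str) -> list[str]:
    """Pull search strings out of a raw SSE transcript.

    Assumes each event's payload is JSON carried on 'data:' lines and
    that search calls expose their string under a 'query' key -- both
    are illustrative assumptions, not documented platform formats.
    """
    queries = []
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # keep-alives like "[DONE]" and partial chunks are not JSON
        if isinstance(event, dict) and "query" in event:
            queries.append(event["query"])
    return queries

sample = """\
data: {"type": "search_call", "query": "best vitamin C serum dark spots"}
data: [DONE]
data: {"type": "search_call", "query": "vitamin C serum hyperpigmentation"}
"""
print(extract_search_queries(sample))
```

In practice the interception layer sits between the browser and the platform's API, so the same parser runs over a live event stream instead of a saved transcript.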

We started with 2,558 raw fan-out records from 235 parent queries across 3 commercial verticals (skincare, advertising services, and marketing technology). After removing junk records, exact echoes of the parent query, and encoding artifacts, we had 2,158 clean fan-out queries to classify.

Every fan-out query was classified by its structural relationship to the parent query using a rule-based system that checks word overlap, word count ratios, entity detection, and signal-word matching. The classifier produces reliable assignments for approximately 90% of records based on manual review.
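A toy version of such a rule-based classifier can be sketched as follows. The thresholds and the capitalization heuristic are illustrative assumptions; the production classifier also uses signal-word lists and curated entity dictionaries, and still leaves roughly 10% of cases ambiguous.

```python
import re

def tokens(text: str) -> list[str]:
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def classify_fanout(parent: str, fanout: str) -> str:
    """Illustrative structural classifier; thresholds are made up."""
    p, f = set(tokens(parent)), set(tokens(fanout))
    overlap = len(p & f) / max(len(f), 1)
    length_ratio = len(tokens(fanout)) / max(len(tokens(parent)), 1)
    # Capitalized tokens (after the first word) that are absent from the
    # parent suggest an injected brand or product name.
    new_entity = any(
        w[:1].isupper() and w.lower().strip(",.?") not in parent.lower()
        for w in fanout.split()[1:]
    )
    if new_entity:
        return "entity_lookup"
    if overlap >= 0.6:
        if length_ratio <= 0.7:
            return "compression"
        if length_ratio > 1.0:
            return "narrowing"
        return "reformulation"
    return "expansion" if overlap >= 0.3 else "tangential"

print(classify_fanout(
    "What should I look for when hiring an AI SEO agency?",
    "AI SEO agency evaluation criteria"))
```

Running the example classifies the fan-out as compression: high word overlap with the parent, at roughly half its length.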

Parameter | Value
Total fan-out records captured | 2,558
After cleaning | 2,158
Unique parent queries | 235
Platforms | ChatGPT, Perplexity, Claude
Verticals | 3 (skincare, advertising services, marketing technology)
Data period | March to April 2026

The Seven Types of Fan-Out Queries

We found seven structurally distinct types of fan-out queries. Each type represents a different thing the AI is doing with your question before it searches the web.

Type | Count | % | What the AI is doing
Compression | 818 | 37.9% | Stripping your conversational question down to keywords
Entity Lookup | 389 | 18.0% | Adding brand or product names you never mentioned
Expansion | 350 | 16.2% | Searching for background knowledge on the topic
Reformulation | 236 | 10.9% | Rephrasing your question in different words
Tangential | 215 | 10.0% | Searching for a related topic you did not ask about
Narrowing | 133 | 6.2% | Adding specifics like a year, audience, or use case
Evidence Seeking | 14 | 0.6% | Looking for reviews, studies, or proof
Price/Availability | 3 | 0.1% | Checking prices or where to buy

Two things stand out immediately.

First, compression is the dominant behavior. Nearly 4 in 10 fan-out queries are the AI taking a conversational prompt and turning it into a keyword search. The AI is acting as a translator between natural language and traditional search.

Second, evidence seeking and price checking are almost nonexistent. Even though all of our parent queries came from commercial, purchase-intent contexts, the AI almost never goes looking for reviews (0.6%) or pricing (0.1%). For those decisions, it relies on what it already knows from training data. We will come back to why this matters.

Finding 1: The AI Compresses Your Questions Into Keywords

The average parent query in our dataset was 10.3 words long. The average fan-out query was 5.6 words. That is a compression ratio of 0.54 -- the AI cuts your question nearly in half.

But the compression rate is different for each platform.

Platform | n | Avg Parent Words | Avg Fan-Out Words | Compression Ratio
Perplexity | 1,473 | 10.5 | 5.0 | 0.48
Claude | 412 | 10.0 | 6.2 | 0.62
ChatGPT | 273 | 10.0 | 8.0 | 0.80
Overall | 2,158 | 10.3 | 5.6 | 0.54
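The ratio itself is straightforward: mean fan-out word count divided by mean parent word count. A minimal sketch, using made-up example pairs rather than the study data:

```python
def compression_ratio(pairs: list[tuple[str, str]]) -> float:
    """Mean fan-out length divided by mean parent length, in words."""
    parent_mean = sum(len(p.split()) for p, _ in pairs) / len(pairs)
    fanout_mean = sum(len(f.split()) for _, f in pairs) / len(pairs)
    return fanout_mean / parent_mean

# Illustrative (parent prompt, fan-out query) pairs, not real captures.
pairs = [
    ("What should I look for when hiring an AI SEO agency?",
     "AI SEO agency evaluation criteria"),
    ("I am looking for a good vitamin C serum for dark spots on my face",
     "vitamin C serum dark spots"),
]
print(round(compression_ratio(pairs), 2))  # 0.38
```

The same calculation over the full dataset, grouped by platform, yields the per-platform ratios in the table above.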

Perplexity compresses the hardest. It cuts query length in half, down to 5.0 words on average. This makes sense because Perplexity is a search-first platform. It fires many parallel searches and wants each one to be a tight keyword match.

ChatGPT compresses the least, keeping queries at 8.0 words. ChatGPT uses fewer web searches and relies more on its training data, so when it does search, it uses a more complete version of the question.

Claude falls in the middle at 6.2 words.

What this means for your content: Your pages need to match 5-8 word keyword queries, not the 10+ word conversational prompts users type into AI. If you are optimizing for "What should I look for when choosing the best vitamin C serum for dark spots on my face," the AI is actually searching for "best vitamin C serum dark spots" or "vitamin C serum hyperpigmentation." Your page needs to match the compressed version.

Finding 2: ChatGPT Decides Which Brands to Search For -- Before It Searches

This is the most commercially important finding in the study.

When you ask ChatGPT a generic question like "What is the best project management tool?", it does not neutrally search the web for all options. In 34.8% of its fan-out queries, ChatGPT injects specific brand names that the user never mentioned.

It decides, based on its training data, which brands are relevant to the category. Then it searches for those brands specifically.

Type | Perplexity | ChatGPT | Claude
Compression | 42.0% | 17.9% | 36.7%
Entity Lookup | 17.1% | 34.8% | 10.2%
Expansion | 17.1% | 15.0% | 13.8%
Reformulation | 7.9% | 14.3% | 19.7%
Tangential | 11.9% | 8.1% | 4.4%
Narrowing | 3.1% | 8.8% | 15.3%
Evidence Seeking | 0.7% | 1.1% | 0.0%

Each platform has a distinct retrieval personality:

Perplexity is a compressor. It strips queries down to keywords and fires many parallel searches. It also wanders tangentially more than other platforms (11.9%), casting a wide net across related topics.

ChatGPT is an entity injector. Over a third of its fan-outs introduce specific brand or product names the user never typed. When a user asks a generic question, ChatGPT's primary retrieval behavior is to decide which brands matter and then search for those brands by name.

Claude is a narrower and reformulator. It is the most likely to add specifics (a year, an audience segment, a use case) and to rephrase the question multiple ways before searching. This produces more targeted but less exploratory retrieval.

What entity injection means for brands: If ChatGPT does not associate your brand with your product category in its training data, it will not search for you. Your page could be the best resource on the internet for that topic, and ChatGPT will never see it -- because it never generates a fan-out query that would retrieve it.

This creates a self-reinforcing cycle. Brands with strong training-data presence get injected into fan-out queries. Their pages get fetched. They get cited. They generate more web content. That content strengthens their training-data presence. Brands outside this cycle are not competing on content quality. They are invisible to the retrieval system entirely.

This finding aligns with our prior research: 65.1% of domains in a broad sample (n=4,658) are never cited by AI platforms. The entity injection mechanism is a plausible explanation for why.

Finding 3: Inside the Engines -- Different Retrieval Systems Do Different Things

Fan-out queries do not all come from the same system inside each platform. Each platform has multiple retrieval engines, and each engine has its own behavior pattern.

Engine | n | Top Fan-Out Type | % | Second Type | %
web (Perplexity primary) | 982 | Compression | 40.2% | Entity Lookup | 24.2%
web_search (ChatGPT/Claude) | 531 | Compression | 30.9% | Reformulation | 19.8%
memory (Perplexity) | 292 | Expansion | 50.0% | Compression | 32.9%
openai_api (ChatGPT internal) | 154 | Entity Lookup | 44.2% | Compression | 23.4%
perplexity_workflow | 113 | Compression | 64.6% | Reformulation | 30.1%

Two findings stand out here.

Perplexity's memory engine generates 50% expansion queries. These are short contextual lookups like "marketing budget" and "business goals." This engine does not look for direct answers. It builds background context. This suggests Perplexity runs a parallel retrieval stream just to understand the topic better before answering.

ChatGPT's internal API engine generates 44.2% entity lookups. This is the engine that decides which brands to bring into a conversation. When a user asks a generic category question, this specific engine searches for brands the model already knows about. This is the mechanism behind the entity injection pattern described above.

Finding 4: The AI Almost Never Searches for Proof

This finding surprised us.

All 235 parent queries in our dataset came from commercial keyword data. They were purchase-intent questions -- the kind of question where you would expect the AI to check reviews, look up clinical studies, or verify pricing. Questions like "Is this product worth it?" and "What do experts recommend?"

Yet evidence-seeking fan-outs accounted for just 0.6% of all queries. Price and availability fan-outs accounted for 0.1%.

The AI almost never searches the web for proof.

This has a direct connection to recent research from AIVO Evidentia, which analyzed 7,000+ multi-turn buying conversations across ChatGPT, Gemini, Perplexity, and Grok. They found that a "Clinical Evidence Binary" filter is one of the strongest determinants of which brands get recommended in categories like skincare. Brands without peer-reviewed evidence get eliminated.

If the AI rarely searches for clinical evidence during retrieval (our finding: 0.6%), but strongly favors brands with clinical evidence in its recommendations (AIVO's finding), the most plausible explanation is that the evidence filter operates on training data, not live web search. The AI "knows" which brands have clinical backing because that information is baked into its weights from pre-training. It does not verify it in real time.

What this means for brands: Building a library of clinical studies or case studies on your website will not help if the AI never searches for them during retrieval. The evidence needs to be embedded in the content the AI does retrieve -- which, based on our data, is content that matches compressed keyword queries and entity-specific searches. If your evidence pages do not rank for those compressed keywords, the AI will never find them.

Finding 5: Different Industries Trigger Different Behavior

The three verticals in our study produced noticeably different fan-out patterns.

Metric | Skincare | Advertising Services | Marketing Technology
Fan-outs per parent query | 7.7 | 9.1 | 14.9
Top type | Compression (45.0%) | Compression (42.5%) | Compression (33.8%)
Entity Lookup | 10.3% | 22.8% | 17.5%
Tangential | 9.5% | 2.4% | 17.6%
Evidence Seeking | 0.6% | 0.1% | 1.2%

Emerging categories generate more fan-outs. The marketing technology vertical produced nearly twice as many fan-out queries per question (14.9) compared to skincare (7.7). When the AI encounters a topic where its training data is thin, it searches more. It needs more retrieval passes to build enough context for an answer.

Service categories trigger more entity injection. The advertising services vertical had the highest entity lookup rate (22.8%). This makes sense -- when someone asks about agencies or services, the AI pulls in specific platform names and competitor agencies from its training data.

Emerging categories trigger more tangential search. The marketing technology vertical had a 17.6% tangential rate -- the AI searched for related concepts like "structured data," "AI search visibility," and "generative engine optimization" that were not in the user's question but were part of the AI's conceptual map for the topic.

Finding 6: The Entities AI Platforms Inject

Across 389 entity-lookup fan-outs, we extracted every brand, product, and platform name that the AI introduced on its own -- names the user never mentioned.

Entity | Count | Context
Amazon | 106 | Platform reference in advertising queries
PPC | 31 | Category term added to broader queries
SEO | 21 | Category term
AI | 20 | Category term
Perplexity | 19 | Platform self-reference
DSP | 19 | Specific ad tech platform
Google | 17 | Platform reference
GEO | 11 | Category term (generative engine optimization)
Clutch | 11 | Agency directory platform
ChatGPT | 10 | Platform self-reference

This table reveals the AI's implicit category map. When asked about advertising, the AI searches for "Amazon PPC" and "DSP." When asked about AI search, it searches for "Perplexity," "ChatGPT," and "GEO." When asked about skincare, it searches for specific brand names associated with the category.

The AI is not discovering brands through neutral web search. It is confirming brands it already knows from training data.

Brands that are not in the AI's category map are structurally excluded from entity-lookup retrieval. This is the single most important insight for any brand trying to appear in AI answers: the retrieval system is not neutral. It has a pre-existing list of brands per category, and it actively searches for those brands by name.
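Extraction of injected entities can be approximated by counting capitalized fan-out tokens that never appear in the parent query. This sketch is a rough proxy only; the study's actual pipeline also used signal-word lists and manual review, and the stopword list and example records here are placeholders.

```python
import re
from collections import Counter

# Placeholder list; a real pipeline would use a curated stopword set.
STOPWORDS = {"The", "A", "An", "What", "How", "Best", "Is"}

def injected_entities(records: list[tuple[str, str]]) -> Counter:
    """Count capitalized fan-out tokens absent from the parent query.

    Uses a substring check against the parent, so it is deliberately
    loose -- a rough proxy for entity injection, not the study method.
    """
    counts = Counter()
    for parent, fanout in records:
        parent_lower = parent.lower()
        for word in re.findall(r"[A-Za-z][\w-]*", fanout):
            if word[0].isupper() and word not in STOPWORDS \
                    and word.lower() not in parent_lower:
                counts[word] += 1
    return counts

# Illustrative records, not real captures.
records = [
    ("best advertising platforms", "Amazon PPC advertising strategy"),
    ("best advertising platforms", "Amazon DSP vs Google Ads"),
]
print(injected_entities(records).most_common(3))
```

Aggregated over all entity-lookup fan-outs, this kind of count produces a frequency table like the one above.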

How This Connects to Prior Research

AirOps (2025-2026)

AirOps published the largest prior fan-out dataset: 15,000 queries expanded to 43,233 with fan-outs, producing 548,534 retrieved pages and 82,108 citations. Their key finding: 32.9% of citations came exclusively from fan-out query results, and 95% of fan-out queries had zero traditional search volume.

AirOps classified fan-outs by intent pattern -- whether they stayed "near-verbatim," were "split into sub-queries," or had "year modifiers added." Our study classifies by structural type -- compression, entity injection, narrowing, evidence seeking, etc. These are complementary approaches at different levels of analysis. AirOps tells you that fan-outs matter. Our taxonomy tells you what the AI is doing when it generates them.

Seer Interactive (2026)

Seer Interactive analyzed 501 prompts through Gemini 3, finding 10.7 average fan-outs per prompt (5x ChatGPT's rate), with 26.4% including brand names and 21.3% containing a year modifier. Their brand mention finding (26.4%) is consistent with our entity-lookup rate across all platforms (18.0%), with the difference likely reflecting Gemini's specific retrieval architecture.

Semrush (2026)

Semrush's 17-month clickstream analysis established the divergence between user language and traditional search keywords (65-85% mismatch). Our study provides the mechanism: the AI's retrieval layer translates conversational prompts back into keyword-style queries at a compression ratio of 0.54. The user language is diverging, but the retrieval language is not.

What This Means for Content Strategy

1. Optimize for the compressed query, not the conversational prompt

If your target user question is "I am looking for a good vitamin C serum for dark spots on my face," the AI is likely searching for "vitamin C serum dark spots" or "best vitamin C serum hyperpigmentation." Your page needs to match the 5-6 word compressed form, not the 15-word conversational form.

This does not mean going back to keyword stuffing. It means making sure the core keyword phrase appears clearly in your headings, title, and first paragraph -- even if the rest of your content is written in natural language. The AI's retrieval system needs to find you through the compressed query. What happens after it finds you (whether it cites you) depends on content quality.
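One quick way to audit this is to check whether every token of a compressed query appears in your title or headings. A minimal sketch, using exact token matching only, so plural/singular variants ("serums" vs "serum") would not count as a match:

```python
import re

def covers_query(page_text: str, compressed_query: str) -> bool:
    """True if every token of the compressed query appears in the text."""
    text_tokens = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    query_tokens = re.findall(r"[a-z0-9]+", compressed_query.lower())
    return all(t in text_tokens for t in query_tokens)

title = "Best Vitamin C Serum for Dark Spots (2026 Guide)"
print(covers_query(title, "vitamin C serum dark spots"))  # True
```

A production check would add stemming and run against titles, H1s, and opening paragraphs, since those are the fields retrieval systems weight most heavily.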

2. Get into the AI's entity map

Entity-lookup fan-outs are the primary discovery mechanism on ChatGPT (34.8% of fan-outs). If the AI does not associate your brand with the category, it will not search for you.

Getting into the entity map requires training-data presence: consistent brand mentions across authoritative sources in your category. This is not something you can do on your own website alone. It requires being mentioned, reviewed, and discussed on third-party sites that the AI has seen during training.

Our brand citation research found that 93.4% of brand-query citations come from third-party sources. Reddit alone is the #1 cited domain for brand queries across all 18 verticals we tested. The path to entity injection is through earned media and community presence, not on-site SEO.

3. Do not rely on evidence pages being found through retrieval

If you have built clinical studies, case studies, or review pages on your website, those pages will only help if the AI retrieves them. Our data shows that evidence-seeking fan-outs account for 0.6% of all retrieval queries. The AI almost never searches for proof.

Instead, embed your evidence in the content the AI does retrieve. If your product page ranks for a compressed keyword query, include your clinical data on that page -- do not put it on a separate "research" or "studies" page that no fan-out query would ever match.

4. Expect different platform behavior

The three platforms in our study have distinct retrieval personalities. Optimizing for one may not optimize for all.

  • Perplexity fires many compressed keyword searches. Match short, keyword-dense queries.
  • ChatGPT injects brands from training data. Focus on brand authority and third-party mentions.
  • Claude narrows and reformulates. Ensure your content covers specific use cases and audience segments.

5. Emerging categories require more content breadth

The AI generates nearly twice as many fan-outs for emerging topics (14.9 per query) compared to established categories (7.7). If you operate in a new or niche category, you need more content covering more angles of the topic -- because the AI will run more searches looking for context.

Limitations

This study has several limitations that should inform how you interpret the findings.

Sample size by platform is uneven. Perplexity contributes 68% of fan-outs (1,473 of 2,158), reflecting its architecture of generating more parallel searches. ChatGPT contributes only 12.6% (273). Platform-level findings for ChatGPT should be interpreted with caution due to the smaller sample.

Three verticals only. The three verticals (skincare, advertising services, marketing technology) are all relatively information-dense categories. Categories with simpler product landscapes (commodity goods, food delivery) may show different fan-out distributions.

Classification is rule-based. The taxonomy was derived using regex-based heuristics for word overlap, entity detection, and signal matching. Approximately 10% of cases are ambiguous, particularly at the boundary between compression and tangential categories.

Parent queries are generated, not organic. All parent queries were generated from keyword data using a query prediction system designed to simulate realistic AI questions. They are not captured from actual user sessions. Fan-out behavior for truly organic conversational queries may differ.

Google AI Mode is absent. Google does not expose internal search queries through the same mechanism as ChatGPT, Perplexity, and Claude. Fan-out behavior for Google's AI Mode is not represented in this data.

No temporal analysis. All data is from March to April 2026. Fan-out behavior may shift with model updates. Longitudinal tracking would strengthen these findings.

Frequently Asked Questions

What are fan-out queries?

Fan-out queries are the internal search queries that AI platforms generate when they need to search the web to answer your question. Instead of searching for your exact question, the AI breaks it into multiple smaller, more focused searches. These internal searches determine which web pages the AI sees and which ones it can cite in its response.

How many fan-out queries does each AI platform generate?

In our study, 235 parent queries produced 2,158 clean fan-out queries, roughly 9 per parent query on average. But this varies significantly: Perplexity generates the most parallel searches per query (it accounts for the majority of our captures), while ChatGPT generates fewer but more targeted searches. Seer Interactive found Gemini 3 generates 10.7 fan-outs per prompt on average.

If users are not typing keywords, why does keyword optimization still matter?

Because the AI converts conversational prompts back into keyword-style searches at the retrieval layer. The user speaks naturally, but the AI's web search tool speaks in keywords. Our data shows a compression ratio of 0.54 -- the AI cuts query length nearly in half. Your content needs to match these compressed keyword queries to be discovered by AI retrieval systems.

Can I see what fan-out queries AI platforms generate for my brand?

Yes. Fan-out queries can be captured through network interception when using AI platforms with web search enabled. Several tools exist: the ChatGPT Fan-Out Query Tool Chrome extension, LLMrefs' free extractor, and professional platforms like AirOps and Profound that capture fan-outs at scale.

What is entity injection and why does it matter for my brand?

Entity injection is when the AI adds specific brand, product, or platform names to its internal search queries that the user never mentioned. In our study, 18.0% of all fan-outs were entity lookups, rising to 34.8% for ChatGPT specifically. If your brand is not in the AI's training-data category map, it will not be injected into these searches, which means your pages will not be retrieved or cited -- regardless of how good your content is.

Does this mean traditional SEO is dead?

No. The central finding is that the AI's retrieval layer still speaks in keywords. Traditional keyword optimization is still how pages get discovered by AI platforms. What has changed is the source of the query -- it comes from the AI's compression of a conversational prompt, not from a human typing into a search box. The keywords are different (shorter, more compressed, sometimes including brand names the human never mentioned), but the mechanism is still keyword matching at the retrieval layer.

Methodology Notes

  • 2,558 raw fan-out records captured via network-level SSE and WebSocket interception across ChatGPT, Perplexity, and Claude
  • 400 records removed during cleaning (15.6%): fan-outs shorter than 5 characters, exact parent echoes, encoding artifacts
  • 2,158 clean fan-out queries classified using a rule-based classifier checking word overlap ratios, word count compression, capitalized entity detection, and signal-word matching
  • Classification accuracy estimated at approximately 90% based on manual spot-check review
  • Primary ambiguity zone: boundary between "compression" and "tangential" for fan-outs sharing some vocabulary with the parent but shifting topic
  • All parent queries generated from GSC and Bing Webmaster Tools keyword data using AI+Automation's query prediction system
  • Data period: March to April 2026
  • Verticals anonymized in this article for client confidentiality

References

  • Lee, A. (2026). "I Rank on Page 1 -- What Gets Me Cited by AI?" Preprint v1. aiXiv | Dataset
  • AirOps (2026). "The Influence of Retrieval, Fan-out, and Google SERPs on ChatGPT Citations." Report
  • Seer Interactive (2026). "Initial Research: Gemini 3 Query Fan-Outs." Insights
  • Semrush (2026). "ChatGPT traffic analysis: Insights from 17 months of clickstream data." Blog
  • AIVO Evidentia (2026). "Decision-Path Analysis across AI Recommendation Systems: An Evidence-Grade Filter Taxonomy." Working Paper WP-2026-01. Zenodo