How AI Platforms Search
Fan-Out Query Behavior Across Intent Types, Verticals, and Platforms
Anthony Lee — AI+Automation
Preprint v1.2 — April 2026 | Not yet peer-reviewed
Key Findings
Two-Layer Retrieval Model
AI search is two decisions, not one. Layer 1 (whether to search) is 91.7% deterministic. Layer 2 (what to search) produces different query strings on nearly every run (98% of replicate pairs share zero strings), but the structural type of fan-out is 65% stable. Optimize for types, not strings.
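The three stability numbers here (91.7% trigger agreement, 98% zero string overlap, 65% top-type agreement) are pairwise-agreement statistics over repeated runs of the same parent query. A minimal sketch of how such metrics can be computed, using hypothetical toy replicates rather than the study's data:

```python
from itertools import combinations

# Three replicate runs of one parent query (hypothetical toy values,
# not the study's dataset). Each run records whether the model searched,
# the literal fan-out strings it issued, and each fan-out's structural type.
replicates = [
    {"searched": True, "strings": {"best password manager 2026"},
     "types": ["entity_injection", "entity_injection"]},
    {"searched": True, "strings": {"top rated password managers review"},
     "types": ["entity_injection"]},
    {"searched": True, "strings": {"password manager features keywords"},
     "types": ["keyword_compression"]},
]

pairs = list(combinations(range(len(replicates)), 2))

# Layer 1: does the search-or-not decision agree across replicate pairs?
trigger_agreement = sum(
    replicates[i]["searched"] == replicates[j]["searched"] for i, j in pairs
) / len(pairs)

# Layer 2a: exact-string overlap (Jaccard) between replicate fan-out sets.
def jaccard(a, b):
    return len(a & b) / len(a | b)

string_overlap = sum(
    jaccard(replicates[i]["strings"], replicates[j]["strings"]) for i, j in pairs
) / len(pairs)

# Layer 2b: does the dominant structural type of fan-out stay stable?
def top_type(run):
    return max(set(run["types"]), key=run["types"].count)

tops = [top_type(r) for r in replicates]
top_type_agreement = sum(tops[i] == tops[j] for i, j in pairs) / len(pairs)

print(trigger_agreement, string_overlap, round(top_type_agreement, 2))
```

In this toy example every run searches (trigger agreement 1.0), no two runs share a string (overlap 0.0), yet two of three runs keep the same dominant type, which is the pattern the study reports at scale: deterministic trigger, stochastic strings, moderately stable types.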
Intent Drives Fan-Out Composition
Discovery queries trigger 3.3x the entity injection rate of informational queries (χ²=299.6, p<0.001, V=0.24). When users shop, AI injects brand names from training data. When users learn, AI compresses to keywords. Intent, not vertical, determines retrieval strategy.
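The reported effect size is Cramér's V, derived from the chi-square statistic as V = sqrt(χ² / (n · (k − 1))), where k is the smaller dimension of the contingency table. A self-contained sketch on a hypothetical 2×2 table whose counts are chosen only to mirror the reported 3.3x injection-rate gap (these are not the study's actual counts, and the study's tables span five intents and multiple fan-out types):

```python
import math

# Hypothetical 2x2 contingency table (NOT the study's data): rows are
# intent (discovery, informational), columns are fan-out type
# (entity injection, other). Counts mirror the reported 3.3x gap:
# 160/500 = 32% injection for discovery vs 48/500 = 9.6% for informational.
table = [
    [160, 340],  # discovery
    [48, 452],   # informational
]

def chi_square(table):
    """Pearson chi-square statistic and total count for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    return chi2, n

def cramers_v(table):
    """Effect size V = sqrt(chi2 / (n * (k - 1))), k = min(rows, cols)."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

chi2, n = chi_square(table)
print(round(chi2, 1), round(cramers_v(table), 2))
```

The same formula scales to the study's larger intent-by-type tables; in practice `scipy.stats.chi2_contingency` gives the χ² and p-value directly.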
Platform Retrieval Personalities
ChatGPT injects entities (32% of fan-outs). Gemini explores broadly (27% expansion + 21% tangential). Perplexity seeks evidence (21%) and compresses to keywords (19%). Each platform needs a different optimization approach (χ²=386.9, p<0.001, V=0.38).
Model-Tier Search Behavior
ChatGPT's flagship model (gpt-5.4) searches the web on only 29% of queries. Smaller models (gpt-5.4-mini, gpt-5.4-nano) search 100%. Bigger models answer from memory. Content not in training data may be invisible to the most capable model.
Format Sensitivity
Situation-first query phrasing ("I just got a data breach notification...") produces significantly different fan-out distributions than standard phrasing ("best password manager 2026") with V=0.35, p<0.001. Current monitoring tools that only track keyword-style queries miss the retrieval paths conversational prompts trigger.
Abstract
When users submit queries to AI search platforms, the platforms do not pass the user's text to web search verbatim. They decompose each prompt into multiple internal "fan-out queries": the actual strings sent to retrieval engines. These fan-out queries determine which pages get fetched, which enter the AI's context window, and which get cited in the response. This study classifies 1,323 fan-out queries generated by 540 parent queries across three AI platforms (ChatGPT, Gemini, Perplexity), ten commercial verticals, and five intent types. Seven findings emerge. First, user intent is a significant predictor of fan-out composition (χ²=299.6, p<0.001, V=0.24): discovery queries trigger 3.3x the entity injection rate of informational queries. Second, platforms exhibit distinct retrieval personalities: ChatGPT injects entities from training data on 32% of fan-outs, Gemini casts a wide net with 27% expansion queries, and Perplexity leads in evidence-seeking at 21%. Third, ChatGPT's search trigger rate varies dramatically by model tier: gpt-5.4 searches on only 29% of queries while gpt-5.4-nano searches on 100%. Fourth, platform-intent interaction effects explain fan-out variation better than either factor alone (two-way AIC=192 vs. main-effects AIC=937). Fifth, situation-first query phrasing produces significantly different fan-out distributions than standard phrasing (V=0.35, p<0.001). Sixth, no significant vertical effect was detected at this sample size (H=6.26, p=0.71). Seventh, replicate analysis reveals that the search trigger decision is highly deterministic (91.7% agreement) while fan-out strings are stochastic (98% zero overlap), but the structural type of fan-out is moderately stable (65% top-type agreement). These findings establish that AI search operates a two-layer retrieval system: a model-confidence layer that decides whether to search at all, and a query-decomposition layer that determines what to search for.
Keywords
Generative Engine Optimization, GEO, query fan-out, Large Language Models, search retrieval, agentic search, ChatGPT, Perplexity, Gemini, Google AI Mode
Citation
Lee, A. (2026). How AI platforms search: Fan-out query behavior across intent types, verticals, and platforms. Preprint v1.2, AI+Automation.