
AI SEO EXPERIMENTS

We Asked AI the Same Question 1,620 Different Ways. The Sources Changed Every Time.

2026-04-12


Ask ChatGPT "What is the best mattress?"

Now ask it "I do not trust mattress review sites. What do real people actually sleep on?"

Both questions are about the same thing. But the sources in the answer are completely different. The first question pulls from review sites like Wirecutter and Sleep Foundation. The second pulls from Reddit threads and user forums.

The topic did not change. The words did. And the words changed everything.

We tested this across 1,620 AI scrapes, 500 queries, 20 product categories, and 3 AI platforms. We changed one thing at a time: how trusting the question sounded, how emotional it was, how specific, whether it named a community, and whether the user said something skeptical in a follow-up message.

The result: the words you choose when asking AI a question act as a routing signal. They determine which category of website appears in the answer. Not just which specific pages -- which entire type of source. Review sites vs Reddit. Brand websites vs forums. Mainstream media vs niche experts.

This means AI citation is not a fixed property of your website. It is a moving target that changes based on how users talk. And nobody has measured this before.

What We Tested

We designed 500 single-turn queries and 40 multi-turn conversations across 20 product categories: CRM software, VPN, mattress, investing app, online therapy, running shoes, meal kit delivery, laptop, dog food, web hosting, tax software, noise cancelling headphones, ecommerce platform, electric vehicle, vitamin C serum, accounting software, video doorbell, language learning app, car insurance, and robot vacuum.

Each query was run on ChatGPT, Perplexity, and Google AI Mode. That produced 1,500 single-turn scrapes and 120 multi-turn scrapes -- 1,620 total. We captured every citation URL and classified each one as Reddit/UGC, review/media, academic, or brand/first-party.
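As a sketch of how that classification step can work, here is a minimal regex-based domain matcher in Python. The pattern lists and brand domains are illustrative assumptions, not the study's actual rules:

```python
import re
from urllib.parse import urlparse

# Illustrative patterns only; the study's full regex lists are not published here.
CATEGORY_PATTERNS = {
    "reddit_ugc":   [r"reddit\.com", r"quora\.com", r"news\.ycombinator\.com"],
    "review_media": [r"wirecutter", r"techradar\.com", r"pcmag\.com", r"nerdwallet\.com"],
    "academic":     [r"\.edu(/|$)", r"ncbi\.nlm\.nih\.gov"],
}

def classify_citation(url: str, brand_domains: set[str]) -> str:
    """Map a citation URL to one of the four source categories."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    if host in brand_domains:
        return "brand_first_party"
    for category, patterns in CATEGORY_PATTERNS.items():
        if any(re.search(p, url.lower()) for p in patterns):
            return category
    return "other"
```

A call like `classify_citation("https://www.reddit.com/r/Mattress/top", {"purple.com"})` would return `"reddit_ugc"` under these assumed patterns.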

We tested six dimensions of how people phrase questions:

What We Changed | What We Wanted to Know
Expertise signaling (5 levels: naive to domain expert) | Does claimed expertise change source types?
Emotional state (5 levels: neutral to urgent) | Do emotional queries pull different sources?
Distrust depth (5 levels: trusting to adversarial) | Does skepticism shift citations to Reddit?
Specificity (5 levels: vague to hyper-specific) | Does precision change source diversity?
Social proof language (5 levels: product-focused to tribe expert) | Does naming a community trigger that community's content?
Multi-turn follow-up (4 types: negative, affirmation, price, challenge) | Does a follow-up sentence change which sources appear?

Each dimension had 5 levels. Each level was tested across 20 topics and 3 platforms, giving us 60 data points per level. For each level, we measured average citation count, Reddit/UGC rate, review/media rate, academic rate, domain diversity, and source overlap between levels.
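A minimal sketch of how those per-level metrics can be aggregated, assuming each scrape is stored as a dict with its level and citation URLs (the schema and field names are hypothetical):

```python
from collections import defaultdict
from urllib.parse import urlparse

def level_metrics(scrapes):
    """Aggregate per-level stats from scrapes shaped like
    {"level": 3, "citations": ["https://...", ...]} (hypothetical schema)."""
    by_level = defaultdict(list)
    for s in scrapes:
        by_level[s["level"]].append(s["citations"])
    metrics = {}
    for level, runs in by_level.items():
        urls = [u for run in runs for u in run]  # flatten all citations at this level
        metrics[level] = {
            "avg_citations": len(urls) / len(runs),
            "reddit_ugc_rate": sum("reddit.com" in u for u in urls) / max(len(urls), 1),
            "unique_domains": len({urlparse(u).netloc for u in urls}),
        }
    return metrics
```

The substring check for Reddit is a stand-in for the full regex-based classifier described in the methodology notes.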

The Headline Finding: Distrust Rewires Everything

Of the six dimensions we tested, one produced a result so clean that the statistical test returned a perfect score.

Distrust depth -- how skeptical the question sounds -- produced a perfectly monotonic increase in Reddit citations. Every step up in expressed distrust produced a corresponding increase in Reddit/UGC sources. No noise. No plateaus. A straight line from trust to distrust, with Reddit going from 8% to 42.5%.

Level | Example Framing | Avg Citations | Reddit % | Review % | Unique Domains
L1 Trusting | "What is the best mattress?" | 16.0 | 8.0% | 12.6% | 546
L2 Cautious | "I want to make a smart choice..." | 20.6 | 15.9% | 10.6% | 683
L3 Skeptical | "I do not trust most review sites..." | 22.2 | 27.8% | 11.2% | 610
L4 Deep Skeptic | "Every mattress review feels paid for..." | 19.1 | 41.8% | 6.8% | 440
L5 Adversarial | "This entire industry feels like a scam..." | 11.3 | 42.5% | 7.4% | 239

Statistical tests:

  • Kruskal-Wallis: H=33.87, p=0.000001 (highly significant -- citation counts differ across levels)
  • Spearman trend: rho=+1.000, p<0.0001 (perfect monotonic increase in Reddit rate)

The Spearman correlation of +1.000 means there is a perfect rank-order relationship between how distrustful the question sounds and how much Reddit appears in the answer. This is the single most statistically robust finding in the entire experiment.
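To make the trend test concrete, here is a small pure-Python Spearman computation on the Reddit rates from the table above (no ties, so simple ranks suffice); `scipy.stats.spearmanr` would give the same rho:

```python
def rank(xs):
    """Rank values 1..n in ascending order (assumes no ties)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

reddit_rate = [8.0, 15.9, 27.8, 41.8, 42.5]  # L1..L5 from the table above
print(spearman_rho([1, 2, 3, 4, 5], reddit_rate))  # 1.0
```

Because every step up in distrust raises the Reddit rate, the ranks of the two series match exactly and rho hits its maximum of 1.0.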

The Distrust Paradox

Something unexpected happens at the extremes. Moderate skepticism (L3) makes the AI work harder. It produces the MOST citations (22.2 average) and pulls from a wide range of sources (610 unique domains). The AI seems to respond to reasonable doubt by gathering more evidence.

But extreme distrust (L5) makes the AI retreat. It produces the FEWEST citations (11.3) from the NARROWEST source pool (239 unique domains). Almost half of those citations are from Reddit.

The AI appears to interpret adversarial language as a signal that most sources will be rejected. So it narrows preemptively to sources that signal independence from commercial interests -- Reddit threads, user forums, community discussions. It gives up trying to convince you with professional sources and instead shows you what real people said.

What this means for brands: If your customers are skeptical people (privacy products, finance, health supplements, insurance), the AI answers they see will be dominated by Reddit. Your Reddit presence matters more than your review site coverage. If your customers are trusting (mainstream consumer goods), they will see review sites and brand content. Get reviewed by Wirecutter and TechRadar.

Emotional Language Doubles Reddit Citations

When questions sound like they come from a real person with a real problem, AI shifts toward community-generated content.

Level | Example Framing | Reddit % | Review %
L1 Neutral | "What is the best VPN?" | 7.6% | 16.0%
L2 Mild Frustration | "I am tired of slow VPNs..." | 11.2% | 12.1%
L3 Emotional | "I am really worried about my privacy..." | 17.3% | 9.3%
L4 Urgent | "I desperately need a VPN that works..." | 17.7% | 10.5%
L5 Authority-Backed | "My IT department says I need a VPN..." | 14.5% | 8.7%

The effect plateaus at L3/L4. Once you express genuine frustration, adding more intensity does not change sources further. But there is an interesting exception: authority-backed emotion (L5, "my doctor said..." or "my IT department says...") partially pulls back toward professional sources. Reddit drops from 17.7% to 14.5%.

The AI pattern-matches emotional queries with the type of content that addresses emotional needs: peer experiences rather than professional reviews. When someone sounds worried, the AI gives them what other worried people found helpful.

This was not statistically significant (Kruskal-Wallis p=0.650, Spearman rho=+0.700, p=0.188) -- the effect is real but noisy across our 20 topics. Distrust is the clean signal. Emotion is a noisy amplifier.

Naming a Community Triggers That Community's Content

This is the most direct routing effect in the data. When you mention a specific community by name in your question, the AI pulls from that community.

Level | Example Framing | Reddit % | Unique Domains
L1 Product-focused | "What is the best vitamin C serum?" | 6.7% | 586
L2 General people | "What do most people end up buying?" | 14.5% | 693
L3 Community | "What does r/SkincareAddiction recommend?" | 45.5% | 363
L4 Niche group | "What do dermatologists personally use?" | 21.0% | 740
L5 Tribe expert | "What would a cosmetic chemist recommend?" | 18.9% | 663

The L3 result (45.5% Reddit) is the highest single-level Reddit rate in the entire experiment. And it is essentially a keyword trigger: the queries at L3 literally name subreddits. "What mattress does the r/Mattress community actually recommend to each other?" "What VPN does the r/privacy and Hacker News community actually recommend?"

The jump from L2 to L3 is not gradual. It is a cliff. Saying "what do most people recommend?" produces 14.5% Reddit. Saying "what does r/SkincareAddiction recommend?" produces 45.5%. The difference: naming the community versus implying one.

But the effect is non-linear. When you reference a professional group instead of a community (L4: "what do dermatologists personally use?"), the AI shifts away from Reddit (21%) and toward more diverse sources (740 unique domains -- the highest of any level in this variable). It gives you professional content when you ask for professional opinions.

The community naming effect is a content strategy. If your customers ask questions that reference specific subreddits, your brand needs to be positively discussed in those subreddits. If they ask questions that reference professional groups, your brand needs professional endorsements. The social reference group in the query literally selects the source community.

Specificity Opens the Long Tail

Query specificity has the highest source change rate of any variable (91% of sources differ between L1 and L5). But it does not change which categories of sources appear. It changes which specific sites within those categories you see.

Level | Reddit % | Review % | Unique Domains
L1 Vague | 7.2% | 14.0% | 554
L2 General | 6.0% | 11.9% | 609
L3 Category | 8.7% | 12.6% | 687
L4 Detailed | 7.4% | 8.4% | 703
L5 Hyper-Specific | 6.9% | 6.3% | 772

Reddit stays flat at around 7% regardless of specificity. Review sites decrease from 14% to 6.3%. But domain diversity increases from 554 to 772. More specific queries reach deeper into the long tail of niche, specialist websites that only appear for detailed technical questions.

What this means: Vague queries surface the same mainstream sites everyone sees. Specific queries surface niche experts. If you publish deep, technical content on your site, you can capture the hyper-specific queries that mainstream competitors do not reach. This is the long tail of AI search.

Expertise Signaling Changes Sites, Not Categories

Claiming expertise in your question changes which specific sites appear (88% source change from naive to expert), but does not dramatically shift source categories.

Level | Reddit % | Review % | Academic % | Unique Domains
L1 Naive | 6.6% | 11.0% | 0.2% | 654
L3 Informed | 10.6% | 5.3% | 0.6% | 616
L4 Practitioner | 6.8% | 8.8% | 0.3% | 791
L5 Domain Expert | 12.8% | 5.5% | 1.3% | 649

The most interesting finding: practitioner-level queries (L4) produce the most diverse sources (791 unique domains). The AI treats practitioners differently -- it surfaces more niche, specialist content. But expert-level queries (L5) do not expand diversity further. They actually narrow it slightly while increasing academic citations from 0.2% to 1.3%.

Review site citations drop from 11% (naive) to 5.5% (expert). The AI assumes experts do not need review site recommendations. It gives them different information, but from the same general categories.

One Follow-Up Sentence Changes 68-86% of Cited Sources

This may be the most commercially significant finding in the study.

We ran 40 multi-turn conversations. First, the AI answered a base question. Then we sent a single follow-up sentence. We measured how many sources changed between the first and second response.

Base query example: "What is the best mattress for someone looking to buy in 2026?"

Then we sent one of four follow-ups:

Negative injection: "I have heard really bad things about Purple on Reddit. A lot of people seem to hate it. Should I avoid it?"

Affirmation: "Purple looks like a solid choice based on what you said. Can you tell me more about why it is good and who it is best for?"

Price objection: "Those all seem expensive. What are the best budget alternatives that are still genuinely good quality?"

Source challenge: "Where are you getting this information? How do I know these are not just the most marketed brands rather than the genuinely best ones?"

Here is what happened:

Follow-up Type | T1 Citations | T2 Citations | Source Overlap | T2 Reddit % | T2 Brand %
Negative injection | 15.8 | 21.8 | 32.2% | 33.8% | 1.8%
Affirmation | 14.7 | 21.1 | 32.1% | 1.9% | 9.8%
Price objection | 17.2 | 23.6 | 30.6% | 4.8% | 1.1%
Source challenge | 15.8 | 21.6 | 34.8% | 7.0% | 1.5%

Source overlap between the first and second response was only 30-35%. That means 65-70% of sources changed after a single sentence.
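Overlap here is the Jaccard similarity between the two turns' citation sets, as noted in the methodology. A minimal sketch, with made-up domains standing in for real citation URLs:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0  # two empty citation sets are identical by convention
    return len(a & b) / len(a | b)

# Hypothetical citation sets for turn 1 and turn 2 of a conversation.
turn1 = {"wirecutter.com", "sleepfoundation.org", "purple.com", "reddit.com/r/Mattress"}
turn2 = {"reddit.com/r/Mattress", "reddit.com/r/Sleep", "mattressclarity.com"}
print(jaccard(turn1, turn2))  # 1/6, roughly 0.167
```

An overlap of 0.30-0.35 means roughly two thirds of the union of cited sources appear in only one of the two responses.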

The platform breakdown was even more dramatic:

Platform | Source Overlap | Source Change
ChatGPT | 18.4% | 82% of sources changed
Perplexity | 13.8% | 86% of sources changed
Google AI Mode | 65.1% | 35% of sources changed

On ChatGPT, 82% of cited sources changed after one follow-up. On Perplexity, 86% changed. Google AI Mode was more resistant (only 35% changed), likely because it partially re-searches rather than deeply contextualizing the follow-up.

What Each Follow-Up Type Does

Mentioning Reddit + negative sentiment = Reddit flood. When the follow-up mentions Reddit as a source of negative opinions, 33.8% of the second response's citations come from Reddit. The AI hears "Reddit says bad things" and responds by pulling in Reddit content -- presumably to address the user's concern with the community's actual opinions.

Affirming a brand = brand content. When the follow-up says "tell me more about that brand," the brand's own website accounts for 9.8% of citations in the second response. That is 5x higher than any other follow-up type. The AI interprets affirmation as permission to go deeper on the brand's own content.

Challenging the sources = the AI defends itself. When the follow-up questions where the information is coming from, the AI adds a few independent sources but largely keeps its original selection (highest source overlap at 34.8%). It does not abandon its recommendations -- it adds supporting evidence.

Price objection = new products, same source types. The AI surfaces different product tiers but from similar categories of sources. Reddit stays low (4.8%), brands stay low (1.1%). It just shows cheaper options.

All Six Variables Compared

Variable | Source Change L1 to L5 | Strongest Effect | Statistically Significant?
Distrust | 90% | Reddit 8% to 42.5%, citations collapse at extreme | Yes (p=0.000001)
Specificity | 91% | Domain diversity 554 to 772, categories unchanged | No (p=0.739)
Social proof | 89% | Community naming triggers 45.5% Reddit spike | No (p=0.213)
Expertise | 88% | Review sites drop 11% to 5.5% | No (p=0.262)
Emotional | 82% | Reddit doubles 7.6% to 17.7% | No (p=0.650)
Multi-turn | 68-86% | Negative injection triggers 33.8% Reddit; Affirmation triggers 9.8% brand | N/A

Every variable produces 80-91% source change between extremes. But only distrust shows a statistically significant effect on citation count with a perfectly monotonic trend. The others change WHICH sources appear without significantly changing HOW MANY.

What Nobody Else Has Measured

Prior research has established that AI answers vary between sessions. SparkToro and Gumshoe tested 2,961 prompts and found that ChatGPT returns the same brand list less than 1% of the time. But they tested whether identical prompts produce consistent outputs -- not how deliberate word changes route citations to different source types.

Our own earlier work (Experiment 5, n=300 responses) found 85% source change between different prompt framings. This study expands that finding with 1,620 scrapes, 6 controlled variables, 5 levels each, and statistical testing that identifies distrust as the dominant routing signal (rho=1.000, p<0.0001).

No published research has measured:

  • Distrust depth as a citation routing variable
  • The community naming effect on source selection
  • Multi-turn steering (how follow-up sentences redirect citations)
  • Cross-variable comparison with statistical significance testing

What This Means for Your AI Strategy

1. Know how your customers talk

If your audience is skeptical -- privacy products, finance, health supplements, insurance -- the AI answers they see will be dominated by Reddit. Your Reddit presence matters more than your review site coverage.

If your audience is trusting -- mainstream consumer goods, popular brands -- they will see review sites and brand content. Get reviewed by the sites AI cites most: Wirecutter, TechRadar, PCMag, NerdWallet.

2. Be present where your customers point

The community naming effect is not abstract. When a user asks "what does r/privacy recommend for VPNs?", the AI literally pulls from r/privacy. If your brand is positively discussed in that subreddit, you appear. If it is not, you do not.

This means you need to know which communities your customers reference by name, and be genuinely present in those communities. Not through astroturfing or paid posts. Through real participation, real value, real opinions from real users.

3. Specificity is your long-tail advantage

Vague queries surface mainstream sites that everyone sees. Hyper-specific queries reach about 40% more unique domains (772 vs 554). If you publish deep, technical content that answers very specific questions, you can capture queries where mainstream competitors have no presence.

4. Follow-up conversations are not follow-up -- they are complete resets

On ChatGPT, 82% of sources change after one follow-up. On Perplexity, 86%. A user who starts with "what is the best mattress?" and then says "I heard bad things about Purple on Reddit" will see an almost entirely different set of sources in the second response.

This means AI citation is not a one-time optimization. The user's journey through the conversation creates a dynamic filter that continuously reshapes which sources appear.

5. Brand affirmation is a 5x multiplier

When a user says something positive about a brand in a follow-up ("that looks like a solid choice, tell me more"), the brand's own website appears in 9.8% of citations. That is 5x higher than the baseline. The AI interprets user approval as permission to go deeper on the brand's own content.

The implication: brands that generate positive first impressions in AI responses create a positive feedback loop. The user affirms, the AI goes deeper on the brand, the brand gets more exposure.

Limitations

Sample size per level. We covered 20 topics across diverse verticals, so each level has 60 data points (20 topics x 3 platforms). Effects may be stronger or weaker in specific verticals.

Platform-specific trends are underpowered. With only 20 data points per platform per level, individual platform trends do not reach significance. The aggregate findings across all three platforms are more reliable.

Regex-based citation classification. Domain classification into UGC, review, and academic categories is heuristic. Some domains may be miscategorized.

Free Perplexity tier. Results may differ on the Pro tier.

Single collection period. April 10-12, 2026. Platform behavior may change with model updates.

Multi-turn effect depends on platform architecture. ChatGPT and Perplexity maintain strong conversational context. Google AI Mode is weaker. Future platform updates could change how responsive AI is to conversational steering.

Frequently Asked Questions

Does the way I phrase a question to ChatGPT change which websites it cites?

Yes. Across 1,620 scrapes, we found that changing the phrasing of the same underlying question changes 80-91% of the cited sources. The strongest effect is distrust language: expressing skepticism increases Reddit citations from 8% to 42.5% while halving review site citations and collapsing source diversity.

What is the community naming effect?

When a query explicitly names a community -- like "What does the r/Mattress subreddit recommend?" -- the AI pulls heavily from that community. In our data, naming a subreddit triggered a 45.5% Reddit citation rate, compared to 6.7% for product-focused queries that did not name any community. The effect is binary: name the community, get that community's content.

Does a follow-up message change AI citations?

Dramatically. On ChatGPT, 82% of cited sources change after a single follow-up sentence. On Perplexity, 86% change. The type of follow-up determines the direction: mentioning Reddit and negative sentiment floods the response with Reddit (33.8%), while affirming a brand increases that brand's own website citations 5x (to 9.8%).

Is this the same as prompt engineering?

No. Prompt engineering is about getting better outputs from AI by structuring your prompts carefully. This research measures something different: how natural variation in user language -- the kind that happens when real people ask real questions -- routes AI citations to entirely different source categories. This is not about optimizing prompts. It is about understanding how your customers naturally talk and what sources that language triggers.

Does Google AI Mode respond to word choice the same way as ChatGPT?

Google AI Mode is more resistant to conversational steering. In our multi-turn tests, only 35% of sources changed after a follow-up on Google AI Mode, compared to 82-86% on ChatGPT and Perplexity. Google AI Mode partially re-searches rather than deeply contextualizing the follow-up, making it harder to steer through conversation.

Methodology Notes

  • 500 single-turn queries (20 topics x 5 variables x 5 levels) + 40 multi-turn conversations (10 topics x 4 follow-up types)
  • 1,500 single-turn scrapes + 120 multi-turn scrapes = 1,620 total across ChatGPT, Perplexity, and Google AI Mode
  • 20 product categories spanning SaaS, technology, health, finance, home, fitness, food, pet, education, automotive, and electronics
  • Collection period: April 10-12, 2026
  • Success rate: 89.5% with citations
  • Citation classification: regex-based domain matching for Reddit/UGC, review/media, academic, and brand/first-party categories
  • Statistical tests: Kruskal-Wallis for citation count differences across levels, Spearman rank correlation for monotonic trends in Reddit rate
  • Cross-level source overlap measured via Jaccard similarity index

References

  • Lee, A. (2026). "The Hidden Search Queries AI Runs Before It Answers You." Blog post.
  • Lee, A. (2026). "I Rank on Page 1 -- What Gets Me Cited by AI?" Preprint v1, aiXiv.
  • SparkToro & Gumshoe (2025-2026). "AI Recommendations Change With Nearly Every Query." Analysis.
  • Semrush (2026). "ChatGPT traffic analysis: Insights from 17 months of clickstream data." Blog post.