Reddit Doesn’t Get Cited (Through the API): Training Data Influence, Access-Channel Divergence, and the Shadow Corpus in AI Brand Recommendations
Anthony Lee — AI+Automation
Preprint — February 2026 (v3) | Not yet peer-reviewed
Key Findings
Reddit Gets Zero API Citations, Despite Dominating Google
Reddit occupied 38.3% of Google's Top-3 organic positions but received exactly zero AI citations via API. The API pipeline categorically suppresses Reddit content.
Web UI Tells a Different Story
Web UIs cite Reddit at 44% (Google AI Mode), 20% (Perplexity), and 17% (ChatGPT). Validation queries surface Reddit at the highest rates: 71% on Google AI Mode, 46% on Perplexity.
Reddit Shapes AI Recommendations Through Training Data
Mean Spearman correlation of 0.554 between Reddit community consensus and AI brand recommendations across all 12 consumer categories. The brands Reddit upvotes are the brands AI recommends.
Reddit is a "Shadow Corpus"
Three influence channels: training data (absorbed into model weights), web UI citations (actively retrieved), and API suppression (categorically blocked). No single methodology can observe all three.
Abstract
AI chatbots functionally never cite Reddit through their APIs. In a companion study of 6,699 URLs cited by ChatGPT and Perplexity across 120 product recommendation queries, we observed zero Reddit citations despite Reddit occupying 38.3% of Google's Top-3 organic positions.
We collected 12,187 posts and 103,696 comments from 60 subreddits spanning 12 consumer product categories. The mean Spearman rank correlation between Reddit brand consensus and AI recommendations was 0.554, with all 12 categories reaching significance.
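As a rough illustration of the statistic reported above, the sketch below computes a Spearman rank correlation for a single category. It is not the study's pipeline: the abstract does not say how Reddit brand consensus was scored, so the brand names, the `reddit_scores` values, and the `ai_mentions` counts are all made-up placeholders, and the helper names `average_ranks` and `spearman` are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (largest) downward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data for one category (assumed, not from the study):
# a Reddit consensus score per brand and how often an AI recommended it.
reddit_scores = {"BrandA": 120, "BrandB": 95, "BrandC": 40, "BrandD": 10}
ai_mentions = {"BrandA": 18, "BrandB": 7, "BrandC": 9, "BrandD": 1}

brands = sorted(reddit_scores)
rho = spearman([reddit_scores[b] for b in brands], [ai_mentions[b] for b in brands])
print(f"Spearman rho for this category: {rho:.3f}")
```

Repeating this for each of the 12 categories and averaging the per-category values would give a mean correlation of the kind reported here, assuming the mean is a simple unweighted average.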
Comparing access channels, the APIs produced a 0% Reddit citation rate, while the web UIs cited Reddit at 44% (Google AI Mode), 20% (Perplexity), and 17% (ChatGPT). These findings support a three-channel model of Reddit's influence on AI outputs.
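One way to measure a Reddit citation rate per access channel is to classify each cited URL by hostname and take the share that points at Reddit. The sketch below does this under stated assumptions: the URLs are invented, the helpers `is_reddit` and `reddit_share` are hypothetical, and the rate is computed per URL, whereas the study may have counted per response or per query. Short links such as redd.it are not handled.

```python
from urllib.parse import urlparse


def is_reddit(url):
    """True if the URL's host is reddit.com or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == "reddit.com" or host.endswith(".reddit.com")


def reddit_share(urls):
    """Fraction of URLs that point at Reddit (0.0 for an empty list)."""
    if not urls:
        return 0.0
    return sum(is_reddit(u) for u in urls) / len(urls)


# Hypothetical citation lists for the same query via two access channels.
api_citations = [
    "https://www.example.com/best-headphones",
    "https://example.org/review",
]
web_ui_citations = [
    "https://www.reddit.com/r/headphones/comments/abc123/",
    "https://www.example.com/best-headphones",
    "https://old.reddit.com/r/audiophile/",
    "https://example.org/review",
]

print(f"API Reddit share:    {reddit_share(api_citations):.0%}")
print(f"Web UI Reddit share: {reddit_share(web_ui_citations):.0%}")
```

Matching on the parsed hostname rather than on a substring avoids counting unrelated domains that merely contain the text "reddit.com".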
Keywords
Reddit, AI brand recommendations, training data influence, access-channel divergence, API vs web UI, Generative Engine Optimization, GEO, Spearman correlation, community consensus, shadow corpus, ChatGPT, Claude, Perplexity, Gemini
Citation
Lee, A. (2026). Reddit doesn’t get cited (through the API): Training data influence, access-channel divergence, and the shadow corpus in AI brand recommendations. Preprint v3, AI+Automation.