
Reddit Doesn’t Get Cited (Through the API): Training Data Influence, Access-Channel Divergence, and the Shadow Corpus in AI Brand Recommendations

Author

Anthony Lee — AI+Automation

Status

Preprint — February 2026 (v3) | Not yet peer-reviewed


Key Findings

Reddit Gets Zero API Citations, Despite Dominating Google

Reddit held 38.3% of Google's Top-3 organic positions, yet received zero citations in ChatGPT and Perplexity API responses. This suggests that the API pipeline categorically suppresses Reddit content.
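The finding compares two counts: the share of Google Top-3 organic slots held by Reddit, and the number of Reddit URLs among API citations. The minimal Python sketch below shows one way to compute both. It assumes that a URL counts as Reddit when its host is reddit.com or a subdomain (www., old., and so on), so short links such as redd.it are not matched. It also assumes that the share is pooled over all Top-3 slots across queries. All function names and URLs are hypothetical illustrations, not the study's code or data.

```python
from urllib.parse import urlparse


def is_reddit_url(url: str) -> bool:
    """True if the URL's host is reddit.com or one of its subdomains (www., old., ...)."""
    host = (urlparse(url).hostname or "").lower()
    return host == "reddit.com" or host.endswith(".reddit.com")


def top3_reddit_share(serps: dict[str, list[str]]) -> float:
    """Fraction of Top-3 organic slots, pooled over all queries, held by Reddit URLs."""
    slots = [url for urls in serps.values() for url in urls[:3]]
    if not slots:
        return 0.0
    return sum(is_reddit_url(u) for u in slots) / len(slots)


def reddit_citation_count(cited_urls: list[str]) -> int:
    """Number of cited URLs (e.g. collected from API responses) that point to Reddit."""
    return sum(is_reddit_url(u) for u in cited_urls)


# Hypothetical data: organic results per query, in rank order.
serps = {
    "best running shoes": [
        "https://www.reddit.com/r/running/comments/abc",
        "https://reviews.example.com/running-shoes",
        "https://old.reddit.com/r/running/comments/def",
    ],
    "best budget vacuum": [
        "https://reviews.example.com/vacuums",
        "https://www.reddit.com/r/example/comments/ghi",
        "https://shop.example.org/vacuums",
    ],
}
# Hypothetical URLs cited by an API response to the same queries.
api_citations = [
    "https://reviews.example.com/vacuums",
    "https://shop.example.org/vacuums",
]

print(top3_reddit_share(serps))              # 3 of 6 slots -> 0.5
print(reddit_citation_count(api_citations))  # 0
```

A host-based check avoids false positives from URLs that merely contain the string "reddit" in a path or in an unrelated domain name.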

Web UI Tells a Different Story

Web UIs cite Reddit at 44% (Google AI Mode), 20% (Perplexity), and 17% (ChatGPT). Validation queries surface Reddit at the highest rates: 71% on Google AI Mode, 46% on Perplexity.

Reddit Shapes AI Recommendations Through Training Data

A mean Spearman rank correlation of 0.554 links Reddit community consensus to AI brand recommendations, and the correlation is significant in all 12 consumer categories. The brands that Reddit favors tend to be the brands that AI recommends.
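Spearman's rho compares rankings rather than raw values. Each brand is ranked once by Reddit consensus and once by AI recommendation, and rho is the Pearson correlation of those ranks, with tied values sharing their average rank. The Python sketch below implements this textbook definition. The summary does not say how "consensus" or "recommendation" was scored, so the two score lists (net upvotes and AI mention counts) are hypothetical stand-ins, as are the function names.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n in ascending order; ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


# Hypothetical scores for five brands (A..E) in one category.
reddit_consensus = [120, 85, 60, 30, 10]  # e.g. net upvotes on recommending comments
ai_mentions = [9, 10, 5, 2, 2]            # e.g. AI responses recommending the brand
print(round(spearman_rho(reddit_consensus, ai_mentions), 3))  # 0.872
```

Because only ranks enter the calculation, a brand with an outsized upvote count cannot dominate the result the way it could under a Pearson correlation of raw scores.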

Reddit is a "Shadow Corpus"

Three influence channels: training data (absorbed into model weights), web UI citations (actively retrieved), and API suppression (categorically blocked). No single methodology can observe all three.

Abstract

AI chatbots functionally never cite Reddit through their APIs. In a companion study of 6,699 URLs cited by ChatGPT and Perplexity across 120 product recommendation queries, we observed zero Reddit citations despite Reddit occupying 38.3% of Google's Top-3 organic positions.

We collected 12,187 posts and 103,696 comments from 60 subreddits spanning 12 consumer product categories. The mean Spearman rank correlation between Reddit brand consensus and AI recommendations was 0.554, and the correlation was statistically significant in every one of the 12 categories.
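For reference, the standard formulas behind these statistics are given below. The summary does not state which significance test or averaging scheme was used. The t-approximation and the unweighted arithmetic mean shown here are assumptions based on common practice. Here n_c is the number of brands ranked in category c, and d_i is the difference between brand i's Reddit rank and its AI-recommendation rank. The first formula is exact only without ties; with ties, rho is the Pearson correlation of average ranks.

```latex
% Per-category Spearman correlation (no ties)
\rho_c = 1 - \frac{6 \sum_{i=1}^{n_c} d_i^2}{n_c \, (n_c^2 - 1)}

% Significance: t-statistic with n_c - 2 degrees of freedom (assumed test)
t_c = \rho_c \sqrt{\frac{n_c - 2}{1 - \rho_c^2}}

% Summary statistic: unweighted mean over the C = 12 categories (assumed scheme)
\bar{\rho} = \frac{1}{C} \sum_{c=1}^{C} \rho_c = 0.554
```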

Comparing access channels, the APIs produced a 0% Reddit citation rate, while the web UIs produced rates of 44% (Google AI Mode), 20% (Perplexity), and 17% (ChatGPT). These findings support a three-channel model of Reddit's influence on AI outputs: training data, web UI citations, and API suppression.
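The summary does not define the denominator of the "citation rate". The minimal Python sketch below assumes a per-response rate: the share of a channel's responses that cite at least one Reddit URL. It groups responses by access channel so that API and web UI answers to the same queries can be compared side by side. The `Response` type, the channel labels, and all URLs are hypothetical illustrations.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Response:
    channel: str           # access channel label, e.g. "api" or "web_ui" (hypothetical)
    cited_urls: list[str]  # URLs cited in one model response


def is_reddit_url(url: str) -> bool:
    """True if the URL's host is reddit.com or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == "reddit.com" or host.endswith(".reddit.com")


def reddit_citation_rate(responses: list[Response], channel: str) -> float:
    """Share of a channel's responses that cite at least one Reddit URL."""
    subset = [r for r in responses if r.channel == channel]
    if not subset:
        raise ValueError(f"no responses for channel {channel!r}")
    hits = sum(any(is_reddit_url(u) for u in r.cited_urls) for r in subset)
    return hits / len(subset)


def rates_by_channel(responses: list[Response]) -> dict[str, float]:
    """Reddit citation rate for every channel present in the data."""
    channels = sorted({r.channel for r in responses})
    return {c: reddit_citation_rate(responses, c) for c in channels}


# Hypothetical responses to the same queries through two access channels.
responses = [
    Response("api", ["https://reviews.example.com/a"]),
    Response("api", ["https://shop.example.com/b", "https://blog.example.org/c"]),
    Response("web_ui", ["https://www.reddit.com/r/example/comments/xyz"]),
    Response("web_ui", ["https://reviews.example.com/a"]),
]
print(rates_by_channel(responses))  # {'api': 0.0, 'web_ui': 0.5}
```

Under a per-URL definition instead, the denominator would be the total number of cited URLs per channel. The two definitions coincide only when every response cites exactly one URL.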

Keywords

Reddit, AI brand recommendations, training data influence, access-channel divergence, API vs web UI, Generative Engine Optimization, GEO, Spearman correlation, community consensus, shadow corpus, ChatGPT, Claude, Perplexity, Gemini

Citation

Lee, A. (2026). Reddit doesn’t get cited (through the API): Training data influence, access-channel divergence, and the shadow corpus in AI brand recommendations. Preprint v3, AI+Automation.

DOI: https://doi.org/10.5281/zenodo.18679003

ORCID: 0009-0002-4815-6373 | aiXiv: aixiv.260218.000005