Google's John Mueller called GEO a likely scam. Agencies charge $50,000 for it. We have now run two major studies -- 4,375 pages in the first, 10,293 in the second -- to find out who is right.
The answer is more honest than either side wants to hear.
GEO is real. The features marketed as "GEO" carry measurable predictive signal. But most of what agencies sell as GEO (schema optimization, author bios, structured data) does not work for AI citation. And the features that do work are not new. They are just good writing with a new price tag.
This article covers two studies. The first (Experiment J, 4,375 pages) asked whether GEO features predict AI citation better than traditional SEO features. The second (Experiment M, 10,293 pages) went deeper by controlling for Google ranking position -- asking not just "what predicts citation?" but "among pages that already rank at the same position, what makes AI choose one over another?"
Key Numbers
| Metric | Value | Source |
|---|---|---|
| Pages analyzed | 10,293 (Experiment M) + 4,375 (Experiment J) | Two studies |
| GEO marginal value | +22.2 percentage points | Experiment J quadrant analysis |
| Content vs domain (within SERP) | Content AUC = 0.673, Domain AUC = 0.687 | Experiment M |
| Top actionable feature | Comparison structure (d=0.43) | Experiment M |
| Strongest negative feature | First-person/blog tone (8.1% importance) | Experiment M |
| Page speed importance | 2.9% (not significant in any position band) | Experiment M |
| Schema markup importance | FAQ helps (all 4 bands), Product helps (OR=3.09), Article hurts (OR=0.76) | Experiment M + prior |
| Author attribution importance | 0.2% (not a reliable signal) | Experiment M |
Study 1: The SEO vs GEO Quadrant (Experiment J)
The Question
Is GEO a real discipline, or just traditional SEO with a new label?
We scored 4,375 pages on two independent composite scores:
Traditional SEO Score (7 features): heading count, internal links, page speed, schema markup, content-to-HTML ratio, word count, author attribution. These are the things every SEO guide since 2012 recommends.
GEO-Specific Score (8 features): statistics density, citation density, quotation density, list formatting, authoritative tone, technical terminology, heading density, readability. These are the features marketed specifically as GEO optimizations.
The correlation between the two scores was only 0.133. They measure different things.
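The scoring setup can be sketched in a few lines. This is a minimal reconstruction, not the study's pipeline: it assumes each composite is an equal-weight average of z-scored features, and the feature columns are placeholders.

```python
import math

def zscores(values):
    """Standardize raw feature values to mean 0, standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

def composite_score(feature_columns):
    """Equal-weight average of z-scored features, one score per page."""
    z_cols = [zscores(col) for col in feature_columns]
    return [sum(page) / len(page) for page in zip(*z_cols)]

def pearson_r(x, y):
    """Correlation between two composites (0.133 in Experiment J)."""
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)
```

With `seo_score = composite_score([...7 SEO columns...])` and `geo_score = composite_score([...8 GEO columns...])`, a near-zero `pearson_r(seo_score, geo_score)` is what licenses treating the two as separate axes.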
The Quadrant Finding
We split all pages into four groups: High SEO / High GEO, High SEO / Low GEO, Low SEO / High GEO, and Low SEO / Low GEO.
| Quadrant | n | Citation Rate |
|---|---|---|
| Low SEO, Low GEO | 1,134 | 57.1% |
| High SEO, Low GEO | 1,053 | 42.8% |
| Low SEO, High GEO | 1,053 | 65.0% |
| High SEO, High GEO | 1,135 | 65.0% |
The most important row is the second one. Pages that follow traditional SEO best practices but lack GEO features have the lowest citation rate of any group (42.8%). These are pages with schema markup, author bios, strong internal linking, and keyword-optimized headings -- but without real data, technical depth, or readable prose.
The punchline: Low SEO / High GEO pages (65.0%) perform identically to High SEO / High GEO pages (65.0%). GEO features alone are sufficient. Adding SEO optimization on top of good content adds nothing to AI citation.
Among pages that already score high on traditional SEO, adding GEO features increases the citation rate from 42.8% to 65.0% -- a 22.2 percentage point improvement (p < 0.0001).
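The quadrant comparison reduces to a two-proportion z-test. A sketch under stated assumptions (median splits for High/Low, normal approximation for the p-value; the study's exact test may differ):

```python
import math

def quadrant(seo, geo, seo_cut, geo_cut):
    """Assign a page to a quadrant given its two composite scores."""
    return ("High" if seo >= seo_cut else "Low",
            "High" if geo >= geo_cut else "Low")

def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z-test for a difference between two citation rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # normal CDF
    return z, p
```

Plugging in the two High SEO rows, `two_proportion_z(0.428, 1053, 0.650, 1135)` gives a z-statistic above 10, consistent with the reported p < 0.0001.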
What Predicted Citation (Experiment J Feature Ranking)
The original Gradient Boosting model ranked features by importance:
| Rank | Feature | Importance | Category | Status After Later Studies |
|---|---|---|---|---|
| 1 | Page speed | 65.8% | SEO | Debunked: domain identity proxy (Experiment K) |
| 2 | Technical terminology | 9.9% | GEO | Confirmed (Experiment M) |
| 3 | Content-to-HTML ratio | 4.3% | SEO | Marginal in Experiment M (3.4%, not significant within bands) |
| 4 | Readability | 2.6% | GEO | Not significant within position bands (Experiment M) |
| 5 | Statistics density | 2.4% | GEO | Confirmed: significant in 3 of 4 position bands |
| 6 | Heading density | 2.3% | GEO | Confirmed: H3 count is 5.9% importance in Experiment M |
| 7 | Word count | 2.2% | SEO | Confirmed but reversed: longer pages are cited more within bands |
| 8 | Quotation density | 2.2% | GEO | Did not replicate* |
| 9 | Internal links | 2.2% | SEO | Confirmed: 3.7% importance in Experiment M |
| 10 | Citation density | 0.9% | GEO | Did not replicate* |
| 14 | Author attribution | 0.36% | SEO | Confirmed negligible: 0.2% in Experiment M |
| 15 | Schema markup | 0.12% | SEO | Refined: type matters, not presence (see below) |
*Experiment C (Princeton Replication, 3,205 pages, 4 platforms) found that quotation density goes the wrong direction on Perplexity and Google AI Mode, and citation density goes the wrong direction on 3 of 4 platforms. Not-cited pages have more source attributions than cited pages. Of the three Princeton GEO features (statistics, citations, quotations), only statistics density replicates on production platforms.
What Experiment J Got Wrong
The 65.8% page speed dominance was the study's biggest flaw. Our follow-up, Experiment K (n=4,350 pages, 2,510 domains), showed it was a domain identity proxy. The Gradient Boosting model was fingerprinting publisher domains by their CDN speed signatures, not learning that fast pages get cited.
The evidence:
- Within the same domain, cited pages are actually slower (r=-0.221, p<0.000001)
- Domain features alone predict at AUC=0.975
- Adding domain identity drops load time importance by 92.4%
- Load time correlates 0.712 with domain average speed (71% of a page's speed is just "which domain is it on")
This meant the "GEO is 77% SEO" conclusion from Experiment J was misleading. The 77% was almost entirely page speed, and page speed was domain identity in disguise.
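The domain-proxy diagnosis can be illustrated with a small check: how strongly does a page's load time track its domain's average load time? A minimal sketch (the data structure is assumed; this is not the study's code):

```python
from collections import defaultdict
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def domain_mean_leakage(pages):
    """Correlate each page's load time with its domain's mean load time.

    A high value means 'page speed' is mostly a domain fingerprint.
    pages: list of (domain, load_time) pairs.
    """
    by_domain = defaultdict(list)
    for domain, t in pages:
        by_domain[domain].append(t)
    means = {d: sum(ts) / len(ts) for d, ts in by_domain.items()}
    return pearson_r([t for _, t in pages], [means[d] for d, _ in pages])
```

In Experiment K this correlation was 0.712, so a model given raw load time can use it to recognize domains rather than reward speed.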
Study 2: What Actually Predicts Citation (Experiment M)
The Better Question
Experiment J compared cited vs not-cited pages across different domains. That approach confounds page content with domain identity. A page on Wikipedia and a page on a small blog have different citation rates not because of their content, but because of their domain.
Experiment M solved this by using Google SERP position as the matching variable. All pages in a comparison set ranked for the same query at similar positions. They were equally "relevant" by Google's assessment. The question became: among these equally-qualified pages, which ones does AI cite and why?
Design: 250 queries, fully balanced (5 intent types x 10 verticals x 5 queries per cell). 10,293 pages crawled with 66 features each. Citations collected from ChatGPT, Perplexity, and Google AI Mode.
Google Rank Is the Gateway, Content Is the Selector
| Position Band | Pages | Cited | Citation Rate |
|---|---|---|---|
| 1-3 | 339 | 194 | 57.2% |
| 4-7 | 625 | 264 | 42.2% |
| 8-12 | 939 | 258 | 27.5% |
| 13-20 | 1,650 | 223 | 13.5% |
Position 1 has a 68.8% citation rate. Position 20 has 11.9%. Google rank matters -- it gets you into the candidate pool.
But even at position 1, 31% of pages are NOT cited. Even at positions 13-20, 13.5% still ARE. Something beyond rank determines which pages get selected from the pool. That something is page content.
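The band analysis simply buckets SERP positions before comparing citation rates. A minimal sketch (the data shape is assumed):

```python
BANDS = [(1, 3), (4, 7), (8, 12), (13, 20)]

def band_of(position):
    """Map a Google SERP position (1-20) to its analysis band label."""
    for lo, hi in BANDS:
        if lo <= position <= hi:
            return f"{lo}-{hi}"
    return None  # outside the top 20

def citation_rate_by_band(pages):
    """pages: iterable of (position, cited) pairs -> citation rate per band."""
    counts = {}
    for pos, cited in pages:
        band = band_of(pos)
        if band is None:
            continue
        total, hits = counts.get(band, (0, 0))
        counts[band] = (total + 1, hits + bool(cited))
    return {band: hits / total for band, (total, hits) in counts.items()}
```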
The Features That Predict Citation Within Position Bands
Six features are significant in ALL FOUR position bands (p < 0.05 in each):
| Feature | Direction | Effect Size (d) | What It Means |
|---|---|---|---|
| Comparison structure | + | 0.43 | Pages with "vs", comparison tables get cited more |
| Query term coverage | + | 0.42 | Pages containing the query's key words get cited more |
| H3 subheading count | + | 0.19 | More subheadings = more citations |
| Word count | + | 0.20 | Longer pages cited more (median 2,150 vs 1,415) |
| Primary source score | + | 0.27 | Pages that produce data get cited more than pages that aggregate it |
| Aggregator score | - | -0.27 | Pages that primarily cite and quote others are cited less |
Features significant in 3 of 4 bands: statistics density (+), first-person density (-), FAQ schema (+), current year mention (+), internal link count (+).
Features NOT significant within position bands: page speed (p > 0.39 in all 4 bands), author attribution, content-to-HTML ratio.
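The effect sizes above are Cohen's d for cited vs not-cited pages, computed separately inside each band. A sketch assuming the standard pooled-SD form (the study may use a variant):

```python
import math

def cohens_d(cited_values, uncited_values):
    """Pooled-SD Cohen's d for one feature, cited vs not-cited pages."""
    def mean(v):
        return sum(v) / len(v)
    def var(v, m):
        return sum((x - m) ** 2 for x in v) / (len(v) - 1)
    m1, m2 = mean(cited_values), mean(uncited_values)
    n1, n2 = len(cited_values), len(uncited_values)
    pooled = math.sqrt(((n1 - 1) * var(cited_values, m1) +
                        (n2 - 1) * var(uncited_values, m2)) / (n1 + n2 - 2))
    return (m1 - m2) / pooled
```

A positive d (e.g. comparison structure at 0.43) means cited pages score higher on the feature; a negative d (aggregator score at -0.27) means they score lower.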
The Full Feature Importance Ranking (Experiment M)
| Rank | Feature | Importance | Actionable? | Direction |
|---|---|---|---|---|
| 1 | Google position | 27.9% | No | Lower = more cited |
| 2 | First-person density | 8.1% | Yes | Less blog tone = more cited |
| 3 | Word count | 7.1% | Yes | Longer (~2,000) = more cited |
| 4 | Comparison structure | 6.4% | Yes | "vs", tables = more cited |
| 5 | H3 subheading count | 5.9% | Yes | More deep structure = more cited |
| 6 | Primary source score | 4.1% | Yes | Produce data, do not aggregate it |
| 7 | Heading density | 3.9% | Yes | Better organized = more cited |
| 8 | Query term position | 3.8% | Yes | Answer earlier in the page |
| 9 | Internal links | 3.7% | Yes | More internal links |
| 10 | Query term coverage | 3.7% | Yes | Cover the query's key words |
| 11 | Content-to-HTML ratio | 3.4% | Yes | Leaner code |
| 12 | H2 count | 3.1% | Yes | More section headings |
| 13 | Load time | 2.9% | No | Not significant in any band |
| 14 | External links | 2.5% | Yes | Slightly positive |
| 15 | Statistics per 1k words | 2.5% | Yes | More data points |
Actionable features account for 69.2% of total model importance. The top actionable feature by model importance is not a technical SEO signal. It is first-person density -- a writing style metric. Pages that read like blog posts ("I think...", "in my experience...") are significantly less likely to be cited than pages that read like reference material.
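First-person density is easy to measure on your own pages. A rough proxy, assuming a simple pronoun list (the study's exact lexicon is not published here):

```python
import re

# Assumed pronoun list -- a crude stand-in for the study's feature extractor.
FIRST_PERSON = re.compile(
    r"\b(i'm|i've|i'd|i'll|i|me|my|mine|we're|we've|we|our|ours)\b",
    re.IGNORECASE)

def first_person_density(text):
    """First-person pronouns per 1,000 words -- a rough 'blog tone' proxy."""
    words = re.findall(r"\b[\w']+\b", text)
    if not words:
        return 0.0
    hits = len(FIRST_PERSON.findall(text))
    return 1000.0 * hits / len(words)
```

Reference-style rewrites ("The benchmark shows..." instead of "I found...") drive this number toward zero.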
Content Structure Is the Biggest Lever
When features are added in blocks to the model, here is each category's marginal contribution:
| Category | Marginal Lift | What It Means |
|---|---|---|
| Content structure (headings, word count, lists) | +0.021 AUC | The largest lift after position |
| Semantic relevance (query coverage, term position) | +0.013 | Second largest |
| Trust signals (schema, author, date) | +0.005 | Small |
| Content quality (stats, technical depth) | -0.003 | Captured by structure |
| Technical (load time, page size) | -0.001 | Nothing |
Content structure is the single largest lever beyond Google rank position. Content quality features (statistics, technical depth) add nothing beyond what structure already captures -- because well-structured pages naturally contain higher-quality content.
Technical features (load time, page size) add literally nothing.
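The "marginal lift" numbers are differences in AUC as feature blocks are added to the model. AUC itself has a simple rank-based reading, sketched here (the scores would come from the fitted model; the fitting step is omitted):

```python
def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: the probability that a random cited
    page scores higher than a random not-cited page. Ties count half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def marginal_lift(scores_without, scores_with, labels):
    """AUC gained by adding one feature block to the model."""
    return auc(scores_with, labels) - auc(scores_without, labels)
```

A lift of +0.021 for the structure block means the structure features move that probability up by about two points over the model without them.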
Content Rivals Domain Within the SERP
For the first time in this research program, content features pulled roughly even with domain identity:
| Model | AUC |
|---|---|
| Content features alone | 0.673 |
| Domain identity alone | 0.687 |
| Content + domain | 0.697 |
Without position control (Experiment K), domain identity dominated at AUC = 0.975. With position control (Experiment M), content and domain provide comparable power: adding content on top of domain lifts AUC by +0.010, and adding domain on top of content lifts it by +0.024.
This means: once you control for "this page already ranks for this query," what the page says matters as much as what domain it is on.
Cross-Platform Consensus: The Quality Gradient
Pages cited by multiple AI platforms independently show a dramatic quality gradient:
| Cited By | Pages | Word Count | Stats/1k | Primary Source Score | Query Coverage |
|---|---|---|---|---|---|
| 0 platforms | 3,594 | 1,219 | 2.2 | -10.2 | 0.75 |
| 1 platform | 6,213 | 1,765 | 4.0 | -5.0 | 1.00 |
| 2 platforms | 435 | 2,281 | 6.4 | -2.4 | 1.00 |
| 3+ platforms | 51 | 2,506 | 15.3 | +10.7 | 1.00 |
Pages cited by all three platforms have 7x the statistics density, 2x the word count, positive primary source scores, and 100% query term coverage. When three independent AI systems with different search backends all choose the same page, that page has measurably different content characteristics.
What About Schema Markup?
The original Experiment J found schema markup at 0.12% importance -- nearly worthless. That was for schema presence (has any schema vs does not). But Experiment M and prior work refined this: schema TYPE matters, not presence.
| Schema Type | Effect | Source |
|---|---|---|
| Product schema | OR = 3.09 (strong positive) | Cross-domain analysis |
| FAQ schema | Significant in all 4 position bands | Experiment M |
| Review schema | OR = 2.24 (positive) | Cross-domain analysis |
| Article schema | OR = 0.76 (negative -- hurts) | Cross-domain analysis |
| General presence | Not significant (p = 0.78) | Cross-domain analysis |
The advice is not "stop doing schema." It is: use FAQ schema where Q&A content is natural, use Product schema on product pages, and understand that Article schema is negatively associated with citation. Generic schema presence (having any structured data at all) does not help.
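The odds ratios above come from 2x2 contingency tables (cited vs not, schema type present vs absent). A sketch with hypothetical counts:

```python
def odds_ratio(cited_with, total_with, cited_without, total_without):
    """OR for a schema type: odds of citation with it vs without it.
    OR > 1 means the type is associated with more citations; OR < 1, fewer."""
    a = cited_with                       # has schema type, cited
    b = total_with - cited_with          # has schema type, not cited
    c = cited_without                    # lacks schema type, cited
    d = total_without - cited_without    # lacks schema type, not cited
    return (a / b) / (c / d)
```

The counts here are illustrative only; Product schema's OR = 3.09 and Article schema's OR = 0.76 are this statistic computed on the cross-domain data.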
What Actually Matters by Industry
Different industries have different citation profiles. A one-size-fits-all approach misses this completely. From our Experiment 2E vertical analysis (8,043 pages, 14 verticals):
| Vertical | Cited Word Count (Median) | Stats per 1k | H3 Count | FAQ Schema Usage | What Matters Most |
|---|---|---|---|---|---|
| Consumer Electronics | 3,422 | 6.3 | 9 | 7% | Length and depth |
| Health | 3,302 | 7.0 | 11 | 12% | Word count (r=-0.531) |
| Ecommerce | 3,317 | 9.0 | 16 | 17% | H3 depth (r=-0.477) |
| Technology | 3,095 | 4.8 | 12 | 11% | Word count (r=-0.610) |
| SaaS | 3,040 | 6.2 | 16 | 23% | H3 depth + FAQ |
| Finance | 2,841 | 11.2 | 10 | 21% | Internal links (r=-0.401) |
| Automotive | 1,866 | 3.8 | 8 | 8% | Shortest content wins |
| Pet | 2,235 | 5.7 | 9 | 14% | Community (Reddit 26.7%) |
A finance page needs 11.2 statistics per 1,000 words and FAQ schema (21% of cited finance pages use it). A technology page needs 3,095 words and technical terminology (2.0 per 1k). An automotive page needs only 1,866 words. Applying the same targets to every vertical is a waste.
So Is GEO a Scam?
No. But the honest answer has three parts.
Part 1: The features marketed as "GEO" carry real signal. Pages with comparison structure, deep subheadings, statistics, and objective tone are significantly more likely to be cited. The 22.2 percentage point improvement from GEO features in the quadrant analysis is real and stable. Experiment M confirmed this through position-controlled analysis.
Part 2: Most of what agencies sell as GEO does not work. If a GEO package focuses on schema presence (not type), author bios, llms.txt implementation (which gets zero bot fetches in our monitoring data), and generic keyword optimization for AI, our data shows those tactics are either irrelevant or counterproductive. Author attribution is 0.2% of importance. Schema presence is not significant. Page speed is not significant within position bands.
Part 3: The features that work are not new. Comparison structure, deep headings, statistics, objective tone, answering the query early -- this is good writing. It has always been good advice. What AI platforms have done is strip away the SEO scaffolding (backlinks, schema presence, author bios, click-through signals) and reward content quality more directly.
The GEO industry is not selling snake oil. But most of what it sells is the wrong part of the shelf.
The $50,000 Question
If a business pays $50,000 for GEO consulting that focuses on schema presence, author bios, and structured data, our data predicts zero improvement in AI citation.
If the same money went toward:
- Adding comparison structure to key pages (d=0.43, strongest actionable signal)
- Restructuring content with deep H3 subheadings (5.9% importance)
- Removing blog/first-person tone (8.1% importance, negative)
- Adding real statistics and original analysis (2.5% + 4.1% primary source score)
- Ensuring pages explicitly answer the query's key terms (3.7%)
- Adding FAQ schema where natural (significant in all 4 position bands)
...it would produce real results. The investment is the same. The tactics determine the outcome.
What to Do
Highest priority (position-controlled evidence)
- Add comparison structure. "vs" sections, comparison tables, side-by-side analysis. This is the single strongest actionable signal (d=0.43), significant across all five intent types and all four position bands.
- Remove blog/first-person tone. Write as reference material, not personal narrative. First-person density is the second most important actionable feature (8.1%) and it is negative.
- Structure with deep H3 subheadings. Cited pages have 2x the H3 count (median 9-10 vs 4-5). Each H3 becomes a potential extraction point for AI.
- Cover the query's actual terms. 100% of pages cited by 3+ platforms had complete query term coverage. Make sure your page contains the words people search for.
- Include real data. Pages cited by 3+ platforms have 7x the statistics density of uncited pages.
- Add FAQ schema. The only schema type significant across all 4 position bands.
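The high-priority items lend themselves to a quick self-audit. A rough sketch: the thresholds are taken from the medians reported in this article, and every regex is a crude heuristic, not the study's feature extractor:

```python
import re

def audit_page(text, html=""):
    """Crude check of a page against the citation signals reported above."""
    words = re.findall(r"\b[\w']+\b", text)
    n = max(len(words), 1)
    first_person = len(re.findall(r"\b(i|my|we|our)\b", text, re.IGNORECASE))
    stats = len(re.findall(r"\b\d[\d,.]*%?\b", text))  # numbers as a stats proxy
    return {
        "word_count_ok": len(words) >= 2000,              # cited median ~2,150
        "has_comparison": bool(re.search(r"\bvs\.?\b", text, re.IGNORECASE)),
        "h3_count_ok": html.lower().count("<h3") >= 9,    # cited median 9-10
        "low_first_person": 1000 * first_person / n < 5,  # assumed cutoff
        "has_stats": 1000 * stats / n >= 4,               # data points per 1k words
    }
```

Any `False` in the report flags one of the levers above worth a closer manual look; the cutoffs are starting points, not guarantees.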
Lower priority (real but smaller effects)
- Target approximately 2,000 words. Within position bands, cited pages are 42-52% longer.
- Add Product schema to product pages (OR=3.09 cross-domain).
- Mention the current year where relevant.
- Maintain strong internal link structure.
Stop doing (for AI citation)
- Stop optimizing schema presence without considering type. Generic schema is not significant.
- Stop adding author bios specifically for AI citation. It is 0.2% of importance.
- Stop investing in page speed for AI citation. It is not significant within any position band.
- Stop treating E-E-A-T signals as AI optimization. AI platforms read your content, not your author page.
Limitations
This research has several limitations.
Experiment J (4,375 pages): The quadrant analysis does not control for domain identity. The page speed dominance (65.8%) was proven to be a domain proxy by Experiment K. The GEO score included quotation density and citation density, which did not replicate in Experiment C.
Experiment M (10,293 pages): Observational, not interventional. We have shown correlations, not proven causation. 250 queries across 10 verticals is larger than prior GEO studies but still a sample. Results are SERP-conditional: they apply to pages that already rank in Google's top 20.
Cross-study comparison: Experiments J and M use different methodologies (composite scores vs individual features, uncontrolled vs position-controlled). The quadrant finding and the feature ranking are complementary but come from different analyses.
References
- Lee, A. (2026). "I Rank on Page 1 -- What Gets Me Cited by AI?" Preprint v1. aiXiv | Dataset
- Lee, A. (2026). "Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior." DOI
- Aggarwal, P., Murahari, V., et al. (2024). GEO: Generative Engine Optimization. KDD 2024. DOI
The Bottom Line
GEO is not a scam. But most of what is sold as GEO does not work.
Traditional SEO signals (schema presence, author bios, page speed) do not predict AI citation within position bands. Content substance features (comparison structure, deep subheadings, statistics, objective tone, query term coverage) do.
The mechanism is content quality. AI platforms have stripped away the SEO scaffolding and evaluate content more directly. Pages worth citing get cited. Pages dressed up with SEO signals but lacking substance get passed over.
If you want AI to cite your pages, write pages worth citing. That advice has not changed since 2012. What has changed is that AI platforms enforce it more honestly than Google ever did.