Google's John Mueller called GEO a likely scam. Agencies charge $50,000 for it. We have now run two major studies -- 4,375 pages in the first, 10,293 in the second -- to find out who is right.
The answer is more honest than either side wants to hear.
GEO is real. The features marketed as "GEO" carry measurable predictive signal. But most of what agencies sell as GEO (schema optimization, author bios, structured data) does not work for AI citation. And the features that do work are not new. They are just good writing with a new price tag.
This article covers two studies. The first (Experiment J, 4,375 pages) asked whether GEO features predict AI citation better than traditional SEO features. The second (Experiment M, 10,293 pages) went deeper by controlling for Google ranking position -- asking not just "what predicts citation?" but "among pages that already rank at the same position, what makes AI choose one over another?"
Key Numbers
| Metric | Value | Source |
|---|---|---|
| Pages analyzed | 10,293 (Experiment M) + 4,375 (Experiment J) | Two studies |
| GEO marginal value | +22.2 percentage points | Experiment J quadrant analysis |
| Content vs domain (within SERP) | Content AUC = 0.673, Domain AUC = 0.687 | Experiment M |
| Top actionable feature | Comparison structure (d=0.43) | Experiment M |
| Strongest negative feature | First-person/blog tone (8.1% importance) | Experiment M |
| Page speed importance | 2.9% (not significant in any position band) | Experiment M |
| Schema markup importance | FAQ helps (all 4 bands), Product helps (OR=3.09), Article hurts (OR=0.76) | Experiment M + prior |
| Author attribution importance | 0.2% (not a reliable signal) | Experiment M |
Study 1: The SEO vs GEO Quadrant (Experiment J)
The Question
Is GEO a real discipline, or just traditional SEO with a new label?
We scored 4,375 pages on two independent composite scores:
Traditional SEO Score (7 features): heading count, internal links, page speed, schema markup, content-to-HTML ratio, word count, author attribution. These are the things every SEO guide since 2012 recommends.
GEO-Specific Score (8 features): statistics density, citation density, quotation density, list formatting, authoritative tone, technical terminology, heading density, readability. These are the features marketed specifically as GEO optimizations.
The correlation between the two scores was only 0.133. They measure different things.
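The scoring setup can be sketched in a few lines. This is a minimal reconstruction, not the study's pipeline: it assumes each composite is an equal-weight average of z-scored features, and the feature columns are placeholders.

```python
import math

def zscores(values):
    """Standardize raw feature values to mean 0, standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

def composite_score(feature_columns):
    """Equal-weight average of z-scored features, one score per page."""
    z_cols = [zscores(col) for col in feature_columns]
    return [sum(page) / len(page) for page in zip(*z_cols)]

def pearson_r(x, y):
    """Correlation between two composites (0.133 in Experiment J)."""
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)
```

With `seo_score = composite_score([...7 SEO columns...])` and `geo_score = composite_score([...8 GEO columns...])`, a near-zero `pearson_r(seo_score, geo_score)` is what licenses treating the two as separate axes.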
The Quadrant Finding
We split all pages into four groups: High SEO / High GEO, High SEO / Low GEO, Low SEO / High GEO, and Low SEO / Low GEO.
| Quadrant | n | Citation Rate |
|---|---|---|
| Low SEO, Low GEO | 1,134 | 57.1% |
| High SEO, Low GEO | 1,053 | 42.8% |
| Low SEO, High GEO | 1,053 | 65.0% |
| High SEO, High GEO | 1,135 | 65.0% |
The most important row is the second one. Pages that follow traditional SEO best practices but lack GEO features have the lowest citation rate of any group (42.8%). These are pages with schema markup, author bios, strong internal linking, and keyword-optimized headings -- but without real data, technical depth, or readable prose.
The punchline: Low SEO / High GEO pages (65.0%) perform identically to High SEO / High GEO pages (65.0%). GEO features alone are sufficient. Adding SEO optimization on top of good content adds nothing to AI citation.
Among pages that already score high on traditional SEO, adding GEO features increases the citation rate from 42.8% to 65.0% -- a 22.2 percentage point improvement (p < 0.0001).
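The quadrant comparison reduces to a two-proportion z-test. A sketch under stated assumptions (median splits for High/Low, normal approximation for the p-value; the study's exact test may differ):

```python
import math

def quadrant(seo, geo, seo_cut, geo_cut):
    """Assign a page to a quadrant given its two composite scores."""
    return ("High" if seo >= seo_cut else "Low",
            "High" if geo >= geo_cut else "Low")

def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z-test for a difference between two citation rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # normal CDF
    return z, p
```

Plugging in the two High SEO rows, `two_proportion_z(0.428, 1053, 0.650, 1135)` gives a z-statistic above 10, consistent with the reported p < 0.0001.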
What Predicted Citation (Experiment J Feature Ranking)
The original Gradient Boosting model ranked features by importance:
| Rank | Feature | Importance | Category | Status After Later Studies |
|---|---|---|---|---|
| 1 | Page speed | 65.8% | SEO | Debunked: domain identity proxy (Experiment K) |
| 2 | Technical terminology | 9.9% | GEO | Confirmed (Experiment M) |
| 3 | Content-to-HTML ratio | 4.3% | SEO | Marginal in Experiment M (3.4%, not significant within bands) |
| 4 | Readability | 2.6% | GEO | Not significant within position bands (Experiment M) |
| 5 | Statistics density | 2.4% | GEO | Confirmed: significant in 3 of 4 position bands |
| 6 | Heading density | 2.3% | GEO | Confirmed: H3 count is 5.9% importance in Experiment M |
| 7 | Word count | 2.2% | SEO | Confirmed but reversed: longer pages are cited more within bands |
| 8 | Quotation density | 2.2% | GEO | Did not replicate* |
| 9 | Internal links | 2.2% | SEO | Confirmed: 3.7% importance in Experiment M |
| 10 | Citation density | 0.9% | GEO | Did not replicate* |
| 14 | Author attribution | 0.36% | SEO | Confirmed negligible: 0.2% in Experiment M |
| 15 | Schema markup | 0.12% | SEO | Refined: type matters, not presence (see below) |
*Experiment C (Princeton Replication, 3,205 pages, 4 platforms) found that quotation density goes the wrong direction on Perplexity and Google AI Mode, and citation density goes the wrong direction on 3 of 4 platforms. Not-cited pages have more source attributions than cited pages. Of the three Princeton GEO features (statistics, citations, quotations), only statistics density replicates on production platforms.
What Experiment J Got Wrong
The 65.8% page speed dominance was the study's biggest flaw. Our follow-up, Experiment K (n=4,350 pages, 2,510 domains), showed it was a domain identity proxy. The Gradient Boosting model was fingerprinting publisher domains by their CDN speed signatures, not learning that fast pages get cited.
The evidence:
- Within the same domain, cited pages are actually slower (r=-0.221, p<0.000001)
- Domain features alone predict at AUC=0.975
- Adding domain identity drops load time importance by 92.4%
- Load time correlates 0.712 with domain average speed (71% of a page's speed is just "which domain is it on")
This meant the "GEO is 77% SEO" conclusion from Experiment J was misleading. The 77% was almost entirely page speed, and page speed was domain identity in disguise.
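The domain-proxy diagnosis can be illustrated with a small check: how strongly does a page's load time track its domain's average load time? A minimal sketch (the data structure is assumed; this is not the study's code):

```python
from collections import defaultdict
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def domain_mean_leakage(pages):
    """Correlate each page's load time with its domain's mean load time.

    A high value means 'page speed' is mostly a domain fingerprint.
    pages: list of (domain, load_time) pairs.
    """
    by_domain = defaultdict(list)
    for domain, t in pages:
        by_domain[domain].append(t)
    means = {d: sum(ts) / len(ts) for d, ts in by_domain.items()}
    return pearson_r([t for _, t in pages], [means[d] for d, _ in pages])
```

In Experiment K this correlation was 0.712, so a model given raw load time can use it to recognize domains rather than reward speed.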
Study 2: What Actually Predicts Citation (Experiment M)
The Better Question
Experiment J compared cited vs not-cited pages across different domains. That approach confounds page content with domain identity. A page on Wikipedia and a page on a small blog have different citation rates not because of their content, but because of their domain.
Experiment M solved this by using Google SERP position as the matching variable. All pages in a comparison set ranked for the same query at similar positions. They were equally "relevant" by Google's assessment. The question became: among these equally-qualified pages, which ones does AI cite and why?
Design: 250 queries, fully balanced (5 intent types x 10 verticals x 5 queries per cell). 10,293 pages crawled with 66 features each. Citations collected from ChatGPT, Perplexity, and Google AI Mode.
Google Rank Is the Gateway, Content Is the Selector
| Position Band | Pages | Cited | Citation Rate |
|---|---|---|---|
| 1-3 | 339 | 194 | 57.2% |
| 4-7 | 625 | 264 | 42.2% |
| 8-12 | 939 | 258 | 27.5% |
| 13-20 | 1,650 | 223 | 13.5% |
Position 1 has a 68.8% citation rate. Position 20 has 11.9%. Google rank matters -- it gets you into the candidate pool.
But even at position 1, 31% of pages are NOT cited. Even at positions 13-20, 13.5% still ARE. Something beyond rank determines which pages get selected from the pool. That something is page content.
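The band analysis simply buckets SERP positions before comparing citation rates. A minimal sketch (the data shape is assumed):

```python
BANDS = [(1, 3), (4, 7), (8, 12), (13, 20)]

def band_of(position):
    """Map a Google SERP position (1-20) to its analysis band label."""
    for lo, hi in BANDS:
        if lo <= position <= hi:
            return f"{lo}-{hi}"
    return None  # outside the top 20

def citation_rate_by_band(pages):
    """pages: iterable of (position, cited) pairs -> citation rate per band."""
    counts = {}
    for pos, cited in pages:
        band = band_of(pos)
        if band is None:
            continue
        total, hits = counts.get(band, (0, 0))
        counts[band] = (total + 1, hits + bool(cited))
    return {band: hits / total for band, (total, hits) in counts.items()}
```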
The Features That Predict Citation Within Position Bands
Six features are significant in ALL FOUR position bands (p < 0.05 in each):
| Feature | Direction | Effect Size (d) | What It Means |
|---|---|---|---|
| Comparison structure | + | 0.43 | Pages with "vs", comparison tables get cited more |
| Query term coverage | + | 0.42 | Pages containing the query's key words get cited more |
| H3 subheading count | + | 0.19 | More subheadings = more citations |
| Word count | + | 0.20 | Longer pages cited more (median 2,150 vs 1,415) |
| Primary source score | + | 0.27 | Pages that produce data get cited more than pages that aggregate it |
| Aggregator score | - | -0.27 | Pages that primarily cite and quote others are cited less |
Features significant in 3 of 4 bands: statistics density (+), first-person density (-), FAQ schema (+), current year mention (+), internal link count (+).
Features NOT significant within position bands: page speed (p > 0.39 in all 4 bands), author attribution, content-to-HTML ratio.
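The effect sizes above are Cohen's d for cited vs not-cited pages, computed separately inside each band. A sketch assuming the standard pooled-SD form (the study may use a variant):

```python
import math

def cohens_d(cited_values, uncited_values):
    """Pooled-SD Cohen's d for one feature, cited vs not-cited pages."""
    def mean(v):
        return sum(v) / len(v)
    def var(v, m):
        return sum((x - m) ** 2 for x in v) / (len(v) - 1)
    m1, m2 = mean(cited_values), mean(uncited_values)
    n1, n2 = len(cited_values), len(uncited_values)
    pooled = math.sqrt(((n1 - 1) * var(cited_values, m1) +
                        (n2 - 1) * var(uncited_values, m2)) / (n1 + n2 - 2))
    return (m1 - m2) / pooled
```

A positive d (e.g. comparison structure at 0.43) means cited pages score higher on the feature; a negative d (aggregator score at -0.27) means they score lower.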
The Full Feature Importance Ranking (Experiment M)
| Rank | Feature | Importance | Actionable? | Direction |
|---|---|---|---|---|
| 1 | Google position | 27.9% | No | Lower = more cited |
| 2 | First-person density | 8.1% | Yes | Less blog tone = more cited |
| 3 | Word count | 7.1% | Yes | Longer (~2,000) = more cited |
| 4 | Comparison structure | 6.4% | Yes | "vs", tables = more cited |
| 5 | H3 subheading count | 5.9% | Yes | More deep structure = more cited |
| 6 | Primary source score | 4.1% | Yes | Produce data, do not aggregate it |
| 7 | Heading density | 3.9% | Yes | Better organized = more cited |
| 8 | Query term position | 3.8% | Yes | Answer earlier in the page |
| 9 | Internal links | 3.7% | Yes | More internal links |
| 10 | Query term coverage | 3.7% | Yes | Cover the query's key words |
| 11 | Content-to-HTML ratio | 3.4% | Yes | Leaner code |
| 12 | H2 count | 3.1% | Yes | More section headings |
| 13 | Load time | 2.9% | No | Not significant in any band |
| 14 | External links | 2.5% | Yes | Slightly positive |
| 15 | Statistics per 1k words | 2.5% | Yes | More data points |
Actionable features account for 69.2% of total model importance. The top actionable feature by model importance is not a technical SEO signal. It is first-person density -- a writing style metric. Pages that read like blog posts ("I think...", "in my experience...") are significantly less likely to be cited than pages that read like reference material.
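First-person density is easy to measure on your own pages. A rough proxy, assuming a simple pronoun list (the study's exact lexicon is not published here):

```python
import re

# Assumed pronoun list -- a crude stand-in for the study's feature extractor.
FIRST_PERSON = re.compile(
    r"\b(i'm|i've|i'd|i'll|i|me|my|mine|we're|we've|we|our|ours)\b",
    re.IGNORECASE)

def first_person_density(text):
    """First-person pronouns per 1,000 words -- a rough 'blog tone' proxy."""
    words = re.findall(r"\b[\w']+\b", text)
    if not words:
        return 0.0
    hits = len(FIRST_PERSON.findall(text))
    return 1000.0 * hits / len(words)
```

Reference-style rewrites ("The benchmark shows..." instead of "I found...") drive this number toward zero.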
Content Structure Is the Biggest Lever
When features are added in blocks to the model, here is each category's marginal contribution:
| Category | Marginal Lift | What It Means |
|---|---|---|
| Content structure (headings, word count, lists) | +0.021 AUC | The largest lift after position |
| Semantic relevance (query coverage, term position) | +0.013 | Second largest |
| Trust signals (schema, author, date) | +0.005 | Small |
| Content quality (stats, technical depth) | -0.003 | Captured by structure |
| Technical (load time, page size) | -0.001 | Nothing |
Content structure is the single largest lever beyond Google rank position. Content quality features (statistics, technical depth) add nothing beyond what structure already captures -- because well-structured pages naturally contain higher-quality content.
Technical features (load time, page size) add literally nothing.
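The "marginal lift" numbers are differences in AUC as feature blocks are added to the model. AUC itself has a simple rank-based reading, sketched here (the scores would come from the fitted model; the fitting step is omitted):

```python
def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: the probability that a random cited
    page scores higher than a random not-cited page. Ties count half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def marginal_lift(scores_without, scores_with, labels):
    """AUC gained by adding one feature block to the model."""
    return auc(scores_with, labels) - auc(scores_without, labels)
```

A lift of +0.021 for the structure block means the structure features move that probability up by about two points over the model without them.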
Content Rivals Domain Within the SERP
For the first time in this research program, content features pulled roughly even with domain identity:
| Model | AUC |
|---|---|
| Content features alone | 0.673 |
| Domain identity alone | 0.687 |
| Content + domain | 0.697 |
Without position control (Experiment K), domain identity dominated at AUC = 0.975. With position control (Experiment M), content and domain provide comparable power: adding content on top of domain lifts AUC by +0.010, and adding domain on top of content lifts it by +0.024.
This means: once you control for "this page already ranks for this query," what the page says matters as much as what domain it is on.
Cross-Platform Consensus: The Quality Gradient
Pages cited by multiple AI platforms independently show a dramatic quality gradient:
| Cited By | Pages | Word Count | Stats/1k | Primary Source Score | Query Coverage |
|---|---|---|---|---|---|
| 0 platforms | 3,594 | 1,219 | 2.2 | -10.2 | 0.75 |
| 1 platform | 6,213 | 1,765 | 4.0 | -5.0 | 1.00 |
| 2 platforms | 435 | 2,281 | 6.4 | -2.4 | 1.00 |
| 3+ platforms | 51 | 2,506 | 15.3 | +10.7 | 1.00 |
Pages cited by all three platforms have 7x the statistics density, 2x the word count, positive primary source scores, and 100% query term coverage. When three independent AI systems with different search backends all choose the same page, that page has measurably different content characteristics.
What About Schema Markup?
The original Experiment J found schema markup at 0.12% importance -- nearly worthless. That was for schema presence (has any schema vs does not). But Experiment M and prior work refined this: schema TYPE matters, not presence.
| Schema Type | Effect | Source |
|---|---|---|
| Product schema | OR = 3.09 (strong positive) | Cross-domain analysis |
| FAQ schema | Significant in all 4 position bands | Experiment M |
| Review schema | OR = 2.24 (positive) | Cross-domain analysis |
| Article schema | OR = 0.76 (negative -- hurts) | Cross-domain analysis |
| General presence | Not significant (p = 0.78) | Cross-domain analysis |
The advice is not "stop doing schema." It is: use FAQ schema where Q&A content is natural, use Product schema on product pages, and understand that Article schema is negatively associated with citation. Generic schema presence (having any structured data at all) does not help.
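The odds ratios above come from 2x2 contingency tables (cited vs not, schema type present vs absent). A sketch with hypothetical counts:

```python
def odds_ratio(cited_with, total_with, cited_without, total_without):
    """OR for a schema type: odds of citation with it vs without it.
    OR > 1 means the type is associated with more citations; OR < 1, fewer."""
    a = cited_with                       # has schema type, cited
    b = total_with - cited_with          # has schema type, not cited
    c = cited_without                    # lacks schema type, cited
    d = total_without - cited_without    # lacks schema type, not cited
    return (a / b) / (c / d)
```

The counts here are illustrative only; Product schema's OR = 3.09 and Article schema's OR = 0.76 are this statistic computed on the cross-domain data.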
What Actually Matters by Industry
Different industries have different citation profiles. A one-size-fits-all approach misses this completely. From our Experiment 2E vertical analysis (8,043 pages, 14 verticals):
| Vertical | Cited Word Count (Median) | Stats per 1k | H3 Count | FAQ Schema Usage | What Matters Most |
|---|---|---|---|---|---|
| Consumer Electronics | 3,422 | 6.3 | 9 | 7% | Length and depth |
| Health | 3,302 | 7.0 | 11 | 12% | Word count (r=-0.531) |
| Ecommerce | 3,317 | 9.0 | 16 | 17% | H3 depth (r=-0.477) |
| Technology | 3,095 | 4.8 | 12 | 11% | Word count (r=-0.610) |
| SaaS | 3,040 | 6.2 | 16 | 23% | H3 depth + FAQ |
| Finance | 2,841 | 11.2 | 10 | 21% | Internal links (r=-0.401) |
| Automotive | 1,866 | 3.8 | 8 | 8% | Shortest content wins |
| Pet | 2,235 | 5.7 | 9 | 14% | Community (Reddit 26.7%) |
A finance page needs 11.2 statistics per 1,000 words and FAQ schema (21% of cited finance pages use it). A technology page needs 3,095 words and technical terminology (2.0 per 1k). An automotive page needs only 1,866 words. Applying the same targets to every vertical is a waste.
So Is GEO a Scam?
No. But the honest answer has three parts.
Part 1: The features marketed as "GEO" carry real signal. Pages with comparison structure, deep subheadings, statistics, and objective tone are significantly more likely to be cited. The 22.2 percentage point improvement from GEO features in the quadrant analysis is real and stable. Experiment M confirmed this through position-controlled analysis.
Part 2: Most of what agencies sell as GEO does not work. If a GEO package focuses on schema presence (not type), author bios, llms.txt implementation (which gets zero bot fetches in our monitoring data), and generic keyword optimization for AI, our data shows those tactics are either irrelevant or counterproductive. Author attribution is 0.2% of importance. Schema presence is not significant. Page speed is not significant within position bands.
Part 3: The features that work are not new. Comparison structure, deep headings, statistics, objective tone, answering the query early -- this is good writing. It has always been good advice. What AI platforms have done is strip away the SEO scaffolding (backlinks, schema presence, author bios, click-through signals) and reward content quality more directly.
The GEO industry is not selling snake oil. But most of what it sells is the wrong part of the shelf.
The $50,000 Question
If a business pays $50,000 for GEO consulting that focuses on schema presence, author bios, and structured data, our data predicts zero improvement in AI citation.
If the same money went toward:
- Adding comparison structure to key pages (d=0.43, strongest actionable signal)
- Restructuring content with deep H3 subheadings (5.9% importance)
- Removing blog/first-person tone (8.1% importance, negative)
- Adding real statistics and original analysis (2.5% + 4.1% primary source score)
- Ensuring pages explicitly answer the query's key terms (3.7%)
- Adding FAQ schema where natural (significant in all 4 position bands)
...it would produce real results. The investment is the same. The tactics determine the outcome.
What to Do
Highest priority (position-controlled evidence)
- Add comparison structure. "vs" sections, comparison tables, side-by-side analysis. This is the single strongest actionable signal (d=0.43), significant across all five intent types and all four position bands.
- Remove blog/first-person tone. Write as reference material, not personal narrative. First-person density is the second most important actionable feature (8.1%) and it is negative.
- Structure with deep H3 subheadings. Cited pages have 2x the H3 count (median 9-10 vs 4-5). Each H3 becomes a potential extraction point for AI.
- Cover the query's actual terms. 100% of pages cited by 3+ platforms had complete query term coverage. Make sure your page contains the words people search for.
- Include real data. Pages cited by 3+ platforms have 7x the statistics density of uncited pages.
- Add FAQ schema. The only schema type significant across all 4 position bands.
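The high-priority items lend themselves to a quick self-audit. A rough sketch: the thresholds are taken from the medians reported in this article, and every regex is a crude heuristic, not the study's feature extractor:

```python
import re

def audit_page(text, html=""):
    """Crude check of a page against the citation signals reported above."""
    words = re.findall(r"\b[\w']+\b", text)
    n = max(len(words), 1)
    first_person = len(re.findall(r"\b(i|my|we|our)\b", text, re.IGNORECASE))
    stats = len(re.findall(r"\b\d[\d,.]*%?\b", text))  # numbers as a stats proxy
    return {
        "word_count_ok": len(words) >= 2000,              # cited median ~2,150
        "has_comparison": bool(re.search(r"\bvs\.?\b", text, re.IGNORECASE)),
        "h3_count_ok": html.lower().count("<h3") >= 9,    # cited median 9-10
        "low_first_person": 1000 * first_person / n < 5,  # assumed cutoff
        "has_stats": 1000 * stats / n >= 4,               # data points per 1k words
    }
```

Any `False` in the report flags one of the levers above worth a closer manual look; the cutoffs are starting points, not guarantees.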
Lower priority (real but smaller effects)
- Target approximately 2,000 words. Within position bands, cited pages are 42-52% longer.
- Add Product schema to product pages (OR=3.09 cross-domain).
- Mention the current year where relevant.
- Maintain strong internal link structure.
Stop doing (for AI citation)
- Stop optimizing schema presence without considering type. Generic schema is not significant.
- Stop adding author bios specifically for AI citation. It is 0.2% of importance.
- Stop investing in page speed for AI citation. It is not significant within any position band.
- Stop treating E-E-A-T signals as AI optimization. AI platforms read your content, not your author page.
Limitations
This research has several limitations.
Experiment J (4,375 pages): The quadrant analysis does not control for domain identity. The page speed dominance (65.8%) was proven to be a domain proxy by Experiment K. The GEO score included quotation density and citation density, which did not replicate in Experiment C.
Experiment M (10,293 pages): Observational, not interventional. We have shown correlations, not proven causation. 250 queries across 10 verticals is larger than prior GEO studies but still a sample. Results are SERP-conditional: they apply to pages that already rank in Google's top 20.
Cross-study comparison: Experiments J and M use different methodologies (composite scores vs individual features, uncontrolled vs position-controlled). The quadrant finding and the feature ranking are complementary but come from different analyses.
References
- Lee, A. (2026). "I Rank on Page 1 -- What Gets Me Cited by AI?" Preprint v1. aiXiv | Dataset
- Lee, A. (2026). "Query Intent, Not Google Rank: What Best Predicts AI Citation Behavior." DOI
- Aggarwal, P., Murahari, V., et al. (2024). GEO: Generative Engine Optimization. KDD 2024. DOI
The Bottom Line
GEO is not a scam. But most of what is sold as GEO does not work.
Traditional SEO signals (schema presence, author bios, page speed) do not predict AI citation within position bands. Content substance features (comparison structure, deep subheadings, statistics, objective tone, query term coverage) do.
The mechanism is content quality. AI platforms have stripped away the SEO scaffolding and evaluate content more directly. Pages worth citing get cited. Pages dressed up with SEO signals but lacking substance get passed over.
If you want AI to cite your pages, write pages worth citing. That advice has not changed since 2012. What has changed is that AI platforms enforce it more honestly than Google ever did.