GEO Is Premature Optimization
No one has real data. Every GEO strategy is guesswork dressed as expertise. You are optimizing for a system you cannot measure, using tactics no one can validate, based on research that predates the thing it claims to study.
In November 2023, six researchers from Princeton, Georgia Tech, and IIT Delhi published a paper introducing something they called Generative Engine Optimization. They had built a benchmark. They had run experiments. They had numbers. The numbers said you could boost "visibility" in AI responses by up to 40% using certain tactics: citations, statistics, authoritative language.
Fourteen months later, there is an industry.
There are GEO consultants. GEO agencies. GEO tools. GEO platform listicles. GEO webinars where someone who discovered the term eight weeks ago explains it to someone who discovered it nine weeks ago. There are case studies claiming 8,337% growth. There are LinkedIn carousels. There are Twitter threads. There are people charging five figures a month for GEO retainers.
And here is the thing, the thing that everyone knows but no one will say out loud because saying it out loud would be bad for business: nobody knows if any of this works.
The Epistemological Problem
Let us be precise about what we do not know, because the list is long and it is important.
We do not know how often ChatGPT, Perplexity, Claude, Gemini, or any other AI system cites any particular source. There is no Search Console for LLMs. There is no index you can query. There is no crawl report. Ahrefs launched something called Brand Radar that claims to track AI mentions; Semrush has something similar; both are essentially taking samples from the firehose and hoping the samples are representative. They are not. They cannot be. The systems are stochastic. The same query produces different results at different times for different users. You cannot measure what you cannot reproduce.
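If you want to see how little a sampled dashboard can tell you, here is a toy sketch. Every number in it is invented; it describes no real tool's methodology, just the statistics of sampling a stochastic system a handful of times:

```python
# Minimal sketch of sample-based "AI visibility" tracking. All numbers invented.
import random

random.seed(7)

TRUE_CITATION_RATE = 0.15   # pretend the model cites you in 15% of responses
SAMPLES_PER_WEEK = 20       # pretend the tool samples 20 responses per week

for week in range(1, 7):
    cited = sum(random.random() < TRUE_CITATION_RATE for _ in range(SAMPLES_PER_WEEK))
    print(f"week {week}: reported citation rate {cited / SAMPLES_PER_WEEK:.0%}")

# The reported rate swings by double digits week to week even though the true
# rate never moved. Small samples of a stochastic system measure the dice, not you.
```

Now add the fact that the underlying rate itself drifts with every model update, and the dashboard is telling you very little.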
We do not know what causes an AI to cite one source over another. The Princeton paper tested tactics on a custom benchmark they built themselves, using an evaluation framework they designed themselves, measuring "visibility" metrics they defined themselves. This is not a criticism; this is how research works. But the jump from "we observed X in our controlled experiment" to "you should restructure your content strategy around X" is a leap across a chasm that nobody has bothered to measure.
We do not know how stable any of this is. GPT-4 is not GPT-4o is not GPT-4o-mini is not whatever they ship next Tuesday. Claude 3 Opus is not Claude 3.5 Sonnet is not Claude 3.5 Haiku is not whatever Anthropic is training right now. These models are updated constantly. Their behavior changes. What worked in the paper's experiments in November 2023 may or may not work today. Probably not. Almost certainly not. The models the paper experimented on are already two generations obsolete.
We do not know if the case studies are measuring signal or noise. One agency reports 3x lead growth from GEO tactics. Did the leads come from AI citations? From the content improvements that happened to also be good for traditional SEO? From seasonality? From a competitor going offline? From the fucking Mercury retrograde? We don't know. They don't know. Nobody knows. Sample sizes are tiny, confounding variables are infinite, and correlation is being packaged as causation and sold at enterprise rates.
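If you want the case-study problem in miniature, here it is with made-up numbers and made-up confounders, which is roughly the evidentiary standard of the genre:

```python
# Made-up before/after numbers: the only kind of data most GEO case studies have.
leads_before = 40    # monthly leads before the engagement
leads_after = 120    # monthly leads after: the "3x growth" headline

observed_lift = leads_after - leads_before   # 80 extra leads to explain

# Everything that changed in the same quarter:
candidate_causes = [
    "GEO tactics",          # what the invoice says
    "content rewrite",      # also helps ordinary SEO and ordinary readers
    "seasonality",          # this niche always spikes in Q4
    "competitor outage",    # the main rival's site was down for three weeks
]

# One observed number, four unobserved contributions. Any split across the four
# is consistent with the data, including a split where GEO contributed zero.
print(f"lift to explain: {observed_lift}, candidate causes: {len(candidate_causes)}")
```

That is the whole problem in two variables: the lift may be real, but the attribution is a guess.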
The Knuth Principle
There is a famous line, often attributed to Donald Knuth though the provenance is murky, that goes: "Premature optimization is the root of all evil."
The full quote, which nobody ever includes, is better: "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
The wisdom here is not "never optimize." The wisdom is: do not optimize until you know what matters. Do not tune the carburetor until you've confirmed the engine runs. Do not A/B test your button colors when your value proposition is incoherent. Do not hire a GEO consultant when you cannot measure whether GEO works.
GEO, as currently practiced, is premature optimization elevated to an industry. It is people optimizing for systems they cannot measure, using tactics they cannot validate, justified by research that predates the systems it claims to describe. It is the carburetor-tuning business for an engine that might be electric.
The Economics of Fog
Here is a question worth asking: who benefits from GEO being a thing?
The researchers benefit. They coined a term, published a paper, got citations, got invited to speak at conferences. This is how academia works. Good for them. Genuinely.
The tool vendors benefit. They can add a "GEO" tab to their existing SEO tools, charge a premium, and claim first-mover advantage in an emerging category. The category may or may not exist in two years. By then they will have moved on to the next thing. This is how SaaS works. Neutral.
The agencies benefit most. GEO is perfect for selling services because it is unfalsifiable. Did rankings go up in ChatGPT? Success! Did rankings go down? "AI is volatile, we need more time." Did rankings stay the same? "We're maintaining visibility in a competitive landscape." There is no failure state. There is only the invoice.
You, the person being sold GEO services, benefit least. You are paying real money for speculative tactics. You are diverting resources from things you can measure (conversion rates, actual traffic, actual revenue) to things you cannot (hypothetical future AI citations). You are, in the precise technical sense, gambling.
Gambling is fine. I have nothing against gambling. But you should know when you are gambling.
What the Research Actually Says
Let us return to the original Princeton paper, because it is better than the industry it spawned.
The researchers built a benchmark called GEO-BENCH with 10,000 queries across multiple domains. They tested various content optimization strategies. They found that certain tactics (citing sources, including statistics, using authoritative language) improved "visibility" in their evaluation framework.
Key caveats the paper includes but the industry ignores:
The evaluation was conducted on a single model snapshot. They used specific versions of specific models at a specific point in time. The models have changed. The prompts that worked then may not work now. This is acknowledged in the paper; it is ignored in the sales pitch.
The "visibility" metric is their own construction. They defined what it means to be "visible" in an AI response. This is a reasonable research choice. It is not a guarantee that their definition maps to business outcomes you care about.
Domain effects are real. What works for historical queries does not work for legal queries does not work for medical queries. The paper is explicit about this. The LinkedIn posts are not.
The 40% improvement figure is relative to their baseline. Forty percent better than bad is still potentially bad. Forty percent improvement in a metric you cannot verify is forty percent of nothing you can spend.
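If you want the arithmetic, here it is with a baseline I am inventing purely for illustration; neither the paper nor any tool can tell you your real one:

```python
# Hypothetical numbers: the point is the arithmetic, not the values.
baseline_visibility = 0.02      # suppose you are "visible" in 2% of relevant AI answers
relative_improvement = 0.40     # the paper's headline figure
optimized_visibility = baseline_visibility * (1 + relative_improvement)

print(f"before: {baseline_visibility:.1%}  after: {optimized_visibility:.1%}")
# before: 2.0%  after: 2.8%
# A 40% relative lift on a small, unverifiable base is still a small, unverifiable number.
```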
The paper is a good paper. It is doing what research papers do: establishing a framework, running initial experiments, opening a line of inquiry. What it is not doing is telling you how to run your business. The gap between "this is an interesting research direction" and "you should hire a consultant" is the gap where the money lives.
The Cargo Cult
Richard Feynman, in his 1974 Caltech commencement address, described something he called "cargo cult science." During World War II, Pacific islanders had observed American military bases: the runways, the control towers, the men with headphones waving planes down from the sky. When the war ended and the planes stopped coming, some islanders built replica runways, replica towers, replica headphones from wood and straw. They performed the rituals. They waited for the planes.
The planes did not come.
GEO, as currently practiced, has the structure of cargo cult science. It has the rituals: the citations, the statistics, the authoritative language, the structured data. It has the terminology: visibility, citation rate, generative engine, optimization. It has the conferences and the certifications and the case studies. What it does not have is a verifiable causal mechanism. What it does not have is a way to know if the planes are coming.
The Princeton researchers built a control tower. The industry built a religion.
What You Should Actually Do
Here is what I tell clients who ask about GEO, and I am giving it to you for free because honestly this stuff is getting embarrassing:
Do not optimize for AI specifically. Not yet. Not until you can measure it. Not until there is a Search Console equivalent that gives you actual data about actual citations from actual AI systems. That tool does not exist. When it exists, we can talk.
Do the things that have always worked. Write content that actually helps people. Be specific. Be accurate. Cite your sources (not because AI likes it but because readers like it and it makes your content better). Build a brand people search for by name. Get links from sites that matter. None of this is new. None of this requires a new acronym.
Stop reading GEO thought leadership. I know this is ironic coming from someone currently thought-leading at you about GEO. But the signal-to-noise ratio in this space is catastrophic. Most of what you read is people who learned about GEO three months ago explaining it to people who learned about it four months ago, citing the same Princeton paper they skimmed, making the same unfounded claims, and selling the same speculative services.
Accept uncertainty. This is the hard one. AI is going to change search. How exactly? We don't know. What should you do about it? Unclear. When will it matter? Depends. The honest answer, the answer nobody can sell you a retainer for, is: we are in a period of genuine uncertainty, and the best strategy in genuine uncertainty is to avoid large irreversible bets on speculative outcomes.
Nassim Taleb has a concept he calls "antifragility": positioning yourself to benefit from volatility rather than being destroyed by it. The antifragile approach to AI search is not to optimize for a single speculative future. It is to build assets (brand, content, expertise, audience) that are valuable across many possible futures. It is to remain flexible rather than committed. It is to not blow your budget on the cargo cult.
The Point
Generative Engine Optimization is a term that is fourteen months old. The models it purports to optimize for are updated more frequently than your dentist recommends flossing. The research is preliminary. The tools are speculative. The case studies are anecdotal. The consultants are confident. The confidence is unjustified.
This does not mean GEO will never matter. It might. AI search is real. The shift is real. But the playbook does not exist yet. Anyone telling you they have the playbook is selling you something. Probably a retainer.
The wise thing to do, the thing that Knuth would tell you to do if he cared about marketing (he does not), is to wait. Watch. Learn. Do not optimize for systems you cannot measure. Do not hire experts in a discipline that did not exist two years ago. Do not pay for certainty that nobody possesses.
Do the work that works. Measure what you can measure. Stay flexible. The fog will clear eventually.
Until then: premature optimization is still the root of all evil.
Even when you give it a shiny new acronym.
The alchemists never found the philosopher's stone. The cargo cultists never summoned the planes. The GEO consultants will never show you a verified, reproducible, causally-sound case study with proper controls. But they will send you an invoice.