Programmatic SEO Is a Chainsaw (Most People Need a Scalpel)
Fifty thousand pages won't save you if forty-nine thousand of them are garbage.
The phone rang at nine in the morning, which is early for a client call but not early enough to be an emergency, and I answered it with the particular brand of optimism I reserve for Mondays, which is to say almost none. The voice on the other end was bright, energetic, the kind of voice that belongs to someone who has just discovered a new tactic and cannot wait to tell you how well it's working. "We launched fifty thousand pages over the weekend," he said, and I could hear him smiling through the phone, the way people smile when they think they've found a shortcut to the thing everyone else is doing the hard way. "Fifty thousand. City plus service keyword. Every combination. We used a script."
I pulled up his site on my laptop, which was balanced on the arm of the couch because I had not yet made it to my desk, and I typed in one of the URLs. City-name-plumber dot html. The page loaded. It had a headline that said "Plumber in [City Name]" and below it three paragraphs of text that could have been written about any city in America because it contained no information specific to any city in America. There was a stock photo of a man holding a wrench. The man looked confused, which felt appropriate. Below the stock photo was a bulleted list of services that was identical to the bulleted list on every other page, all fifty thousand of them, because the template had been designed to swap in a city name and nothing else.
"How many has Google indexed?" I asked, already knowing the answer would not be fifty thousand.
"Forty-three," he said. The brightness had left his voice.
Forty-three out of fifty thousand. That is a 0.086% success rate, which is worse than the conversion rate on the worst landing page I have ever seen, and I once worked with a landing page that consisted entirely of a phone number rendered as an image with no alt text. Google had looked at fifty thousand pages and decided that forty-nine thousand nine hundred and fifty-seven of them were not worth acknowledging. And Google was right.
This is the story of programmatic SEO, or at least the version of it that most people encounter first: the idea that if you can generate ten pages, you can generate ten thousand, and if ten thousand then why not a hundred thousand, and if the math works at scale then surely more pages means more traffic means more revenue means you can retire to a beach somewhere and never think about canonical tags again. It is a seductive idea. It is also, in the wrong hands, a chainsaw being used to perform surgery.
What Programmatic SEO Actually Is (And What It Isn't)
Let me be precise about terminology, because the phrase "programmatic SEO" gets thrown around in ways that would make its originators wince (assuming it has originators, which is debatable, because the practice existed long before the name did - people have been generating pages from databases since before the internet was old enough to drink). Programmatic SEO is template-driven page generation at scale. You have a data source - a database, an API, a spreadsheet if you're feeling adventurous - and you have a template, and you combine them to produce pages, sometimes hundreds of pages, sometimes millions, each one targeting a specific search query or query cluster.
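If it helps to see that combine step rather than imagine it, here is a minimal sketch in Python. The property records, field names, and output directory are all invented for illustration; the point is only that the template is dumb and the data carries the value.

```python
from pathlib import Path
from string import Template

# Illustrative records only. In a real system this comes from a database or
# API, and every row must carry information unique to its page.
properties = [
    {"slug": "123-main-st-austin-tx", "address": "123 Main St, Austin, TX",
     "price": 450_000, "beds": 3, "last_sold": "2021-06-14"},
    {"slug": "456-oak-ave-dallas-tx", "address": "456 Oak Ave, Dallas, TX",
     "price": 389_000, "beds": 4, "last_sold": "2019-11-02"},
]

page = Template(
    "<!doctype html>\n"
    "<title>$address</title>\n"
    "<h1>$address</h1>\n"
    "<ul>\n"
    "  <li>Listing price: $price</li>\n"
    "  <li>Bedrooms: $beds</li>\n"
    "  <li>Last sold: $last_sold</li>\n"
    "</ul>\n"
)

out_dir = Path("build")
out_dir.mkdir(exist_ok=True)

# The combine step: one page per record, each populated with that record's data.
for row in properties:
    html = page.substitute(
        address=row["address"],
        price=f"${row['price']:,}",
        beds=row["beds"],
        last_sold=row["last_sold"],
    )
    (out_dir / f"{row['slug']}.html").write_text(html, encoding="utf-8")
```

Swap the property records for a bare list of city names and the same loop produces my client's fifty thousand empty pages. The machinery is identical. The data is not.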
That is the mechanical description. The important part is what it's for. Programmatic SEO exists to serve a specific kind of search demand: queries that follow a pattern, where each variant represents a genuine user need, and where you have unique data or value to offer for each variant. "Best restaurants in [city]." "Weather in [city]." "[Product A] vs [Product B]." "[Software] integrations with [other software]." These are patterns with thousands or millions of legitimate variants, each one searched by real people who want a real answer.
What programmatic SEO is not - and this is where my client with the fifty thousand plumber pages went sideways - is a multiplication trick. You cannot take thin content and make it valuable by stamping it out at scale. Scale does not create value. Scale amplifies whatever you already have. If you have genuine, unique data for each page, scale amplifies that value across thousands of pages. If you have a template with a city name swapped in, scale amplifies that emptiness across thousands of pages, and Google notices the emptiness a lot faster than it notices the value.
When The Chainsaw Is The Right Tool
I want to be clear that I am not against programmatic SEO. I have built programmatic SEO systems that generate millions of dollars in organic traffic. I have designed templates and data pipelines and internal linking architectures that took sites from nothing to dominant positions in competitive markets. Programmatic SEO is one of the most powerful tools in the SEO toolkit. But a chainsaw is also one of the most powerful tools in a woodworker's shop, and there is a reason you don't use one to build a jewelry box.
The sites that do programmatic SEO well - and I mean really well, the kind of well that survives algorithm updates and market shifts and the slow grinding evolution of search - share three characteristics that are so fundamental they should be tattooed on the forehead of anyone considering a programmatic approach. (I suggested this to a client once. He did not find it funny. His programmatic SEO project had just been deindexed.)
First: unique data per page. Every single page must contain information that exists on that page and only that page. Not rephrased information. Not the same paragraph with a different city name. Actual, genuinely different data. Zillow's property pages work because each page has a unique property with a unique address, unique photos, unique price history, unique tax records, unique neighborhood data. You cannot visit the Zillow page for 123 Main Street and find it identical to the page for 456 Oak Avenue. The data is different because the houses are different because the reality the data describes is different.
Second: genuine search demand per variant. There must be real people searching for each specific variant your pages target. "Zapier integrations with Slack" is a real search because real people want to connect Zapier to Slack and need to know how. "Zapier integrations with ObscureTool9000" might also be a real search, but the demand might be three people per month, and at some point you need to make a judgment about whether a page serving three people per month justifies its existence in your site architecture. (The answer is sometimes yes - if the page is genuinely useful to those three people and costs you nothing to maintain, go ahead. But you should make the decision consciously, not by default.)
Third: each page must answer a question that no other page on your site answers. This is the one that kills most programmatic SEO projects. If your "Plumber in Austin" page and your "Plumber in Dallas" page give the same advice, link to the same resources, and differ only in the city name in the headline, they do not answer different questions. They answer the same question - "how do I find a plumber?" - with a geographical label slapped on. The user who lands on either page gets the same information. Google knows this. Google has been dealing with city-swap pages since approximately 2004, and it is not fooled.
The Hall Of Fame (People Who Got It Right)
I keep a mental list of programmatic SEO implementations that I admire, and it is a shorter list than you might expect given how many sites attempt it. The ones that work share a pattern: they started with the data, not the template. They had something real to say on each page before they ever thought about scaling it.
Zapier's integration pages are the example everyone cites, and they cite it because it's genuinely excellent. Each integration page - "Connect Gmail to Slack," "Connect Trello to Google Sheets," whatever - describes a specific pairing, lists the specific triggers and actions available for that pairing, shows example workflows unique to that pairing, and provides setup instructions for that specific combination. The data is different on every page because the integrations are different. The search demand exists because people actually search for these specific combinations. And each page answers a question no other page answers: "Can I connect these two specific tools, and if so, how?"
Zillow is the property-level example I mentioned earlier. Each property page is a genuinely unique document with unique data: photos, price estimates, tax history, neighborhood statistics, school ratings, comparable sales. You could spend fifteen minutes on a Zillow property page and learn things you could not learn anywhere else. That is the standard.
NerdWallet's comparison pages deserve mention because they show how editorial judgment can coexist with programmatic generation. Their "[Credit Card A] vs [Credit Card B]" pages are not just data dumps - they synthesize the differences into actual recommendations, weigh the tradeoffs for different user profiles, and provide context that a human researcher would find useful. The programmatic framework generates the page structure and pulls in the data, but there is genuine editorial value layered on top. This is harder to do and more expensive to maintain, and that is precisely why it works.
Tripadvisor's location pages. Yelp's business pages. Indeed's job listings. These are all, at their core, programmatic SEO. They generate pages from databases. But the databases contain real information about real things, and the templates are designed to present that information in ways that are genuinely useful to the people who search for them. Nobody lands on a Tripadvisor hotel page and thinks "this is the same as every other hotel page with a different name swapped in." The data is different because the hotels are different. That's the whole point.
The Graveyard (People Who Didn't)
For every Zillow there are a thousand sites that tried the same approach and failed, and the failures are instructive in a way the successes are not, because the failures all fail for the same reason. They lack unique data per page.
My plumber client is the classic case, but I have seen it in every industry. Real estate agents generating pages for every neighborhood with identical content except the neighborhood name. Law firms generating pages for every practice area in every city they theoretically serve. E-commerce sites generating pages for every possible filter combination, so you end up with "blue men's running shoes size 10.5 wide" as a page that contains the same three products as "men's running shoes blue wide size 10.5." (I once audited a site that had generated four million filter combination pages. Google had indexed about six thousand of them. The rest were a monument to wasted crawl budget.)
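The arithmetic behind that monument is worth seeing once, because facet counts multiply rather than add. A toy calculation with invented facet sizes:

```python
from math import prod

# Hypothetical facet sizes for a shoe catalog. Six modest facets are enough
# to reach millions of URL combinations, because facets multiply, not add.
facets = {"brand": 20, "category": 50, "color": 20, "size": 25, "width": 4, "gender": 2}

combinations = prod(facets.values())
print(f"{combinations:,} potential filter pages")  # 4,000,000
```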
The common thread is always the same: someone looked at a site that was doing programmatic SEO successfully and thought "I can do that," without understanding that the successful site's advantage was not the technique. It was the data. Zillow works not because it has a clever template. Zillow works because it has comprehensive, unique data for nearly every residential property in the United States. If you do not have the equivalent of that data for your domain, you do not have a programmatic SEO play. You have a template.
"The pages have to earn their right to exist. Every single one. Not as a group. Not as a category. Each page, individually, has to offer something a user cannot get from the page next to it."
The Decision Framework
After twenty-some years of building and auditing programmatic SEO systems, I have boiled the "should we do this?" question down to five criteria. All five must be met. Not four. Not three and a half with an asterisk. All five. I have seen people try to shortcut this framework and I have never seen it end well.
One: Do you have unique data per page? Not unique keywords. Not unique titles. Unique data. Information that is factually different on page A than on page B. If your data source is a list of city names and everything else is the same, you do not have unique data. If your data source is a database of properties with addresses, prices, photos, and tax records, you do. The distinction is not subtle.
Two: Is there real search volume per variant? Not estimated. Not assumed. Verified. Pull keyword data for a representative sample of your planned pages - not just the head terms, not just the ones you hope will work, but a random sample across the full range (a short sampling sketch follows this framework). If a significant percentage of your variants have zero or near-zero search volume, you are generating pages nobody is looking for. This is not a growth strategy. This is a crawl budget donation to Google.
Three: Can each page answer a question that no other page on your site answers? I said this already but I will say it again because it is the criterion most people fail. The question cannot be "what is [service] in [city]?" if the answer to that question is the same regardless of which city you plug in. The question must be one whose answer genuinely changes with the variable.
Four: Can you maintain it? Programmatic SEO is not a "set it and forget it" strategy, despite what some people will tell you (those people are selling you something). Data goes stale. APIs change. Market conditions shift. A page that was accurate six months ago might be dangerously wrong today. If you generate ten thousand pages, you are responsible for ten thousand pages. Can you monitor them? Can you update the data? Can you detect and fix errors at scale? If the answer is no, you are building a maintenance nightmare that will eventually degrade and take your site's reputation with it.
Five: Would you be comfortable showing any random page to Google's search quality team? This is the gut check. Pick a page at random from your programmatic set - not the best one, not the one you hand-crafted as the showcase, a truly random page. Show it to someone who does not work at your company. Ask them: "Is this page useful? Does it contain information you could not easily find on the page next to it?" If the answer is yes, proceed. If the answer is "well, sort of, if you squint," do not proceed.
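The search-demand check in criterion two is easy to automate once you have exported estimated volumes for your candidate variants from whatever keyword tool you use. A sketch, assuming a CSV with keyword and monthly_volume columns; the file name, column names, and thresholds are placeholders, not a standard:

```python
import csv
import random

MIN_VOLUME = 10      # your own line for "near-zero" monthly searches
SAMPLE_SIZE = 500    # a random sample, not just the head terms you hope will work

# Assumed export format: one row per planned variant, with the keyword and
# its estimated monthly search volume.
with open("planned_variants.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

sample = random.sample(rows, min(SAMPLE_SIZE, len(rows)))
near_zero = [r["keyword"] for r in sample
             if int(r["monthly_volume"] or 0) < MIN_VOLUME]

share = len(near_zero) / len(sample)
print(f"{share:.0%} of sampled variants get fewer than {MIN_VOLUME} searches a month")
```

If that percentage comes back high, you have your answer before you have built anything.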
The Technical Requirements (Where Most Implementations Actually Break)
Let's say you've passed all five criteria. Your data is unique. Your demand is real. Your pages are genuinely differentiated. Congratulations. Now comes the part where most programmatic SEO implementations actually fail, because having good content at scale is necessary but not sufficient. You also need the technical architecture to support it, and the technical requirements of programmatic SEO are fundamentally different from the requirements of a normal content site.
Canonical strategy. When you have thousands or millions of pages, many of which target similar (but not identical) queries, your canonical tag implementation is not optional and it is not simple. Every page needs a self-referencing canonical. Every page needs to be clear about what it is and what it is not a duplicate of. If you have parameter variations, pagination, or filter combinations that create near-duplicate URLs, you need a canonical strategy that resolves those overlaps explicitly. I have seen programmatic sites with hundreds of thousands of pages and no canonical tags, and the indexation rate was exactly what you'd expect: abysmal. Google was spending all its crawl budget trying to figure out which of a dozen near-identical versions of the same page was the "real" one, and eventually it gave up and picked none.
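In practice, "resolves those overlaps explicitly" means one rule that maps every crawlable variant of a URL to exactly one canonical form, plus a self-referencing tag on that canonical form. A minimal sketch; the stripped parameters and the trailing-slash policy are examples, not recommendations for your site:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that create near-duplicate URLs without changing what the page
# is about. Which parameters belong here depends entirely on your site.
STRIP_PARAMS = {"sort", "view", "utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonical_url(url: str) -> str:
    """Map any crawlable variant of a URL to the one version we want indexed."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query) if k not in STRIP_PARAMS]
    path = path.rstrip("/") or "/"   # one trailing-slash policy, applied everywhere
    return urlunsplit((scheme, netloc.lower(), path, urlencode(kept), ""))

def canonical_tag(url: str) -> str:
    # Every rendered page, including the canonical version itself, emits this tag.
    return f'<link rel="canonical" href="{canonical_url(url)}">'

print(canonical_tag("https://example.com/plumbers/austin/?sort=rating&utm_source=ads"))
# <link rel="canonical" href="https://example.com/plumbers/austin">
```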
Internal linking at scale. A programmatic site with fifty thousand pages and no internal linking structure is a forest with no paths. Google cannot navigate it. Users cannot navigate it. The pages exist in isolation, each one a dead end, and dead-end pages do not rank because they do not accumulate authority and they do not communicate relevance to the rest of the site. You need hub pages that aggregate and link to your programmatic pages by category, by geography, by whatever taxonomy makes sense for your data. You need cross-links between related pages - the Austin plumber page should link to the San Antonio plumber page, and both should link to the Texas hub page, and the Texas hub page should link to the national directory. This linking structure is not a nice-to-have. It is the architecture that allows Google to discover, understand, and value your pages.
And you cannot build this linking structure manually. Not at scale. You need programmatic internal linking, which means you need your template system to understand relationships between pages and generate contextually relevant links automatically. This is a genuine engineering challenge, and if your "programmatic SEO tool" is a spreadsheet that generates HTML files with no linking logic, you have not built a programmatic SEO system. You have built a page printer.
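Here is roughly what "understand relationships between pages" can mean in code: each page record knows which hub it belongs to, and the link block is computed from the data rather than hand-curated. The records, URL patterns, and the three-sibling cap are all illustrative:

```python
from collections import defaultdict

# Illustrative page records: each programmatic page knows which hub it belongs to.
pages = [
    {"slug": "/plumbers/tx/austin", "city": "Austin", "state": "TX"},
    {"slug": "/plumbers/tx/dallas", "city": "Dallas", "state": "TX"},
    {"slug": "/plumbers/tx/san-antonio", "city": "San Antonio", "state": "TX"},
    {"slug": "/plumbers/ok/tulsa", "city": "Tulsa", "state": "OK"},
]

by_state = defaultdict(list)
for p in pages:
    by_state[p["state"]].append(p)

def link_block(page, max_siblings=3):
    """One hub link plus a few sibling links, all derived from the data itself."""
    links = [(f"/plumbers/{page['state'].lower()}", f"Plumbers in {page['state']}")]
    siblings = [p for p in by_state[page["state"]] if p["slug"] != page["slug"]]
    links += [(s["slug"], f"Plumbers in {s['city']}") for s in siblings[:max_siblings]]
    return links

for href, anchor in link_block(pages[0]):
    print(f'<a href="{href}">{anchor}</a>')
```

The exact selection logic matters less than the fact that it exists and is generated from the same data that generates the pages.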
Crawl budget management. Google allocates a finite crawl budget to your site, which is the number of pages Googlebot will crawl in a given time period. If you have fifty thousand pages and most of them are thin or duplicate, Google is burning crawl budget on pages that will never rank, which means it has less budget available for your pages that could rank. You need to be strategic about what you allow Google to crawl. Use robots.txt to block parameter combinations and faceted navigation patterns that generate near-duplicates. Use noindex directives on pages that serve users but don't need to rank in search. Use your XML sitemaps to signal which pages are most important. And monitor your crawl stats in Search Console obsessively, because a sudden drop in crawl rate is often the first sign that Google has decided your site is not worth the effort.
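As a sketch of the mechanics, the robots rules and the sitemap can come out of the same script that already knows which pages deserve to be crawled. The parameter names and URLs here are invented, and any robots pattern should be tested before deployment, because the syntax is easy to get wrong:

```python
from xml.sax.saxutils import escape

# Hypothetical faceted-navigation parameters that should never be crawled.
BLOCKED_PARAMS = ["sort", "color", "size", "price_min", "price_max"]

robots = ["User-agent: *"] + [f"Disallow: /*?*{p}=" for p in BLOCKED_PARAMS]
print("\n".join(robots))

# The sitemap lists only the pages you actually want indexed, nothing else.
index_worthy = [
    "https://example.com/plumbers/tx/austin",
    "https://example.com/plumbers/tx/dallas",
]
entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in index_worthy)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n</urlset>"
)
print(sitemap)
```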
Index management. This is the dark art of programmatic SEO, and it's where the real skill lives. Having pages crawled is not the same as having them indexed, and having them indexed is not the same as having them rank. You need to monitor your index coverage in Search Console, track which of your programmatic pages are actually making it into the index, identify patterns in the pages that are being excluded (are they all from a specific template? A specific data category? A specific URL pattern?), and iterate on both the content and the technical signals to improve your indexation rate over time. This is not a one-time task. It is ongoing, and it requires the kind of obsessive attention to data that most people associate with quantitative trading, not content marketing.
The indexation rate of a healthy programmatic SEO site should be above 80%. If you are below 50%, something is fundamentally wrong - either with your content quality, your technical implementation, or both. If you are below 20%, as my plumber client was at 0.086%, you should stop generating new pages and start figuring out why Google is rejecting the ones you have.
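Measuring that rate is mechanical once you have two lists: the URLs you want indexed and the URLs Google reports as indexed (an export from Search Console's page indexing report works). A sketch, with the file names and the crude section rule as assumptions:

```python
import csv
from collections import Counter

def load_urls(path, column="URL"):
    # Assumed format: a CSV with one URL per row under the given column.
    with open(path, newline="", encoding="utf-8") as f:
        return {row[column].strip() for row in csv.DictReader(f)}

submitted = load_urls("sitemap_urls.csv")   # everything you want indexed
indexed = load_urls("indexed_urls.csv")     # e.g. a Search Console export

rate = len(indexed & submitted) / len(submitted)
print(f"Indexation rate: {rate:.1%} of {len(submitted):,} submitted pages")

# Break the exclusions down by URL section to spot patterns: are the rejected
# pages concentrated in one template, one category, one data source?
def section(url):
    parts = url.split("/")
    return parts[3] if len(parts) > 3 else "(root)"   # crude: first path segment

excluded = submitted - indexed
print(Counter(section(u) for u in excluded).most_common(10))
```

The breakdown is the part that earns its keep: an overall rate tells you that something is wrong, but the pattern in the exclusions tells you where to look.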
A Better Way To Think About It
The metaphor I keep coming back to is the chainsaw and the scalpel. A chainsaw is the right tool when you need to cut through a lot of wood quickly and the cuts don't need to be precise. If you're clearing a forest, use a chainsaw. But if you're performing surgery, if you're trying to do something delicate and specific and the stakes are high and precision matters, the chainsaw will kill the patient. Most businesses don't need to clear a forest. Most businesses need to make a few precise cuts in exactly the right places.
I have had this conversation with probably a hundred clients over the past decade. Someone comes to me and says "we want to do programmatic SEO" and I ask them why and they say "because [competitor] did it and they're getting tons of traffic" and I ask them what unique data they have for each page and there is a long silence. That silence is the answer. If you cannot immediately articulate what unique value each page will provide, you are not ready for programmatic SEO. You might be ready for a good content strategy, which could involve creating fifty well-researched, genuinely valuable pages instead of fifty thousand thin ones. Fifty pages that rank is infinitely better than fifty thousand pages that don't.
The irony (and I do appreciate irony, even when it is expensive for the client) is that the businesses best positioned for programmatic SEO are often the ones that don't think of it in those terms. Zillow didn't set out to "do programmatic SEO." They set out to build the most comprehensive property database in America and make it accessible on the web. The SEO was a consequence of having genuinely valuable, unique data at scale. Zapier didn't set out to generate integration pages for SEO purposes. They set out to document their product's capabilities, and the fact that their product has thousands of integration pairs meant the documentation naturally became a programmatic SEO play.
The best programmatic SEO happens when you start with the data and the user need, not with the keyword list and the template. Start with: "What do we know that nobody else knows, and who is searching for it?" If the answer is rich and specific and scales across thousands of variants, you have a programmatic SEO opportunity. If the answer is "we know what everyone else knows but we want to rank for a bunch of city keywords," you don't.
What I Told My Client
I told him to delete forty-nine thousand of his pages. He did not like hearing this. Nobody likes hearing that the thing they built over a weekend needs to be demolished on a Monday. But I walked him through the logic: fifty thousand thin pages were actively harming his site by diluting crawl budget, sending sitewide quality signals that told Google his site was not trustworthy, and creating technical debt that would compound over time as those pages sat there, uncrawled and unindexed, like empty storefronts in a ghost town.
We kept a thousand pages - the ones for cities where he actually had customers, where he could provide genuinely local information: real service area details, real response times, real pricing ranges, real customer reviews from that specific market. For those thousand pages, we rebuilt the template from scratch. Each page got local content that could not have been generated by swapping in a city name. Each page got local business schema with actual NAP data. Each page got internal links to related service pages and to the relevant regional hub. Each page, individually, earned its right to exist.
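For reference, "local business schema with actual NAP data" comes down to a JSON-LD block per page carrying the name, address, and phone for that specific market. A sketch that builds one from a page record; the business details are invented:

```python
import json

def local_business_jsonld(record):
    """Build a schema.org LocalBusiness (Plumber) block from one market's data."""
    data = {
        "@context": "https://schema.org",
        "@type": "Plumber",
        "name": record["business_name"],
        "telephone": record["phone"],
        "address": {
            "@type": "PostalAddress",
            "streetAddress": record["street"],
            "addressLocality": record["city"],
            "addressRegion": record["state"],
            "postalCode": record["zip"],
        },
        "areaServed": record["city"],
        "url": record["page_url"],
    }
    return f'<script type="application/ld+json">{json.dumps(data, indent=2)}</script>'

print(local_business_jsonld({
    "business_name": "Example Plumbing Co. - Austin",
    "phone": "+1-512-555-0100",
    "street": "100 Congress Ave",
    "city": "Austin", "state": "TX", "zip": "78701",
    "page_url": "https://example.com/plumbers/tx/austin",
}))
```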
Six months later, Google had indexed nine hundred and fourteen of his thousand pages. His organic traffic from local service queries had increased by three hundred and forty percent compared to the period when he had fifty thousand pages live. Fewer pages. More traffic. This is not a paradox. It is arithmetic.
The chainsaw is in the toolbox for a reason. Sometimes you need it. But before you pick it up, make sure you know whether you're clearing a forest or operating on a patient. The answer determines everything.