The Log File Analysis Shortcut
- → Log files show what Google actually crawls (not what you assume)
- → Focus on Googlebot requests, ignore other bots
- → Find orphan pages, crawl waste, and frequency patterns
- → Tools: Screaming Frog Log Analyzer, or simple grep commands
Search Console tells you what Google thinks about your pages. Server logs tell you what Google actually does. The map is not the territory.
The difference matters. Logs show you crawl behavior that no tool can replicate. Which pages get crawled daily? Which get ignored for months? Where does the crawler waste budget?
Getting the Logs
Ask your hosting provider or dev team for access logs. You want at least 30 days of data.
The format varies, but you need: timestamp, URL requested, user agent, status code.
Filter to Googlebot requests only. The user agent string for web-search crawling contains "Googlebot". Ignore Googlebot-Image and Googlebot-Video unless images or video matter for your site.
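A minimal way to pull those requests out, assuming a standard combined-format log saved as access.log (both the filename and the field layout are assumptions; adjust to your setup):

```bash
# Keep only Googlebot requests, dropping the image/video crawlers,
# and save them to a working file for the analyses below.
# "access.log" is a placeholder for wherever your logs actually live.
grep 'Googlebot' access.log \
  | grep -v -e 'Googlebot-Image' -e 'Googlebot-Video' \
  > googlebot.log

# Sanity check: how many Googlebot requests landed in the window?
wc -l < googlebot.log
```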
The 4 Questions That Matter
1. What's getting crawled that shouldn't be?
Find URLs that Googlebot hits frequently but that you don't want indexed. Common culprits (one way to surface them follows this list):
- Parameter URLs (sort orders, filters, tracking)
- Internal search results
- Paginated archives (page 47, page 48...)
- Old URLs that should 301 redirect
Every request to a junk URL is crawl budget not spent on important pages.
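To surface those culprits, a sketch that runs against the googlebot.log file built earlier and assumes combined log format (request path in field 7):

```bash
# Most-hit URLs with query parameters (sort orders, filters, tracking).
awk '$7 ~ /\?/ {print $7}' googlebot.log \
  | sort | uniq -c | sort -rn | head -50

# Same idea for internal search results; "/search" is a placeholder
# for whatever path your site's search actually uses.
awk '$7 ~ /^\/search/ {print $7}' googlebot.log \
  | sort | uniq -c | sort -rn | head -20
```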
2. What's NOT getting crawled that should?
Compare your important URLs against what Googlebot actually requests. Any critical pages with zero crawls in 30 days? That's a problem.
That usually means poor internal linking, orphan pages, or pages buried too deep in your architecture.
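A rough way to run that comparison, assuming the googlebot.log file from earlier plus a hypothetical important-urls.txt (your key paths, one per line, e.g. exported from your sitemap):

```bash
# Every unique path Googlebot requested in the window.
awk '{print $7}' googlebot.log | sort -u > crawled-urls.txt

# Important paths with zero Googlebot requests: lines that appear in
# important-urls.txt but not in crawled-urls.txt.
comm -23 <(sort -u important-urls.txt) crawled-urls.txt
```

The paths in important-urls.txt need to match how they appear in the logs (e.g. /products/widget-a, not the full https:// URL).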
3. How often do important pages get crawled?
Your homepage might get crawled 100x per day. Your most important product pages should get crawled at least weekly.
If key pages only get crawled monthly, Google isn't treating them as important. Fix the internal linking. Add them to your sitemap. Build more authority to those pages.
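To check the crawl frequency of a single key page, a sketch assuming googlebot.log and combined log format (timestamp in field 4); /products/widget-a is a placeholder path:

```bash
# Googlebot hits per day for one page: pull the date out of the
# timestamp field ([10/Oct/2025:13:55:36 ...) and count per date.
grep ' /products/widget-a ' googlebot.log \
  | awk '{print substr($4, 2, 11)}' \
  | sort | uniq -c
```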
4. What status codes is Googlebot seeing?
Filter by response code. Look for:
- 5xx errors: Server problems Google is hitting
- 404s: Broken pages Googlebot is finding
- 301/302s: Redirects (and redirect chains) being crawled
- Soft 404s: Pages returning 200 but with thin or empty content (these all report 200, so spot-check your most-crawled pages rather than relying on the status-code filter)
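Counting those codes is quick. A sketch assuming googlebot.log and combined log format (status code in field 9, request path in field 7):

```bash
# Distribution of response codes Googlebot is actually seeing.
awk '{print $9}' googlebot.log | sort | uniq -c | sort -rn

# The specific URLs returning 404s or server errors, most-hit first.
awk '$9 ~ /^(404|5..)$/ {print $9, $7}' googlebot.log \
  | sort | uniq -c | sort -rn | head -50
```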
The Quick Analysis Method
You don't need fancy tools. Command line works fine.
Filter to Googlebot. Group by URL. Count requests. Sort by frequency.
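The whole method fits in one pipeline. A sketch assuming a combined-format access.log (request path in field 7):

```bash
# Filter to Googlebot, group by URL, count requests, sort by
# frequency, and keep the 100 most-crawled URLs.
grep 'Googlebot' access.log \
  | awk '{print $7}' \
  | sort | uniq -c | sort -rn \
  | head -100
```

Swap head -100 for tail -100 to see the least-crawled end of the list.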
The top 100 most-crawled URLs tell you what Google thinks is important on your site. If that list doesn't match what YOU think is important, you have work to do.
The bottom of the list (URLs crawled only once or twice, plus anything that never appears in the logs at all) shows what Google is ignoring. If important pages are there, fix your architecture.
The Crawl Budget Reality
Small sites (under 10K pages) rarely have crawl budget problems. Google will get to everything eventually.
Large sites absolutely have crawl budget issues. When you have 500K URLs and Googlebot crawls 50K per month, you need to prioritize.
Log analysis tells you exactly where to focus. Block the junk. Promote the good stuff. Make every crawl count.
Most SEOs never look at server logs. The ones who do find problems nobody else can see. Be the one who looks. Once you identify crawl waste, clean up your redirects to maximize crawl efficiency.