claude-seo-skills · seo-log-analysis
Install
Source · Clone the upstream repo:
git clone https://github.com/lionkiii/claude-seo-skills
Claude Code · Install into ~/.claude/skills/:
T=$(mktemp -d) && git clone --depth=1 https://github.com/lionkiii/claude-seo-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/seo-log-analysis" ~/.claude/skills/lionkiii-claude-seo-skills-seo-log-analysis && rm -rf "$T"
Manifest: `skills/seo-log-analysis/SKILL.md`
Server Log Analysis
Analyzes local server log files for crawl budget breakdown. No MCP or external calls required.
Inputs
- `file`: Absolute path to the server log file (Apache Combined, Apache Common, or Nginx access log). If the user provides a relative path, resolve it with Bash: `realpath <path>`.
Execution
Step 1: Format Detection
Read the first 10 lines of the log file to detect format:
- Apache Combined: `%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"` (9+ fields, referer and UA in quotes)
- Apache Common: `%h %l %u %t "%r" %>s %b` (7 fields, no referer/UA)
- Nginx: similar to Apache Combined with slight field order differences
- Check for compressed files (.gz) — if detected, inform user to decompress first
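A minimal Bash sketch of this detection step; the `LOG` variable and the probe heuristics are illustrative assumptions, not the skill's prescribed code:

```bash
#!/usr/bin/env bash
LOG="$1"   # hypothetical argument: path to the log file

# Compressed logs are out of scope: ask the user to decompress first.
case "$LOG" in
  *.gz) echo "Compressed file detected: run gunzip '$LOG' first" >&2; exit 1 ;;
esac

# Probe the first 10 lines. Combined/Nginx lines end with a quoted UA;
# Common lines end with the byte count (%b: a number or "-").
head -n 10 "$LOG" | awk '
  /"$/              { combined++; next }
  $NF ~ /^[0-9-]+$/ { common++;   next }
                    { unknown++ }
  END {
    if (combined >= common && combined > 0) print "Apache Combined / Nginx"
    else if (common > 0)                    print "Apache Common"
    else                                    print "Unknown format"
  }'
```

Nginx's default combined format ends with the same quoted UA field, so this probe reports it together with Apache Combined.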
Step 2: Parse Log Lines
Use Bash awk to extract fields. For Apache Combined/Nginx format (9 fields):
awk '{
    ip = $1; url = $7; status = $9
    match($0, /"([^"]+)"$/, arr)   # extract UA from the last quoted field (3-arg match requires gawk)
    print ip, url, status, arr[1]
}' logfile
For Apache Common (7 fields): ip=$1, request=$7, status=$9, ua="unknown"
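As a hedged illustration, the Common-format extraction reduces to a one-liner, since there is no quoted UA to recover (`logfile` is a placeholder):

```bash
awk '{ print $1, $7, $9, "unknown" }' logfile   # ip, request URL, status, placeholder UA
```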
Step 3: Classify User-Agents
Group each request into one of these categories (a matching sketch follows the list):
- Googlebot: `Googlebot`, `Googlebot-Image`, `Googlebot-News`, `AdsBot-Google`
- Bingbot: `bingbot`, `BingPreview`, `MicrosoftPreview`
- Other search bots: `Slurp` (Yahoo), `DuckDuckBot`, `Baiduspider`, `YandexBot`, `Sogou`
- AI crawlers: `GPTBot`, `ClaudeBot`, `PerplexityBot`, `Bytespider`, `CCBot`, `anthropic-ai`
- Monitoring tools: `Pingdom`, `UptimeRobot`, `StatusCake`, `NewRelic`, `Datadog`
- Real users: everything else (browsers: `Mozilla`, `Chrome`, `Safari`, `Firefox`, `Edge`)
- Unknown: no UA or unrecognized
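One way to implement the grouping in a single gawk pass; the pattern lists mirror the categories above, while the script structure itself is an illustrative sketch:

```bash
awk '
  {
    # UA is the last quoted field (Combined/Nginx); 3-arg match() requires gawk
    ua = match($0, /"([^"]+)"$/, a) ? a[1] : ""
    # Googlebot matches Googlebot-Image/News by prefix
    if      (ua ~ /Googlebot|AdsBot-Google/)                       cat = "Googlebot"
    else if (ua ~ /bingbot|BingPreview|MicrosoftPreview/)          cat = "Bingbot"
    else if (ua ~ /Slurp|DuckDuckBot|Baiduspider|YandexBot|Sogou/) cat = "Other search bots"
    else if (ua ~ /GPTBot|ClaudeBot|PerplexityBot|Bytespider|CCBot|anthropic-ai/) cat = "AI crawlers"
    else if (ua ~ /Pingdom|UptimeRobot|StatusCake|NewRelic|Datadog/) cat = "Monitoring"
    else if (ua ~ /Mozilla|Chrome|Safari|Firefox|Edge/)            cat = "Real users"
    else                                                           cat = "Unknown"
    count[cat]++
  }
  END { for (c in count) printf "%-18s %d\n", c, count[c] }
' logfile
```

Order matters: bot patterns are checked before the browser fallback because most bots also send `Mozilla` in their UA string.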
Step 4: Calculate Metrics
Using awk/grep on the log file:
- Total request count
- Requests by bot category (count per category, % of total)
- Requests by HTTP status code (200, 301, 302, 404, 500, etc.)
- Top 20 crawled URLs by frequency — sort by count descending
- Top 10 crawled path prefixes (first 2 URL segments, e.g., `/blog/`, `/products/`), aggregated by prefix
- Requests by hour-of-day (extract the hour from the timestamp field `[DD/Mon/YYYY:HH:MM:SS]`)
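Hedged one-liners for these metrics, assuming the Combined field positions from Step 2 (`logfile` is a placeholder):

```bash
# Total request count
wc -l < logfile

# Requests by HTTP status code, most frequent first
awk '{ c[$9]++ } END { for (s in c) print c[s], s }' logfile | sort -rn

# Top 20 crawled URLs by frequency
awk '{ print $7 }' logfile | sort | uniq -c | sort -rn | head -20

# Top 10 path prefixes (simplified here to the first segment, e.g. /blog/)
awk '{ split($7, p, "/"); print "/" p[2] "/" }' logfile | sort | uniq -c | sort -rn | head -10

# Requests by hour-of-day; $4 looks like [DD/Mon/YYYY:HH:MM:SS
awk '{ split($4, t, ":"); h[t[2]]++ } END { for (k in h) print k, h[k] }' logfile | sort
```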
Step 5: Identify Crawl Budget Concerns
Flag these patterns:
- 4xx error rate >5%: crawlers wasting budget on broken URLs
- 5xx error rate >1%: server errors burning crawl budget
- Duplicate crawl patterns: same URL crawled >10x without apparent content change
- Low-value paths: bots crawling `/wp-admin`, `/search?`, `?sort=`, `?page=`, session URLs
- 302 redirect overuse: temporary redirects don't pass full crawl equity
- Non-canonical crawls: `?utm_` or tracking parameters in crawled URLs
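A sketch of the threshold checks; the 5% and 1% cut-offs and the >10x rule come from the list above, while the variable names and the low-value regex are illustrative:

```bash
awk '
  { total++; urls[$7]++ }
  $9 ~ /^4/ { e4xx++ }
  $9 ~ /^5/ { e5xx++ }
  $9 == 302 { r302++ }
  $7 ~ /^\/wp-admin|^\/search\?|[?&](sort|page)=|utm_/ { lowvalue++ }
  END {
    if (total == 0) exit
    if (e4xx / total > 0.05) printf "FLAG: 4xx rate %.1f%% (>5%%)\n", 100 * e4xx / total
    if (e5xx / total > 0.01) printf "FLAG: 5xx rate %.1f%% (>1%%)\n", 100 * e5xx / total
    printf "302 redirects: %d, low-value path hits: %d\n", r302, lowvalue
    # Duplicate-crawl candidates: same URL requested more than 10 times
    for (u in urls) if (urls[u] > 10) print "Crawled >10x:", urls[u], u
  }
' logfile
```

Note that the duplicate-crawl check can only flag frequency; confirming "no apparent content change" requires comparing response sizes or content outside the log.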
Output Format
## Server Log Analysis: [filename]

**File:** [path] | **Format:** [Apache Combined/Common/Nginx] | **Total Requests:** [N]

### Crawl Budget Summary

| Metric | Value |
|--------|-------|
| Total requests | N |
| Bot traffic | N (X%) |
| Human traffic | N (X%) |
| Crawl error rate | X% (4xx+5xx) |
| Date range | [first log entry] to [last log entry] |

### Bot Traffic Breakdown

| Bot Category | Requests | % of Total | Top URL |
|---|---|---|---|
| Googlebot | N | X% | /path |
| Bingbot | N | X% | /path |
| AI Crawlers | N | X% | /path |
| Monitoring | N | X% | /path |
| Real Users | N | X% | — |
| Other/Unknown | N | X% | — |

### Top 20 Crawled URLs

| Rank | URL | Requests | Status Codes |
|------|-----|----------|--------------|
| 1 | /path | N | 200: N, 404: N |

### Crawl Frequency by Path

| Path Prefix | Requests | % of Bot Traffic |
|---|---|---|
| /blog/ | N | X% |

### Status Code Distribution

| Status | Count | % | Interpretation |
|--------|-------|---|----------------|
| 200 | N | X% | OK |
| 301 | N | X% | Permanent redirect |
| 404 | N | X% | Not found (crawl waste) |

### Crawl Budget Recommendations

[Prioritized list of issues found — Critical/High/Medium/Low]

## Data Sources

- Source: Local server log file (no external calls)