claude-seo-skills · seo-log-analysis
Install
Source · Clone the upstream repo:
git clone https://github.com/lionkiii/claude-seo-skills
Claude Code · Install into ~/.claude/skills/:
T=$(mktemp -d) && git clone --depth=1 https://github.com/lionkiii/claude-seo-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/seo-log-analysis" ~/.claude/skills/lionkiii-claude-seo-skills-seo-log-analysis && rm -rf "$T"
Manifest: `skills/seo-log-analysis/SKILL.md`
Server Log Analysis
Analyzes local server log files for crawl budget breakdown. No MCP or external calls required.
Inputs
- `file`: Absolute path to the server log file (Apache Combined, Apache Common, or Nginx access log). If the user provides a relative path, resolve it with Bash: `realpath <path>`.
Execution
Step 1: Format Detection
Read the first 10 lines of the log file to detect format:
- Apache Combined: `%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"` (9+ fields, referer and UA in quotes)
- Apache Common: `%h %l %u %t "%r" %>s %b` (7 fields, no referer/UA)
- Nginx: similar to Apache Combined with slight field order differences
- Check for compressed files (.gz) — if detected, inform user to decompress first
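A minimal Bash sketch of this detection step; the `LOG` variable and the probe heuristics are illustrative assumptions, not the skill's prescribed code:

```bash
#!/usr/bin/env bash
LOG="$1"   # hypothetical argument: path to the log file

# Compressed logs are out of scope: ask the user to decompress first.
case "$LOG" in
  *.gz) echo "Compressed file detected: run gunzip '$LOG' first" >&2; exit 1 ;;
esac

# Probe the first 10 lines. Combined/Nginx lines end with a quoted UA;
# Common lines end with the byte count (%b: a number or "-").
head -n 10 "$LOG" | awk '
  /"$/              { combined++; next }
  $NF ~ /^[0-9-]+$/ { common++;   next }
                    { unknown++ }
  END {
    if (combined >= common && combined > 0) print "Apache Combined / Nginx"
    else if (common > 0)                    print "Apache Common"
    else                                    print "Unknown format"
  }'
```

Nginx's default combined format ends with the same quoted UA field, so this probe reports it together with Apache Combined.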
Step 2: Parse Log Lines
Use Bash awk to extract fields. For Apache Combined/Nginx format (9 fields):
awk '{
    ip = $1; url = $7; status = $9
    match($0, /"([^"]+)"$/, arr)   # extract UA from the last quoted field (3-arg match requires gawk)
    print ip, url, status, arr[1]
}' logfile
For Apache Common (7 fields): ip=$1, request=$7, status=$9, ua="unknown"
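As a hedged illustration, the Common-format extraction reduces to a one-liner, since there is no quoted UA to recover (`logfile` is a placeholder):

```bash
awk '{ print $1, $7, $9, "unknown" }' logfile   # ip, request URL, status, placeholder UA
```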
Step 3: Classify User-Agents
Group each request into one of these categories (a matching sketch follows the list):
- Googlebot: `Googlebot`, `Googlebot-Image`, `Googlebot-News`, `AdsBot-Google`
- Bingbot: `bingbot`, `BingPreview`, `MicrosoftPreview`
- Other search bots: `Slurp` (Yahoo), `DuckDuckBot`, `Baiduspider`, `YandexBot`, `Sogou`
- AI crawlers: `GPTBot`, `ClaudeBot`, `PerplexityBot`, `Bytespider`, `CCBot`, `anthropic-ai`
- Monitoring tools: `Pingdom`, `UptimeRobot`, `StatusCake`, `NewRelic`, `Datadog`
- Real users: everything else (browsers: `Mozilla`, `Chrome`, `Safari`, `Firefox`, `Edge`)
- Unknown: no UA or unrecognized
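One way to implement the grouping in a single gawk pass; the pattern lists mirror the categories above, while the script structure itself is an illustrative sketch:

```bash
awk '
  {
    # UA is the last quoted field (Combined/Nginx); 3-arg match() requires gawk
    ua = match($0, /"([^"]+)"$/, a) ? a[1] : ""
    # Googlebot matches Googlebot-Image/News by prefix
    if      (ua ~ /Googlebot|AdsBot-Google/)                       cat = "Googlebot"
    else if (ua ~ /bingbot|BingPreview|MicrosoftPreview/)          cat = "Bingbot"
    else if (ua ~ /Slurp|DuckDuckBot|Baiduspider|YandexBot|Sogou/) cat = "Other search bots"
    else if (ua ~ /GPTBot|ClaudeBot|PerplexityBot|Bytespider|CCBot|anthropic-ai/) cat = "AI crawlers"
    else if (ua ~ /Pingdom|UptimeRobot|StatusCake|NewRelic|Datadog/) cat = "Monitoring"
    else if (ua ~ /Mozilla|Chrome|Safari|Firefox|Edge/)            cat = "Real users"
    else                                                           cat = "Unknown"
    count[cat]++
  }
  END { for (c in count) printf "%-18s %d\n", c, count[c] }
' logfile
```

Order matters: bot patterns are checked before the browser fallback because most bots also send `Mozilla` in their UA string.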
Step 4: Calculate Metrics
Using awk/grep on the log file:
- Total request count
- Requests by bot category (count per category, % of total)
- Requests by HTTP status code (200, 301, 302, 404, 500, etc.)
- Top 20 crawled URLs by frequency — sort by count descending
- Top 10 crawled path prefixes (first 2 URL segments, e.g., `/blog/`, `/products/`), aggregated by prefix
- Requests by hour-of-day (extract the hour from the timestamp field `[DD/Mon/YYYY:HH:MM:SS]`)
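Hedged one-liners for these metrics, assuming the Combined field positions from Step 2 (`logfile` is a placeholder):

```bash
# Total request count
wc -l < logfile

# Requests by HTTP status code, most frequent first
awk '{ c[$9]++ } END { for (s in c) print c[s], s }' logfile | sort -rn

# Top 20 crawled URLs by frequency
awk '{ print $7 }' logfile | sort | uniq -c | sort -rn | head -20

# Top 10 path prefixes (simplified here to the first segment, e.g. /blog/)
awk '{ split($7, p, "/"); print "/" p[2] "/" }' logfile | sort | uniq -c | sort -rn | head -10

# Requests by hour-of-day; $4 looks like [DD/Mon/YYYY:HH:MM:SS
awk '{ split($4, t, ":"); h[t[2]]++ } END { for (k in h) print k, h[k] }' logfile | sort
```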
Step 5: Identify Crawl Budget Concerns
Flag these patterns:
- 4xx error rate >5%: crawlers wasting budget on broken URLs
- 5xx error rate >1%: server errors burning crawl budget
- Duplicate crawl patterns: same URL crawled >10x without apparent content change
- Low-value paths: bots crawling `/wp-admin`, `/search?`, `?sort=`, `?page=`, session URLs
- 302 redirect overuse: temporary redirects don't pass full crawl equity
- Non-canonical crawls: `?utm_` or tracking parameters in crawled URLs
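A sketch of the threshold checks; the 5% and 1% cut-offs and the >10x rule come from the list above, while the variable names and the low-value regex are illustrative:

```bash
awk '
  { total++; urls[$7]++ }
  $9 ~ /^4/ { e4xx++ }
  $9 ~ /^5/ { e5xx++ }
  $9 == 302 { r302++ }
  $7 ~ /^\/wp-admin|^\/search\?|[?&](sort|page)=|utm_/ { lowvalue++ }
  END {
    if (total == 0) exit
    if (e4xx / total > 0.05) printf "FLAG: 4xx rate %.1f%% (>5%%)\n", 100 * e4xx / total
    if (e5xx / total > 0.01) printf "FLAG: 5xx rate %.1f%% (>1%%)\n", 100 * e5xx / total
    printf "302 redirects: %d, low-value path hits: %d\n", r302, lowvalue
    # Duplicate-crawl candidates: same URL requested more than 10 times
    for (u in urls) if (urls[u] > 10) print "Crawled >10x:", urls[u], u
  }
' logfile
```

Note that the duplicate-crawl check can only flag frequency; confirming "no apparent content change" requires comparing response sizes or content outside the log.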
Output Format
## Server Log Analysis: [filename]

**File:** [path] | **Format:** [Apache Combined/Common/Nginx] | **Total Requests:** [N]

### Crawl Budget Summary

| Metric | Value |
|--------|-------|
| Total requests | N |
| Bot traffic | N (X%) |
| Human traffic | N (X%) |
| Crawl error rate | X% (4xx+5xx) |
| Date range | [first log entry] to [last log entry] |

### Bot Traffic Breakdown

| Bot Category | Requests | % of Total | Top URL |
|---|---|---|---|
| Googlebot | N | X% | /path |
| Bingbot | N | X% | /path |
| AI Crawlers | N | X% | /path |
| Monitoring | N | X% | /path |
| Real Users | N | X% | — |
| Other/Unknown | N | X% | — |

### Top 20 Crawled URLs

| Rank | URL | Requests | Status Codes |
|------|-----|----------|--------------|
| 1 | /path | N | 200: N, 404: N |

### Crawl Frequency by Path

| Path Prefix | Requests | % of Bot Traffic |
|---|---|---|
| /blog/ | N | X% |

### Status Code Distribution

| Status | Count | % | Interpretation |
|--------|-------|---|----------------|
| 200 | N | X% | OK |
| 301 | N | X% | Permanent redirect |
| 404 | N | X% | Not found (crawl waste) |

### Crawl Budget Recommendations

[Prioritized list of issues found — Critical/High/Medium/Low]

## Data Sources

- Source: Local server log file (no external calls)