Claude-blog blog-cannibalization
git clone https://github.com/AgriciDaniel/claude-blog
T=$(mktemp -d) && git clone --depth=1 https://github.com/AgriciDaniel/claude-blog "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/blog-cannibalization" ~/.claude/skills/agricidaniel-claude-blog-blog-cannibalization && rm -rf "$T"
skills/blog-cannibalization/SKILL.mdBlog Cannibalization - Keyword Overlap Detection
Detect when multiple blog posts compete for the same search keywords. Two modes: local-only analysis (default) and DataForSEO API mode for SERP-level data.
Two Modes
| Mode | Flag | Cost | Data Source |
|---|---|---|---|
| Local | (default) | Free | File content analysis via Grep/Read |
| API | | ~$0.01/call | DataForSEO Page Intersection + Ranked Keywords |
Local mode works without any API keys. API mode requires DataForSEO credentials set as environment variables:
DATAFORSEO_LOGIN and DATAFORSEO_PASSWORD.
Local Mode Workflow
Step 1: Scan Blog Files
Use Glob to find all content files in the target directory:
- Patterns:
,**/*.md
,**/*.mdx**/*.html - Skip files in
,node_modules/
,.git/drafts/
Step 2: Extract Primary Keywords
For each file, read and extract keyword signals from:
- Title tag or H1 heading (highest weight)
- H2 headings (medium weight)
- First paragraph (supporting signal)
- Meta description if present in frontmatter
Primary keyword extraction method:
- Tokenize title and H1 into 1-gram, 2-gram, and 3-gram phrases
- Score each phrase by frequency across title + H2s + first paragraph
- Select the top-scoring 2-3 word phrase as the primary keyword
- Record secondary keywords from H2 headings
Step 3: Cluster by Similarity
Group posts into clusters using these matching rules (in priority order):
- Exact match - identical primary keyword across 2+ posts
- Stem match - same root word (e.g., "optimize" vs "optimization")
- Semantic overlap - Claude determines that two keywords target the same search intent (e.g., "best CRM software" vs "top CRM tools 2026")
- Subset match - one keyword contains another (e.g., "email marketing" vs "email marketing for startups")
Step 4: Score and Flag
For each cluster with 2+ posts, assess severity and generate a recommendation.
Step 5: Output Report
Display the results table and per-cluster recommendations.
API Mode Workflow (DataForSEO)
Requires the
--api flag. Uses WebFetch to call DataForSEO endpoints.
Endpoints Used
Page Intersection - find keywords where multiple URLs rank:
POST https://api.dataforseo.com/v3/dataforseo_labs/google/page_intersection/live Authorization: Basic <base64(login:password)> { "pages": { "1": "https://example.com/post-a", "2": "https://example.com/post-b" }, "language_code": "en", "location_code": 2840 }
Cost: ~$0.01 per call. Returns overlapping keywords with position, volume, CPC.
Ranked Keywords - get all keywords a single URL ranks for:
POST https://api.dataforseo.com/v3/dataforseo_labs/google/ranked_keywords/live { "target": "https://example.com/post-a", "language_code": "en", "location_code": 2840 }
API Analysis Steps
- Collect all published URLs from the user (or sitemap)
- Run Ranked Keywords for each URL to build keyword profiles
- Run Page Intersection for URL pairs that share keyword clusters
- Calculate severity using the formula below
- Output enriched report with search volume and position data
Severity Scoring
Four severity levels based on overlap signals:
| Level | Criteria | Action Urgency |
|---|---|---|
| Critical | Same exact keyword, both pages in top 20 | Immediate |
| High | Same keyword cluster, one page outranks the other | This week |
| Medium | Related keywords with partial SERP overlap | This month |
| Low | Semantic similarity but different confirmed intents | Monitor |
Severity Formula (API Mode)
severity_score = overlap_count x avg_search_volume x (1 / position_gap)
Where:
= number of shared ranking keywordsoverlap_count
= mean monthly volume of shared keywordsavg_search_volume
= absolute difference in average ranking position (min 1)position_gap
Higher score = more urgent cannibalization problem.
Severity Heuristic (Local Mode)
Without SERP data, use a simplified scoring:
- Critical: Exact primary keyword match between posts
- High: Stem match on primary keyword, or 3+ shared H2 keywords
- Medium: Semantic overlap on primary keyword
- Low: Subset match only, or shared secondary keywords
Output Format
Summary Table
| Post A | Post B | Shared Keywords | Severity | Recommendation | |--------|--------|-----------------|----------|----------------| | /best-crm-tools | /top-crm-software | best crm, crm tools, crm software | Critical | MERGE | | /email-tips | /email-marketing-guide | email marketing | High | DIFFERENTIATE | | /seo-basics | /seo-for-beginners | seo basics, beginner seo | Critical | CANONICAL | | /react-hooks | /react-state-mgmt | react, state | Low | NO ACTION |
Per-Cluster Detail
For each flagged cluster, provide:
- Both post titles and URLs
- Full list of overlapping keywords (with volume if API mode)
- Which post is stronger (more comprehensive, better structured)
- Specific recommendation with rationale
Recommendations
Four possible actions for each cannibalization cluster:
MERGE
When both pages are thin or cover the same intent with similar depth.
- Combine the best content from both into one comprehensive post
- 301 redirect the weaker URL to the merged post
- Preserve all internal links pointing to either URL
DIFFERENTIATE
When pages serve different intents but keyword targeting overlaps.
- Shift the primary keyword of the weaker post to a related long-tail
- Update the title, H1, and meta description to reflect the new focus
- Add internal links between the two posts to signal distinct topics
CANONICAL
When one post is clearly the authority and the other is a lesser duplicate.
- Add
on the weaker page pointing to the authorityrel="canonical" - Consider noindexing the weaker page if it adds no unique value
- Link from the weaker page to the authority page
NO ACTION
When intent is genuinely different despite surface-level keyword similarity.
- Document the reasoning for future audits
- Monitor rankings quarterly for any position changes
- Re-evaluate if either post drops in rankings
Error Handling
- No blog files found: If the directory contains no .md, .mdx, or .html files, report "No blog files found in [directory]" and suggest checking the path
- DataForSEO credentials missing: In API mode, if credentials are not configured, fall back to local mode automatically and notify the user
- API rate limits: DataForSEO has per-minute rate limits. If a 429 response is received, wait and retry once. If it persists, switch to local mode for remaining URLs
- WebFetch failures: If a source URL is unreachable, skip it and note "Unable to verify - source unavailable" in the report
- Single-post directory: If only one blog post exists, report "Cannibalization analysis requires at least 2 posts" and exit gracefully