# Duplication Hunt (`hone:duplication-hunt`)

From the [hone-skills](https://github.com/ckorhonen/hone-skills) collection. Install the whole collection:

```sh
git clone https://github.com/ckorhonen/hone-skills
```

Or copy just this skill into your Claude skills directory:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/ckorhonen/hone-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/duplication-hunt" ~/.claude/skills/ckorhonen-hone-skills-hone-duplication-hunt && rm -rf "$T"
```

Source: `skills/duplication-hunt/SKILL.md`
## What This Skill Does
Scans source code to find duplicated patterns that indicate extraction opportunities. Goes beyond simple copy-paste detection to find three types of duplication:
- Exact duplicates: Identical code blocks (3+ lines) appearing in two or more locations.
- Renamed duplicates: Structurally identical code where only variable names, string literals, or numeric constants differ.
- Structural duplicates: Code blocks that follow the same control flow pattern (same sequence of operations, branches, and loops) with different specifics — the "same shape, different nouns" pattern.
Ranks findings by occurrence count and block size to surface the highest-value extraction candidates first.
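For illustration, here is a hypothetical pair of each of the two subtler types (these snippets are invented, not from any scanned repo); an exact duplicate would simply be either snippet repeated verbatim:

```python
# Hypothetical snippets showing the two subtler duplication types.

# Renamed duplicates: identical structure; only names differ.
def load_users(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def load_groups(filename):
    with open(filename) as f:
        return [row.strip() for row in f if row.strip()]

# Structural duplicates: same loop-branch-accumulate shape, different specifics.
def sum_positive(numbers):
    total = 0
    for n in numbers:
        if n > 0:
            total += n
    return total

def count_negatives(numbers):
    count = 0
    for n in numbers:
        if n < 0:
            count += 1
    return count
```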
## When To Use
- On a weekly schedule as a codebase hygiene check.
- After a large feature merges, to catch newly introduced duplication.
- When onboarding to a codebase to understand where abstractions are missing.
- When the user asks to "find duplicated code" or "hunt for duplication".
## Do Not Use
- For method length — use `hone:method-brevity-audit` instead.
- For naming quality — use `hone:intent-clarity-audit` instead.
- For test naming — use `hone:test-naming-audit` instead.
- For design-level duplication (repeated architectural patterns across services). This skill operates at the code block level.
- To auto-extract or refactor duplicates. This skill reports findings only.
## Inputs To Confirm
- Scope: Which directories or file patterns to scan (default: entire repo, excluding vendored/generated code).
- Minimum block size: Smallest code block to consider, in lines (default: 4 lines).
- Minimum occurrences: How many times a pattern must appear to be reported (default: 2 for exact, 3 for structural).
- Exclusions: Glob patterns for files or directories to skip.
- Top-N: Maximum findings to report (default: 15).
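A hypothetical shape for these confirmed inputs (the keys are illustrative, not a real configuration format the skill defines):

```python
# Illustrative defaults for a duplication-hunt run; all key names are hypothetical.
scan_config = {
    "scope": ["."],                                # entire repo by default
    "minimum_block_size": 4,                       # smallest block, in lines
    "minimum_occurrences": {"exact": 2, "structural": 3},  # per the defaults above
    "exclusions": ["**/node_modules/**", "**/*.lock", "**/migrations/**"],
    "top_n": 15,                                   # maximum findings to report
}
```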
## Instructions

- **Identify scannable files.** Walk the repository tree. Exclude vendored directories (`node_modules`, `vendor`, `dist`, `build`, `.git`, `__pycache__`), generated files, lock files, and user-specified exclusions. Include test files in the scan — test duplication is also worth finding.
- **Normalize source code.** For each file, produce a normalized form by:
  - Removing comments and blank lines.
  - Collapsing whitespace and indentation differences.
  - Preserving statement structure and control flow keywords.
  This normalized form is used for comparison; original code is used for reporting. (See the normalization sketch after this list.)
- **Detect exact duplicates.** Slide a window of `minimum_block_size` to 50 lines across each normalized file. Hash each window. Group windows by hash. When the same hash appears in 2+ locations (across files or within the same file), record an exact duplicate finding. Merge overlapping windows into the largest contiguous block. (See the window-hashing sketch below.)
- **Detect renamed duplicates.** For blocks that are not exact matches, replace all identifiers with a placeholder token and all literals with type placeholders (`<STR>`, `<NUM>`, `<BOOL>`). Re-hash. Group by this structural hash. Blocks that share a structural hash but differ in the original are renamed duplicates. (See the fingerprint sketch below.)
- **Detect structural duplicates.** Reduce each block to its control flow skeleton: the sequence of keywords (`if`, `else`, `for`, `while`, `return`, `try`, `catch`, `switch`, `case`, `match`, function calls as `CALL`) and their nesting structure. Hash the skeleton. Group blocks with matching skeletons that span at least 6 lines. This catches the "same logic, different details" pattern. (See the skeleton sketch below.)
- **Score and rank.** For each finding, compute a value score: `score = occurrences * block_lines * type_weight`, where `type_weight` is 3.0 for exact, 2.0 for renamed, 1.0 for structural. Sort by score descending. (See the ranking sketch below.)
- **Suggest extraction targets.** For the top findings, note:
  - What a shared function/method might look like (parameters needed).
  - Which files would benefit from the extraction.
  - Whether the duplication is in production code, test code, or both.
- **Produce the report** per Output Requirements.
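The sketches below are minimal Python approximations of the steps above, assuming Python source and naive `#`-only comment handling; they illustrate the ideas rather than prescribe an implementation. First, normalization:

```python
import re

def normalize(source: str) -> list[str]:
    """Produce comparison-ready lines: no comments, no blanks, flat whitespace."""
    normalized = []
    for raw in source.splitlines():
        line = re.sub(r"#.*$", "", raw)   # naive: breaks on '#' inside strings
        line = " ".join(line.split())     # collapse whitespace and indentation
        if line:                          # drop blank (or comment-only) lines
            normalized.append(line)
    return normalized
```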
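Exact-duplicate detection as window hashing over the normalized lines; merging overlapping windows into maximal blocks is omitted here:

```python
import hashlib
from collections import defaultdict

def exact_duplicate_groups(files: dict[str, list[str]],
                           min_block: int = 4, max_block: int = 50):
    """Map window hash -> [(path, start_index)] for windows seen in 2+ places."""
    groups = defaultdict(list)
    for path, lines in files.items():
        top = min(max_block, len(lines))
        for size in range(min_block, top + 1):
            for start in range(len(lines) - size + 1):
                block = "\n".join(lines[start:start + size])
                digest = hashlib.sha1(block.encode()).hexdigest()
                groups[digest].append((path, start))
    return {h: locs for h, locs in groups.items() if len(locs) >= 2}
```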
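Renamed-duplicate fingerprinting, here via Python's own tokenizer; other languages would need their own lexers:

```python
import io
import keyword
import tokenize

def rename_fingerprint(code: str) -> str:
    """Keep keywords and operators; mask identifiers and literals.

    Note: tokenize raises on syntactically malformed input.
    """
    parts = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.NAME:
            if tok.string in ("True", "False"):
                parts.append("<BOOL>")
            elif keyword.iskeyword(tok.string):
                parts.append(tok.string)   # control flow keywords survive
            else:
                parts.append("<ID>")       # variable/function names masked
        elif tok.type == tokenize.STRING:
            parts.append("<STR>")
        elif tok.type == tokenize.NUMBER:
            parts.append("<NUM>")
        elif tok.type == tokenize.OP:
            parts.append(tok.string)
    return " ".join(parts)
```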
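The control-flow skeleton for structural duplicates, sketched over Python's AST (`ast.Match` needs Python 3.10 or later):

```python
import ast

_SKELETON_NODES = {
    ast.If: "if", ast.For: "for", ast.While: "while", ast.Try: "try",
    ast.Return: "return", ast.Match: "match", ast.Call: "CALL",
}

def control_flow_skeleton(code: str) -> str:
    """Reduce code to its nested sequence of control-flow keywords and calls."""
    parts = []

    def walk(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            label = _SKELETON_NODES.get(type(child))
            if label is not None:
                parts.append("  " * depth + label)
                walk(child, depth + 1)   # nesting encoded via indentation
            else:
                walk(child, depth)

    walk(ast.parse(code), 0)
    return "\n".join(parts)
```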
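And scoring plus ranking, directly from the formula above:

```python
TYPE_WEIGHT = {"exact": 3.0, "renamed": 2.0, "structural": 1.0}

def value_score(finding: dict) -> float:
    """score = occurrences * block_lines * type_weight"""
    return (finding["occurrences"] * finding["block_lines"]
            * TYPE_WEIGHT[finding["type"]])

def rank(findings: list[dict], top_n: int = 15) -> list[dict]:
    """Highest-value extraction candidates first, truncated to Top-N."""
    return sorted(findings, key=value_score, reverse=True)[:top_n]
```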
## Output Requirements
Produce a Markdown report:
```markdown
# Duplication Hunt

**Repo**: <repo name>
**Scanned**: <N> files | **Duplicate groups found**: <count>

## Findings

### 1. <Brief description of the duplicated pattern>

- **Type**: Exact / Renamed / Structural
- **Occurrences**: N locations
- **Block size**: M lines
- **Score**: <value>

**Locations**:

| File | Lines | Preview |
|------|-------|---------|
| src/auth/login.ts | 24-38 | `const token = await fetch(...)` ... |
| src/auth/signup.ts | 31-45 | `const token = await fetch(...)` ... |

**Extraction suggestion**: Extract to a shared `authenticateUser(credentials)` function in `src/auth/shared.ts`.

---

### 2. ...

## Summary

- **By type**: 5 exact, 3 renamed, 2 structural
- **Total duplicated lines**: ~320 lines across 10 groups
- **Highest-value extractions**: Group 1 (saves ~45 lines), Group 3 (saves ~30 lines)
- **Principle**: "Twice is a smell, three times is a pattern" — groups with 3+ occurrences are strong extraction candidates
```
Every finding must reference real file paths and line ranges. Previews must be actual code snippets, not fabricated examples.
## Quality Bar
- Every reported duplicate must be verifiable at the stated file:line ranges.
- Exact duplicates must be genuinely identical (modulo whitespace).
- Renamed duplicates must have the same structure when identifiers are replaced.
- Do not flag boilerplate that is intentionally repeated (e.g., license headers, import blocks of 3 lines or fewer, trivial getters/setters).
- Do not flag configuration files, data fixtures, or migration files.
- Extraction suggestions must be concrete (name the function, list parameters) and plausible (not every duplicate merits extraction — note when the coupling cost may outweigh the deduplication benefit).
- If no duplication is found above the threshold, state that explicitly.