Cc-skills code-clone-assistant
Detect and refactor code duplication with PMD CPD. TRIGGERS - code clones, DRY violations, duplicate code.
git clone https://github.com/terrylica/cc-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/terrylica/cc-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/quality-tools/skills/code-clone-assistant" ~/.claude/skills/terrylica-cc-skills-code-clone-assistant && rm -rf "$T"
plugins/quality-tools/skills/code-clone-assistant/SKILL.md
Code Clone Assistant
Detect code clones and guide refactoring using PMD CPD (exact duplicates) + Semgrep (patterns).
Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.
Tools
- PMD CPD v7.17.0+: Exact duplicate detection
- Semgrep v1.140.0+: Pattern-based detection
Tested: October 2025 - 30 violations detected across 3 sample files
Coverage: ~3x more violations than using either tool alone
When to Use This Skill
Use this skill when:
- Finding duplicate code in a codebase
- Detecting DRY violations
- Refactoring similar code patterns
- Identifying copy-paste code
Why Two Tools?
PMD CPD and Semgrep detect different clone types:
| Aspect | PMD CPD | Semgrep |
|---|---|---|
| Detects | Exact copy-paste duplicates | Similar patterns with variations |
| Scope | Across files ✅ | Within a file; cross-file analysis is Pro only |
| Matching | Token-based (ignores formatting) | Pattern-based (AST matching) |
| Rules | ❌ No custom rules | ✅ Custom rules |
Result: Using both finds ~3x more DRY violations than using either tool alone.
Clone Types
| Type | Description | PMD CPD | Semgrep |
|---|---|---|---|
| Type-1 | Exact copies | ✅ Default | ✅ |
| Type-2 | Renamed identifiers | ✅ | ✅ |
| Type-3 | Near-miss with variations | ⚠️ Partial | ✅ Patterns |
| Type-4 | Semantic clones (same behavior) | ❌ | ❌ |
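The difference between Type-1 and Type-2 clones can be illustrated with a small token comparison. This is only a sketch built on Python's standard `tokenize` module; it is not how PMD CPD or Semgrep work internally, and the function name `token_stream` and the sample snippets are hypothetical.

```python
import io
import keyword
import tokenize

# Layout-only tokens that carry no meaning for clone comparison.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
# Structural tokens compared by kind, so indentation width does not matter.
_STRUCTURAL = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}


def token_stream(source: str, normalize_names: bool = False) -> list[str]:
    """Return the significant tokens of `source`.

    With normalize_names=True every non-keyword identifier becomes "ID",
    which makes renamed copies (Type-2) compare equal.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type in _STRUCTURAL:
            out.append(tokenize.tok_name[tok.type])
        elif (normalize_names and tok.type == tokenize.NAME
              and not keyword.iskeyword(tok.string)):
            out.append("ID")
        else:
            out.append(tok.string)
    return out


ORIGINAL = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
# Type-1: same code, different indentation plus a comment.
REFORMATTED = (
    "def total(xs):\n"
    "  s = 0  # running sum\n"
    "  for x in xs:\n"
    "    s += x\n"
    "  return s\n"
)
# Type-2: same structure, every identifier renamed.
RENAMED = (
    "def add_all(items):\n"
    "    acc = 0\n"
    "    for it in items:\n"
    "        acc += it\n"
    "    return acc\n"
)
```

Comparing `token_stream(ORIGINAL)` with `token_stream(REFORMATTED)` treats the two as identical, while `RENAMED` only matches once `normalize_names=True` is set. Type-3 and Type-4 clones need more than token equality, which is where pattern rules come in.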
Quick Start Workflow
```bash
# Step 1: Detect exact duplicates (PMD CPD)
pmd cpd -d . -l python --minimum-tokens 20 -f markdown > pmd-results.md

# Step 2: Detect pattern violations (Semgrep)
semgrep --config=clone-rules.yaml --sarif --quiet > semgrep-results.sarif

# Step 3: Analyze combined results (Claude Code)
# Parse both outputs, prioritize by severity

# Step 4: Refactor (Claude Code with user approval)
# Extract shared functions, consolidate patterns, verify tests
```
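Step 3 leaves the parsing to Claude Code. As a rough illustration, the Semgrep side could be summarized as below. This is a sketch, not the skill's actual analysis step: `summarize_sarif` is a hypothetical helper, it reads only the standard SARIF 2.1.0 fields (`runs`, `results`, `ruleId`, `level`, `locations`), and it assumes each result carries its own `level`. Results without one fall back to `warning`, the SARIF default.

```python
# Lower rank sorts first; unknown levels sort last.
_LEVEL_RANK = {"error": 0, "warning": 1, "note": 2, "none": 3}


def summarize_sarif(sarif: dict) -> list[dict]:
    """Flatten SARIF results into dicts ordered by severity, file, and line."""
    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Use the first reported location, if any.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            findings.append({
                "rule": result.get("ruleId", "unknown"),
                "level": result.get("level", "warning"),
                "file": physical.get("artifactLocation", {}).get("uri", "?"),
                "line": physical.get("region", {}).get("startLine", 0),
                "message": result.get("message", {}).get("text", ""),
            })
    findings.sort(
        key=lambda f: (_LEVEL_RANK.get(f["level"], 4), f["file"], f["line"])
    )
    return findings
```

To use it, load `semgrep-results.sarif` with `json.load` and pass the resulting dict to `summarize_sarif`. The PMD CPD markdown report would need its own parser, which is not shown here.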
Accepted Exceptions (Known Intentional Duplication)
Not all code duplication is a problem. Some codebases deliberately use copy-and-adapt patterns where refactoring would be harmful. When running clone detection, always check for accepted exceptions before recommending refactoring.
When Duplication Is Acceptable
| Pattern | Why Acceptable | Example |
|---|---|---|
| Generation-per-directory experiments | Each generation is an immutable, self-contained experiment. Sharing code across generations would break provenance and make past experiments non-reproducible. | SQL templates, sweep scripts where each is independent |
| SQL templates with placeholder substitution | SQL has no import/include mechanism. Templates use placeholder replacement, not function calls. Extracting shared CTEs into separate files would break the single-file execution model. | ClickHouse sweep templates sharing signal detection + metrics CTEs |
| Protocol/schema boilerplate | Serialization formats, API contracts, and wire protocols require exact structure in each location. Abstracting them hides the contract. | NDJSON telemetry line construction in wrapper scripts |
| Test fixtures and golden files | Test data intentionally duplicates production patterns to verify behavior. Sharing fixtures creates brittle cross-test dependencies. | Test setup code, expected output snapshots |
How to Report Accepted Exceptions
When clone detection finds duplication that matches an accepted exception pattern:
- Report it — always show the user what was found (lines, tokens, files)
- Flag as accepted — explicitly state it matches a known exception pattern
- Explain why — cite the specific reason refactoring is not recommended
- Do NOT recommend refactoring — this is the key difference from actionable findings
Example output format:
```text
Code Clone Analysis Results

PMD CPD Findings:

Clone 1: 115 lines (575 tokens) - base_bars → signals CTEs
  gen610_template.sql:33 ↔ gen710_template.sql:38
  Status: ACCEPTED EXCEPTION (generation-per-directory experiment)
  Reason: Each generation is immutable. Shared CTEs would break
          experiment provenance and reproducibility.

Clone 2: 36 lines (478 tokens) - metrics aggregation
  gen610_template.sql:207 ↔ gen710_template.sql:244
  Status: ACCEPTED EXCEPTION (SQL template without include mechanism)

Actionable Findings: 0
Accepted Exceptions: 2
```
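A report in this shape could be produced from structured findings roughly as follows. This is a minimal sketch: the `CloneFinding` record, its field names, and the `Status: ACTIONABLE` line for non-exception findings are assumptions for illustration and are not defined by the skill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloneFinding:
    lines: int
    tokens: int
    label: str                        # short description of the cloned block
    left: str                         # "file:line" of the first occurrence
    right: str                        # "file:line" of the second occurrence
    exception: Optional[str] = None   # matched exception pattern, if any
    reason: Optional[str] = None      # why refactoring is not recommended


def render_report(findings: list[CloneFinding]) -> str:
    """Render findings, always reporting them and flagging accepted exceptions."""
    out = ["Code Clone Analysis Results", "", "PMD CPD Findings:", ""]
    for i, f in enumerate(findings, start=1):
        out.append(f"Clone {i}: {f.lines} lines ({f.tokens} tokens) - {f.label}")
        out.append(f"  {f.left} ↔ {f.right}")
        if f.exception:
            out.append(f"  Status: ACCEPTED EXCEPTION ({f.exception})")
            if f.reason:
                out.append(f"  Reason: {f.reason}")
        else:
            # Assumed wording; the skill does not specify this line.
            out.append("  Status: ACTIONABLE")
        out.append("")
    accepted = sum(1 for f in findings if f.exception)
    out.append(f"Actionable Findings: {len(findings) - accepted}")
    out.append(f"Accepted Exceptions: {accepted}")
    return "\n".join(out)
```

Every finding is rendered whether or not it is an exception, which matches the "report it, flag it, explain why" order above. Only the status line and the final counts differ.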
Project-Level Exception Configuration
Projects can declare accepted exception patterns in their CLAUDE.md:
```markdown
## Code Clone Exceptions

- `sql/gen*_template.sql` - generation-per-directory experiments (immutable)
- `scripts/gen*/` - copy-and-adapt sweep scripts (no shared infrastructure)
- `tests/fixtures/` - intentional duplication for test isolation
```
When this section exists in a project's CLAUDE.md, the code-clone-assistant should check it before classifying findings.
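One way to apply such a section is sketched below. The skill does not specify how the section is parsed, so the helper names are hypothetical. The sketch also makes three assumptions: entries follow the "backticked glob, separator, reason" shape shown above, a pattern ending in `/` covers everything under that directory, and a clone counts as an accepted exception only when every file involved matches some pattern.

```python
import re
from fnmatch import fnmatchcase
from typing import Optional

# "- `glob` <dash> reason"; the separator may be an em dash, en dash, or hyphen.
_ENTRY = re.compile(
    r"^\s*-\s+`(?P<glob>[^`]+)`\s*[\u2014\u2013-]?\s*(?P<reason>.*)$"
)


def parse_exceptions(claude_md: str) -> list[tuple[str, str]]:
    """Collect (glob, reason) pairs from the 'Code Clone Exceptions' section."""
    entries = []
    in_section = False
    for line in claude_md.splitlines():
        if line.startswith("#"):
            title = line.lstrip("#").strip().lower()
            in_section = title == "code clone exceptions"
            continue
        if in_section:
            match = _ENTRY.match(line)
            if match:
                entries.append((match.group("glob"), match.group("reason").strip()))
    return entries


def match_exception(path: str, entries: list[tuple[str, str]]) -> Optional[str]:
    """Return the reason of the first pattern matching `path`, else None."""
    for glob, reason in entries:
        # Treat a trailing slash as "anything under this directory".
        pattern = glob + "*" if glob.endswith("/") else glob
        if fnmatchcase(path, pattern):
            return reason
    return None


def classify(files: list[str], entries: list[tuple[str, str]]) -> tuple[str, Optional[str]]:
    """Classify one clone by the files it spans."""
    reasons = [match_exception(f, entries) for f in files]
    if reasons and all(r is not None for r in reasons):
        return "ACCEPTED EXCEPTION", reasons[0]
    return "ACTIONABLE", None
```

Requiring every file to match is the conservative choice: a clone between an exempt template and ordinary source code still shows up as actionable.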
Reference Documentation
For detailed information, see:
- Detection Commands - PMD CPD and Semgrep command details
- Complete Workflow - Detection, analysis, and presentation phases
- Refactoring Strategies - Approaches for addressing violations
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| PMD CPD not found | Not installed or not in PATH | Install PMD with a package manager, or download from PMD releases |
| Semgrep timeout | Large codebase scan | Restrict the scanned paths to limit scope |
| No duplicates detected | minimum-tokens too high | Lower value (try 15) |
| Too many false positives | minimum-tokens too low | Increase (try 30+) |
| Language not recognized | Wrong `-l` (language) value | Check PMD CPD supported languages list |
| SARIF parse error | Semgrep output malformed | Upgrade Semgrep to latest version |
| Memory error on large repo | Java heap too small | Increase the JVM heap size available to PMD |
| Missing clone rules file | Custom rules not created | Create `clone-rules.yaml` or use default config |
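Several of these issues can be caught before a scan starts. The check below is a sketch, not part of the skill: `preflight` is a hypothetical helper, and it assumes the executables are named `pmd` and `semgrep` and that the rules file is `clone-rules.yaml`, as in the Quick Start commands.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def preflight(
    rules_file: str = "clone-rules.yaml",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of setup problems; an empty list means ready to scan.

    `which` is injectable so the check can be exercised without the tools
    actually being installed.
    """
    problems = []
    for tool in ("pmd", "semgrep"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not Path(rules_file).is_file():
        problems.append(f"rules file missing: {rules_file}")
    return problems
```

Calling `preflight()` before Step 1 and printing any returned messages covers the "not found" and "missing rules file" rows above. Timeouts and memory errors only appear at scan time and are not covered.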
Post-Execution Reflection
After this skill completes, check before closing:
- Did the command succeed? — If not, fix the instruction or error table that caused the failure.
- Did parameters or output change? — If the underlying tool's interface drifted, update Usage examples and Parameters table to match.
- Was a workaround needed? — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.
Only update if the issue is real and reproducible — not speculative.