Cc-skills code-clone-assistant

Detect and refactor code duplication with PMD CPD. TRIGGERS - code clones, DRY violations, duplicate code.

install

source · Clone the upstream repo

git clone https://github.com/terrylica/cc-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/terrylica/cc-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/quality-tools/skills/code-clone-assistant" ~/.claude/skills/terrylica-cc-skills-code-clone-assistant && rm -rf "$T"

manifest: plugins/quality-tools/skills/code-clone-assistant/SKILL.md

source content

Code Clone Assistant

Detect code clones and guide refactoring using PMD CPD (exact duplicates) + Semgrep (patterns).

Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.

Tools

PMD CPD v7.17.0+: Exact duplicate detection
Semgrep v1.140.0+: Pattern-based detection

Tested: October 2025 - 30 violations detected across 3 sample files
Coverage: ~3x more violations than using either tool alone

When to Use This Skill

Use this skill when:

Finding duplicate code in a codebase
Detecting DRY violations
Refactoring similar code patterns
Identifying copy-paste code

Why Two Tools?

PMD CPD and Semgrep detect different clone types:

Aspect	PMD CPD	Semgrep
Detects	Exact copy-paste duplicates	Similar patterns with variations
Scope	Across files ✅	Within a file (cross-file analysis is Pro only)
Matching	Token-based (ignores formatting)	Pattern-based (AST matching)
Rules	❌ No custom rules	✅ Custom rules

Result: Using both finds ~3x more DRY violations than either tool alone.

Clone Types

Type	Description	PMD CPD	Semgrep
Type-1	Exact copies	✅ Default	✅
Type-2	Renamed identifiers	✅ `--ignore-*`	✅
Type-3	Near-miss with variations	⚠️ Partial	✅ Patterns
Type-4	Semantic clones (same behavior)	❌	❌
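
To make the Type-1 and Type-2 rows concrete, here is a minimal sketch of token-based comparison using Python's standard `tokenize` module. It is an illustration only, not PMD CPD's actual tokenizer; the `token_stream` helper and the sample snippets are hypothetical.

```python
import io
import keyword
import tokenize

# Token types that carry layout or comments rather than code content.
# Dropping INDENT/DEDENT is a simplification for this sketch.
SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_stream(source, ignore_identifiers=False):
    """Return the code tokens of `source`, ignoring formatting and comments.

    With ignore_identifiers=True, every non-keyword name is replaced by the
    placeholder "ID", so renamed copies (Type-2 clones) compare equal.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        is_identifier = tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)
        tokens.append("ID" if ignore_identifiers and is_identifier else tok.string)
    return tokens


ORIGINAL = "def total(prices):\n    return sum(p * 1.2 for p in prices)\n"
# Type-1: same code, different spacing plus a comment.
REFORMATTED = "def total( prices ):  # with tax\n    return sum( p*1.2 for p in prices )\n"
# Type-2: same structure, renamed identifiers.
RENAMED = "def gross(amounts):\n    return sum(a * 1.2 for a in amounts)\n"

print(token_stream(ORIGINAL) == token_stream(REFORMATTED))  # Type-1 match
print(token_stream(ORIGINAL) == token_stream(RENAMED))      # differs by names
print(token_stream(ORIGINAL, True) == token_stream(RENAMED, True))  # Type-2 match
```

Type-3 clones (added or removed statements) would still produce different token streams here, which is why the table marks token-based detection as only partial for them.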

Quick Start Workflow

# Step 1: Detect exact duplicates (PMD CPD)
pmd cpd -d . -l python --minimum-tokens 20 -f markdown > pmd-results.md

# Step 2: Detect pattern violations (Semgrep)
semgrep --config=clone-rules.yaml --sarif --quiet > semgrep-results.sarif

# Step 3: Analyze combined results (Claude Code)
# Parse both outputs, prioritize by severity

# Step 4: Refactor (Claude Code with user approval)
# Extract shared functions, consolidate patterns, verify tests
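
Step 3 could be sketched as follows for the Semgrep side. This is a hedged example, assuming the output follows the SARIF 2.1.0 layout (`runs[].results[]` with `ruleId`, `level`, `message.text`, and `locations`). The rule IDs and file paths in the sample are made up, and treating a missing `level` as `"warning"` is a simplification.

```python
import json

# Lower rank sorts first: errors before warnings before notes.
LEVEL_RANK = {"error": 0, "warning": 1, "note": 2, "none": 3}


def load_sarif_findings(sarif_text):
    """Flatten SARIF results into dicts and sort them by severity."""
    sarif = json.loads(sarif_text)
    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            findings.append({
                "rule": result.get("ruleId", "unknown"),
                "level": result.get("level", "warning"),  # simplification
                "file": physical.get("artifactLocation", {}).get("uri", "?"),
                "line": physical.get("region", {}).get("startLine", 0),
                "message": result.get("message", {}).get("text", ""),
            })
    findings.sort(key=lambda f: (LEVEL_RANK.get(f["level"], 99), f["file"], f["line"]))
    return findings


# Hypothetical sample standing in for semgrep-results.sarif.
SAMPLE_SARIF = json.dumps({
    "version": "2.1.0",
    "runs": [{"results": [
        {"ruleId": "dup-validation", "level": "warning",
         "message": {"text": "Repeated validation block"},
         "locations": [{"physicalLocation": {
             "artifactLocation": {"uri": "app/forms.py"},
             "region": {"startLine": 40}}}]},
        {"ruleId": "dup-error-handler", "level": "error",
         "message": {"text": "Repeated error handler"},
         "locations": [{"physicalLocation": {
             "artifactLocation": {"uri": "app/api.py"},
             "region": {"startLine": 12}}}]},
    ]}],
})

for finding in load_sarif_findings(SAMPLE_SARIF):
    print(f'{finding["level"]:8} {finding["file"]}:{finding["line"]} {finding["rule"]}')
```

In real use, the text would come from reading `semgrep-results.sarif` instead of the inline sample.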

Accepted Exceptions (Known Intentional Duplication)

Not all code duplication is a problem. Some codebases deliberately use copy-and-adapt patterns where refactoring would be harmful. When running clone detection, always check for accepted exceptions before recommending refactoring.

When Duplication Is Acceptable

Pattern	Why Acceptable	Example
Generation-per-directory experiments	Each generation is an immutable, self-contained experiment. Sharing code across generations would break provenance and make past experiments non-reproducible.	SQL templates, sweep scripts where each `gen{NNN}/` is independent
SQL templates with placeholder substitution	SQL has no import/include mechanism. Templates use `sed` placeholder replacement ( `__PLACEHOLDER__` ), not function calls. Extracting shared CTEs into separate files would break the single-file execution model.	ClickHouse sweep templates sharing signal detection + metrics CTEs
Protocol/schema boilerplate	Serialization formats, API contracts, and wire protocols require exact structure in each location. Abstracting them hides the contract.	NDJSON telemetry line construction in wrapper scripts
Test fixtures and golden files	Test data intentionally duplicates production patterns to verify behavior. Sharing fixtures creates brittle cross-test dependencies.	Test setup code, expected output snapshots

How to Report Accepted Exceptions

When clone detection finds duplication that matches an accepted exception pattern:

Report it — always show the user what was found (lines, tokens, files)
Flag as accepted — explicitly state it matches a known exception pattern
Explain why — cite the specific reason refactoring is not recommended
Do NOT recommend refactoring — this is the key difference from actionable findings

Example output format:

Code Clone Analysis Results

PMD CPD Findings:
  Clone 1: 115 lines (575 tokens) — base_bars → signals CTEs
    gen610_template.sql:33 ↔ gen710_template.sql:38
    Status: ACCEPTED EXCEPTION (generation-per-directory experiment)
    Reason: Each generation is immutable. Shared CTEs would break
            experiment provenance and reproducibility.

  Clone 2: 36 lines (478 tokens) — metrics aggregation
    gen610_template.sql:207 ↔ gen710_template.sql:244
    Status: ACCEPTED EXCEPTION (SQL template without include mechanism)

Actionable Findings: 0
Accepted Exceptions: 2
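
A minimal sketch of how such a report might be rendered from already-classified findings. The `Clone` record and `render_report` function are hypothetical names, and the `ACTIONABLE` status label is an assumption; the skill only specifies the wording for accepted exceptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clone:
    lines: int
    tokens: int
    first: str                        # "file:line" of the first occurrence
    second: str                       # "file:line" of the second occurrence
    exception: Optional[str] = None   # matched exception pattern, if any
    reason: Optional[str] = None      # why refactoring is not recommended


def render_report(clones: List[Clone]) -> str:
    """Render findings: always report, flag exceptions, explain why."""
    out = ["Code Clone Analysis Results", "", "PMD CPD Findings:"]
    for number, clone in enumerate(clones, start=1):
        out.append(f"  Clone {number}: {clone.lines} lines ({clone.tokens} tokens)")
        out.append(f"    {clone.first} ↔ {clone.second}")
        if clone.exception:
            out.append(f"    Status: ACCEPTED EXCEPTION ({clone.exception})")
            if clone.reason:
                out.append(f"    Reason: {clone.reason}")
        else:
            out.append("    Status: ACTIONABLE")  # assumed label
    accepted = sum(1 for clone in clones if clone.exception)
    out += ["", f"Actionable Findings: {len(clones) - accepted}",
            f"Accepted Exceptions: {accepted}"]
    return "\n".join(out)


report = render_report([
    Clone(115, 575, "gen610_template.sql:33", "gen710_template.sql:38",
          exception="generation-per-directory experiment",
          reason="Each generation is immutable."),
    Clone(12, 60, "app/a.py:10", "app/b.py:22"),
])
print(report)
```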

Project-Level Exception Configuration

Projects can declare accepted exception patterns in their `CLAUDE.md`:

## Code Clone Exceptions

- `sql/gen*_template.sql` — generation-per-directory experiments (immutable)
- `scripts/gen*/` — copy-and-adapt sweep scripts (no shared infrastructure)
- `tests/fixtures/` — intentional duplication for test isolation

When this section exists in a project's `CLAUDE.md`, the code-clone-assistant should check it before classifying findings.
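
One way this check could work, as a sketch under stated assumptions: entries are bullet lines with a backticked glob followed by a dash and a reason, a pattern ending in `/` covers everything under that directory, and matching uses `fnmatch` semantics, where `*` also crosses `/`. The function names are hypothetical, and the skill does not prescribe a matching algorithm.

```python
import fnmatch
import re

SECTION_HEADING = "## Code Clone Exceptions"
# Bullet line: "- `pattern` <dash> reason"
ENTRY = re.compile(r"^-\s*`([^`]+)`\s*(.*)$")


def parse_exceptions(claude_md_text):
    """Return (pattern, reason) pairs from the Code Clone Exceptions section."""
    exceptions = []
    in_section = False
    for line in claude_md_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any heading either opens or closes the section.
            in_section = stripped == SECTION_HEADING
            continue
        if in_section:
            match = ENTRY.match(stripped)
            if match:
                pattern, reason = match.groups()
                # Drop the leading hyphen, en dash, or em dash separator.
                exceptions.append((pattern, reason.lstrip(" -\u2013\u2014")))
    return exceptions


def find_exception(path, exceptions):
    """Return the reason for the first matching pattern, or None."""
    for pattern, reason in exceptions:
        glob = pattern + "*" if pattern.endswith("/") else pattern
        if fnmatch.fnmatchcase(path, glob):
            return reason
    return None


SAMPLE_CLAUDE_MD = """\
# Project notes

## Code Clone Exceptions

- `sql/gen*_template.sql` - generation-per-directory experiments (immutable)
- `scripts/gen*/` - copy-and-adapt sweep scripts (no shared infrastructure)
- `tests/fixtures/` - intentional duplication for test isolation

## Other section
"""

EXCEPTIONS = parse_exceptions(SAMPLE_CLAUDE_MD)
for path in ("sql/gen610_template.sql", "scripts/gen710/sweep.sh", "src/app.py"):
    print(path, "->", find_exception(path, EXCEPTIONS) or "actionable")
```

A finding would count as an accepted exception only when every file in the clone pair matches a declared pattern; that policy is a suggestion rather than something the skill specifies.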

Reference Documentation

For detailed information, see:

Detection Commands - PMD CPD and Semgrep command details
Complete Workflow - Detection, analysis, and presentation phases
Refactoring Strategies - Approaches for addressing violations

Troubleshooting

Issue	Cause	Solution
PMD CPD not found	Not installed or not in PATH	`brew install pmd` or download from PMD releases
Semgrep timeout	Large codebase scan	Use `--exclude` to limit scope
No duplicates detected	minimum-tokens too high	Lower `--minimum-tokens` value (try 15)
Too many false positives	minimum-tokens too low	Increase `--minimum-tokens` (try 30+)
Language not recognized	Wrong `-l` flag	Check PMD CPD supported languages list
SARIF parse error	Semgrep output malformed	Upgrade Semgrep to latest version
Memory error on large repo	Java heap too small	Set `PMD_JAVA_OPTS=-Xmx4g`
Missing clone rules file	Custom rules not created	Create `clone-rules.yaml` or use default config
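
The first and seventh rows can be checked before a scan starts. The sketch below is a hedged pre-flight helper: it assumes the executables are named `pmd` and `semgrep` on PATH, and the function names are hypothetical.

```python
import os
import shutil


def missing_tools(tools=("pmd", "semgrep")):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def pmd_environment(heap="4g", base_env=None):
    """Return a copy of the environment with a larger Java heap for PMD."""
    env = dict(os.environ if base_env is None else base_env)
    env["PMD_JAVA_OPTS"] = f"-Xmx{heap}"
    return env


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("pmd and semgrep are available")
```

The dictionary from `pmd_environment()` can be passed as `env=` to `subprocess.run` when launching the CPD command, so the heap setting applies only to that process.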

Post-Execution Reflection

After this skill completes, check before closing:

Did the command succeed? — If not, fix the instruction or error table that caused the failure.
Did parameters or output change? — If the underlying tool's interface drifted, update Usage examples and Parameters table to match.
Was a workaround needed? — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.

Only update if the issue is real and reproducible — not speculative.