Superpowers-lab finding-duplicate-functions

Use when auditing a codebase for semantic duplication: functions that do the same thing but have different names or implementations. Especially useful for LLM-generated codebases, where new functions are often created rather than existing ones reused.

Install

Source: clone the upstream repo.
git clone https://github.com/obra/superpowers-lab

Claude Code: install into ~/.claude/skills/.
T=$(mktemp -d) && git clone --depth=1 https://github.com/obra/superpowers-lab "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/finding-duplicate-functions" ~/.claude/skills/obra-superpowers-lab-finding-duplicate-functions && rm -rf "$T"

Manifest: skills/finding-duplicate-functions/SKILL.md

Finding Duplicate-Intent Functions

Overview

LLM-generated codebases accumulate semantic duplicates: functions that serve the same purpose but were implemented independently. Classical copy-paste detectors (jscpd) find syntactic duplicates but miss "same intent, different implementation."
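To make the distinction concrete, here is a hypothetical TypeScript pair of the kind this skill targets. Both helpers turn an unknown thrown value into a message string, yet they share no text a syntactic detector could match. They also disagree on one edge case (a plain object with a message field), which is why the process ends in human review rather than automatic deletion.

```typescript
// Two hypothetical helpers that would typically live in different files.
// Both convert an unknown thrown value into a message string.

function formatError(err: unknown): string {
  // Prefer the Error's message; fall back to string coercion.
  if (err instanceof Error) {
    return err.message;
  }
  return String(err);
}

const errorToString = (e: unknown): string =>
  // Duck-types anything with a `message` field; otherwise interpolates.
  typeof e === "object" && e !== null && "message" in e
    ? String((e as { message: unknown }).message)
    : `${e}`;
```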

This skill uses a two-stage approach: classical extraction followed by LLM-powered intent clustering. The process below breaks these stages into five steps plus a final human review.

When to Use

  • Codebase has grown organically with multiple contributors (human or LLM)
  • You suspect utility functions have been reimplemented multiple times
  • Before major refactoring to identify consolidation opportunities
  • After jscpd has been run and syntactic duplicates are already handled

Quick Reference

Phase | Tool | Model | Output
--- | --- | --- | ---
1. Extract | scripts/extract-functions.sh | - | catalog.json
2. Categorize | scripts/categorize-prompt.md | haiku | categorized.json
3. Split | scripts/prepare-category-analysis.sh | - | categories/*.json
4. Detect | scripts/find-duplicates-prompt.md | opus | duplicates/*.json
5. Report | scripts/generate-report.sh | - | duplicates-report.md

Process

digraph duplicate_detection {
  rankdir=TB;
  node [shape=box];

  extract [label="1. Extract function catalog\n./scripts/extract-functions.sh"];
  categorize [label="2. Categorize by domain\n(haiku subagent)"];
  split [label="3. Split into categories\n./scripts/prepare-category-analysis.sh"];
  detect [label="4. Find duplicates per category\n(opus subagent per category)"];
  report [label="5. Generate report\n./scripts/generate-report.sh"];
  review [label="6. Human review & consolidate"];

  extract -> categorize -> split -> detect -> report -> review;
}

Phase 1: Extract Function Catalog

./scripts/extract-functions.sh src/ -o catalog.json

Options:

  • -o FILE: Output file (default: stdout)
  • -c N: Lines of context to capture (default: 15)
  • -t GLOB: File types (default: *.ts,*.tsx,*.js,*.jsx)
  • --include-tests: Include test files (excluded by default)
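extract-functions.sh is a shell script whose internals are not shown here. As a rough TypeScript illustration of what an extraction pass with -c N context amounts to, the sketch below scans one file for two common exported-function forms and captures the declaration plus the following lines. The CatalogEntry shape, the regex, and extractFunctions are all assumptions, not the real catalog.json schema; a line-based regex like this also misses class methods and other declaration styles.

```typescript
// Hypothetical catalog entry; the real catalog.json schema may differ.
interface CatalogEntry {
  name: string;
  file: string;
  line: number; // 1-based line of the declaration
  context: string; // the declaration line plus following lines
}

// Matches `export [async] function name` and `export const name = [async] (`.
const EXPORTED_FN =
  /^\s*export\s+(?:async\s+)?function\s+([A-Za-z_$][\w$]*)|^\s*export\s+const\s+([A-Za-z_$][\w$]*)\s*=\s*(?:async\s*)?\(/;

function extractFunctions(file: string, source: string, contextLines: number = 15): CatalogEntry[] {
  const lines = source.split("\n");
  const entries: CatalogEntry[] = [];
  for (let i = 0; i < lines.length; i++) {
    const match = EXPORTED_FN.exec(lines[i]);
    if (!match) continue;
    entries.push({
      name: match[1] || match[2], // whichever alternative matched
      file,
      line: i + 1,
      // Capture a window of `contextLines` lines starting at the declaration.
      context: lines.slice(i, i + contextLines).join("\n"),
    });
  }
  return entries;
}
```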

Test files (*.test.*, *.spec.*, __tests__/**) are excluded by default, since test utilities are less likely to be consolidation candidates.
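The three default exclusion patterns can be expressed as a small path predicate. This is a TypeScript sketch that mirrors the patterns listed above, not the script's actual matching logic; isTestFile and selectSourceFiles are hypothetical names, and the includeTests flag stands in for --include-tests.

```typescript
// Mirrors the default exclusions: *.test.*, *.spec.*, __tests__/**
function isTestFile(path: string): boolean {
  const normalized = path.replace(/\\/g, "/"); // tolerate Windows separators
  const segments = normalized.split("/");
  const base = segments[segments.length - 1];
  const underTestsDir = segments.slice(0, -1).indexOf("__tests__") !== -1;
  return /\.(test|spec)\./.test(base) || underTestsDir;
}

// Hypothetical helper: includeTests plays the role of --include-tests.
function selectSourceFiles(files: string[], includeTests: boolean = false): string[] {
  return includeTests ? files : files.filter((f) => !isTestFile(f));
}
```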

Phase 2: Categorize by Domain

Dispatch a haiku subagent using the prompt in scripts/categorize-prompt.md.

Insert the contents of catalog.json where indicated in the prompt template. Save the output as categorized.json.
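Filling the template is plain string substitution. The sketch below is hypothetical: the marker that categorize-prompt.md actually uses is not given here, so fillPromptTemplate takes it as a parameter. It also checks that the catalog parses as JSON and that the marker appears exactly once, so a truncated catalog or a mismatched template fails loudly instead of producing a bad prompt.

```typescript
// Splices the catalog JSON into the template at `marker`. The marker is a
// parameter because the real placeholder in categorize-prompt.md is not
// specified here.
function fillPromptTemplate(template: string, marker: string, catalogJson: string): string {
  JSON.parse(catalogJson); // throws early on a truncated or invalid catalog
  const at = template.indexOf(marker);
  if (at === -1) {
    throw new Error(`marker ${marker} not found in template`);
  }
  if (template.indexOf(marker, at + marker.length) !== -1) {
    throw new Error(`marker ${marker} appears more than once`);
  }
  // Slicing avoids String.prototype.replace, whose replacement string treats
  // "$&", "$'", "$`" and "$$" specially.
  return template.slice(0, at) + catalogJson + template.slice(at + marker.length);
}
```

Splicing by index matters here because JavaScript identifiers often contain `$`, and String.prototype.replace would rewrite sequences like `$$` inside the catalog.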

Phase 3: Split into Categories

./scripts/prepare-category-analysis.sh categorized.json ./categories

Creates one JSON file per category. Only categories with 3+ functions are worth analyzing.
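Assuming categorized.json is an array of functions that each carry a category field (a hypothetical schema), the split-and-filter step amounts to a group-by followed by a size threshold. splitByCategory is an illustrative sketch, not the script's code, and it returns the groups instead of writing one file per category.

```typescript
// Hypothetical record shape for categorized.json.
interface CategorizedFunction {
  name: string;
  file: string;
  category: string;
}

function splitByCategory(
  functions: CategorizedFunction[],
  minSize: number = 3,
): Record<string, CategorizedFunction[]> {
  // Prototype-free object so categories like "constructor" are safe keys.
  const groups: Record<string, CategorizedFunction[]> = Object.create(null);
  for (const fn of functions) {
    if (!groups[fn.category]) groups[fn.category] = [];
    groups[fn.category].push(fn);
  }
  // Keep only categories large enough to be worth a detection pass.
  const kept: Record<string, CategorizedFunction[]> = Object.create(null);
  for (const category of Object.keys(groups)) {
    if (groups[category].length >= minSize) kept[category] = groups[category];
  }
  return kept;
}
```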

Phase 4: Find Duplicates (Per Category)

For each category file in ./categories/, dispatch an opus subagent using the prompt in scripts/find-duplicates-prompt.md.

Save each output as ./duplicates/{category}.json.
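The per-category loop can be planned up front as a list of jobs, one per category file. This sketch only computes paths; the hypothetical planDetectionJobs does not dispatch anything, since how a subagent is invoked depends on the agent runtime. It also assumes every input file lives directly in ./categories/.

```typescript
interface DetectionJob {
  category: string;
  input: string; // category file produced in Phase 3
  output: string; // where the subagent's JSON result is saved
  prompt: string; // prompt template for the opus subagent
}

// Hypothetical planner: maps each category file to its detection job.
function planDetectionJobs(categoryFiles: string[]): DetectionJob[] {
  return categoryFiles
    .filter((f) => /\.json$/.test(f))
    .map((f) => {
      const base = f.slice(f.lastIndexOf("/") + 1); // e.g. "validation.json"
      const category = base.slice(0, -".json".length);
      return {
        category,
        input: `./categories/${base}`,
        output: `./duplicates/${category}.json`,
        prompt: "scripts/find-duplicates-prompt.md",
      };
    });
}
```

Because each job reads only its own category file and writes only its own output, the jobs do not depend on one another.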

Phase 5: Generate Report

./scripts/generate-report.sh ./duplicates ./duplicates-report.md

Produces a prioritized markdown report grouped by confidence level.
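The schema of the files in ./duplicates/ is not documented here. Assuming each holds groups with a confidence level, the candidate functions, and a recommended survivor (HIGH and the survivor are named in Phase 6; MEDIUM and LOW are assumptions), the grouping step might look like this hypothetical renderReport.

```typescript
// Only HIGH is named in the text; MEDIUM and LOW are assumed levels.
type Confidence = "HIGH" | "MEDIUM" | "LOW";

// Hypothetical shape of one group in duplicates/{category}.json.
interface DuplicateGroup {
  category: string;
  confidence: Confidence;
  functions: string[]; // e.g. "src/utils/errors.ts:formatError"
  survivor: string; // the recommended function to keep
}

const LEVELS: Confidence[] = ["HIGH", "MEDIUM", "LOW"];

function renderReport(groups: DuplicateGroup[]): string {
  const out: string[] = ["# Duplicate Functions Report"];
  for (const level of LEVELS) {
    const matches = groups.filter((g) => g.confidence === level);
    if (matches.length === 0) continue; // omit empty sections
    out.push("", `## ${level} confidence (${matches.length})`);
    for (const g of matches) {
      const others = g.functions.filter((f) => f !== g.survivor).join(", ");
      out.push(`- [${g.category}] keep \`${g.survivor}\`; replace ${others}`);
    }
  }
  return out.join("\n");
}
```

Ordering sections from HIGH down keeps the group that Phase 6 acts on at the top of the report.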

Phase 6: Human Review

Review the report. For HIGH confidence duplicates:

  1. Verify the recommended survivor has tests
  2. Update callers to use the survivor
  3. Delete the duplicates
  4. Run tests
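Before step 3 deletes anything, a quick differential check can complement the survivor's tests: run the survivor and each duplicate on the same sample inputs and list the inputs where they disagree. This is an added technique, not part of the skill's scripts, and findMismatches is a hypothetical helper.

```typescript
// Returns the sample inputs on which survivor and duplicate disagree.
// JSON.stringify is a simple structural comparison: fine for plain data, but
// it ignores undefined fields and treats a Date and its ISO string as equal.
function findMismatches<I, O>(
  survivor: (input: I) => O,
  duplicate: (input: I) => O,
  samples: I[],
): I[] {
  return samples.filter(
    (input) => JSON.stringify(survivor(input)) !== JSON.stringify(duplicate(input)),
  );
}
```

An empty result is evidence, not proof, of equivalence; the samples should include the edge cases each caller of the deleted function relies on.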

High-Risk Duplicate Zones

Focus extraction on these areas first; they tend to accumulate duplicates fastest:

Zone | Common Duplicates
--- | ---
utils/, helpers/, lib/ | General utilities reimplemented
Validation code | Same checks written multiple ways
Error formatting | Error-to-string conversions
Path manipulation | Joining, resolving, normalizing paths
String formatting | Case conversion, truncation, escaping
Date formatting | Same formats implemented repeatedly
API response shaping | Similar transformations for different endpoints

Common Mistakes

Extracting too much: Focus on exported functions and public methods. Internal helpers are less likely to be duplicated across files.

Skipping the categorization step: Going straight to duplicate detection on the full catalog produces noise. Categories focus the comparison.

Using haiku for duplicate detection: Haiku is cost-effective for categorization but misses subtle semantic duplicates. Use Opus for the actual duplicate analysis.

Consolidating without tests: Before deleting duplicates, ensure the survivor has tests covering all use cases of the deleted functions.