Skills-4-SE legacy-code-summarizer
Produces comprehensive summaries and insights about legacy codebases to help understand unfamiliar code. Use when onboarding to a new project, planning refactoring efforts, assessing code for acquisition/migration, or generating documentation for undocumented systems. Analyzes architecture, dependencies, code quality issues, and test coverage. Creates high-level overviews with architecture diagrams, key components, entry points, and actionable insights for understanding and improving legacy code.
git clone https://github.com/ArabelaTso/Skills-4-SE
T=$(mktemp -d) && git clone --depth=1 https://github.com/ArabelaTso/Skills-4-SE "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/legacy-code-summarizer" ~/.claude/skills/arabelatso-skills-4-se-legacy-code-summarizer && rm -rf "$T"
skills/legacy-code-summarizer/SKILL.md

Legacy Code Summarizer
Analyze and summarize legacy codebases to quickly understand their structure, quality, and improvement opportunities.
Core Capabilities
This skill helps understand legacy code by:
- Mapping architecture - Identify key components, layers, and relationships
- Analyzing dependencies - Understand module coupling and import patterns
- Detecting quality issues - Find code smells, technical debt, and outdated patterns
- Assessing test coverage - Identify testing gaps and untested code
- Generating documentation - Create actionable summaries for teams
Code Analysis Workflow
Step 1: Survey the Codebase
Get an overview of the project structure and size.
Initial Questions:
- What programming language(s)?
- What is the project structure?
- How large is the codebase?
- What frameworks/libraries are used?
- Is there existing documentation?
Commands to Run:
    # Count lines of code
    find . -name "*.py" | xargs wc -l | tail -1    # Python
    find . -name "*.java" | xargs wc -l | tail -1  # Java

    # Count files
    find . -name "*.py" | wc -l
    find . -name "*.java" | wc -l

    # Directory structure
    tree -L 3 -I '__pycache__|node_modules|target|build'

    # Or without tree command
    find . -type d -not -path '*/\.*' | head -20
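The same survey can be scripted when `tree` or `xargs` are unavailable. The following is a minimal sketch, not part of the skill itself: the function name `survey` and the `SKIP_DIRS` set are illustrative assumptions, and the skipped directories simply mirror the ones excluded in the `tree` command above.

```python
from collections import Counter
from pathlib import Path

# Directories skipped during the walk (an assumption; adjust per project).
SKIP_DIRS = {".git", "__pycache__", "node_modules", "target", "build"}


def survey(root, extensions=(".py", ".java")):
    """Return (file_counts, line_counts), each keyed by file extension."""
    file_counts, line_counts = Counter(), Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in extensions:
            continue
        # Ignore anything that lives under a skipped directory.
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            line_counts[path.suffix] += sum(1 for _ in handle)
        file_counts[path.suffix] += 1
    return file_counts, line_counts
```

Calling `survey(".")` from the project root gives per-language file and line totals that can go straight into the Key Metrics section of the report.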
Identify Project Type:
- Web application (frontend/backend)
- CLI tool
- Library/framework
- Microservice
- Monolith
- Desktop application
Step 2: Identify Entry Points
Find where execution starts and main workflows.
Common Entry Points:
Python:
    # Find main entry points (matches either quote style)
    grep -rn "if __name__ == ['\"]__main__['\"]" --include="*.py" .

    # Find Flask/Django apps
    grep -r "app = Flask\|application = " --include="*.py" .
    grep -r "INSTALLED_APPS\|MIDDLEWARE" --include="*.py" .

    # Find CLI entry points (setup.py, pyproject.toml)
    grep -A 10 "entry_points\|console_scripts" setup.py pyproject.toml
Java:
    # Find main methods
    grep -r "public static void main" --include="*.java" .

    # Find Spring Boot applications
    grep -r "@SpringBootApplication" --include="*.java" .

    # Find servlets
    grep -r "extends HttpServlet\|@WebServlet" --include="*.java" .
JavaScript/TypeScript:
    # Check package.json for entry points
    grep -A 5 "main\|scripts" package.json

    # Find Express apps
    grep -r "app = express()\|express()" --include="*.js" --include="*.ts" --exclude-dir=node_modules .

    # Find React entry points (skipping node_modules)
    find . -path ./node_modules -prune -o \( -name "index.js" -o -name "index.tsx" -o -name "App.js" \) -print
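The grep patterns above can also be applied from a script, which makes it easier to collect hits across languages into one list. This is a hedged sketch: `ENTRY_POINT_PATTERNS` and `find_entry_points` are hypothetical names, and the patterns are only the ones already used in the commands above.

```python
import re

# Patterns mirror the grep commands above; extend for other frameworks as needed.
ENTRY_POINT_PATTERNS = {
    "python": {
        "main guard": re.compile(r"""if __name__ == ['"]__main__['"]:"""),
        "Flask app": re.compile(r"\bapp = Flask\("),
    },
    "java": {
        "main method": re.compile(r"public static void main"),
        "Spring Boot app": re.compile(r"@SpringBootApplication"),
        "servlet": re.compile(r"extends HttpServlet|@WebServlet"),
    },
    "javascript": {
        "Express app": re.compile(r"express\(\)"),
    },
}


def find_entry_points(source, language):
    """Return (line_number, label) pairs for lines matching a known pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in ENTRY_POINT_PATTERNS[language].items():
            if pattern.search(line):
                hits.append((number, label))
    return hits
```

Line-based matching is as crude as grep: it will also flag patterns that appear inside comments or strings, so treat the hits as candidates to inspect.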
Step 3: Map Architecture and Components
Understand the high-level structure and key modules.
Analyze Directory Structure:
    # List top-level directories
    ls -d */ | head -20

    # Common patterns to look for:
    # - src/ or lib/ (source code)
    # - tests/ or test/ (test files)
    # - config/ (configuration)
    # - docs/ (documentation)
    # - scripts/ (utility scripts)
    # - models/ or entities/ (data models)
    # - views/ or templates/ (UI)
    # - controllers/ or handlers/ (business logic)
    # - services/ or api/ (external services)
    # - utils/ or helpers/ (utilities)
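As a small illustration, the directory conventions listed above can be turned into a lookup that labels whatever `ls -d */` returns. The names `DIRECTORY_ROLES` and `label_directories` are assumptions made for this sketch; the role labels are taken from the list above, and real projects often deviate from them.

```python
# Conventional directory names and the role they usually signal.
DIRECTORY_ROLES = {
    "src": "source code", "lib": "source code",
    "tests": "test files", "test": "test files",
    "config": "configuration",
    "docs": "documentation",
    "scripts": "utility scripts",
    "models": "data models", "entities": "data models",
    "views": "UI", "templates": "UI",
    "controllers": "business logic", "handlers": "business logic",
    "services": "external services", "api": "external services",
    "utils": "utilities", "helpers": "utilities",
}


def label_directories(names):
    """Map each directory name to its conventional role, or 'unknown'."""
    return {
        name: DIRECTORY_ROLES.get(name.strip("/").lower(), "unknown")
        for name in names
    }
```

Directories labelled `unknown` are usually the ones worth opening first, since they hold project-specific structure that conventions cannot explain.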
Identify Architecture Pattern:
Common patterns in legacy code:
- MVC (Model-View-Controller): Django, Rails, Spring MVC
- Layered: Presentation → Business → Data layers
- Microservices: Multiple small services
- Monolith: Single large application
- Plugin-based: Core + extensions
See references/architecture_patterns.md for detailed pattern identification.
Create Architecture Diagram:
    Example Web Application Architecture:

    ┌─────────────────────────────────────────┐
    │            Frontend (React)             │
    │  - components/                          │
    │  - pages/                               │
    │  - hooks/                               │
    └───────────────┬─────────────────────────┘
                    │ API Calls
                    ↓
    ┌─────────────────────────────────────────┐
    │        API Layer (Flask/Express)        │
    │  - routes/                              │
    │  - middleware/                          │
    └───────────────┬─────────────────────────┘
                    │
                    ↓
    ┌─────────────────────────────────────────┐
    │             Business Logic              │
    │  - services/                            │
    │  - controllers/                         │
    └───────────────┬─────────────────────────┘
                    │
                    ↓
    ┌─────────────────────────────────────────┐
    │               Data Layer                │
    │  - models/                              │
    │  - repositories/                        │
    └───────────────┬─────────────────────────┘
                    │
                    ↓
    ┌─────────────────────────────────────────┐
    │      Database (PostgreSQL/MongoDB)      │
    └─────────────────────────────────────────┘
Step 4: Analyze Dependencies
Map module relationships and identify coupling issues.
Find Direct Dependencies:
Python:
    # Find imports in all Python files
    grep -rh "^import \|^from " --include="*.py" . | sort | uniq

    # Analyze requirements
    cat requirements.txt

    # Or from setup.py
    grep -A 20 "install_requires" setup.py
Java:
    # Analyze Maven dependencies
    grep -A 3 "<dependency>" pom.xml

    # Or Gradle
    grep -A 3 "implementation\|compile" build.gradle

    # Find imports in code
    grep -rh "^import " --include="*.java" . | sort | uniq | head -50
JavaScript:
    # Analyze package.json
    grep -A 50 "dependencies" package.json

    # Find imports (skipping node_modules)
    grep -rh "^import \|require(" --include="*.js" --include="*.ts" --exclude-dir=node_modules . | head -50
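For Python code, grepping for lines that start with `import` or `from` misses indented imports inside functions. A minimal sketch using the standard-library `ast` module avoids that; the function name `imported_modules` is an assumption for illustration, and relative imports are deliberately skipped because they refer to the project's own packages.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names imported anywhere in `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            modules.add(node.module.split(".")[0])
    return sorted(modules)
```

Running this over every file and comparing the result with `requirements.txt` is one way to spot declared dependencies that are never imported.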
Create Dependency Map:
    Key Internal Dependencies:

    auth module
    ├─ depends on: user_model, database, config
    └─ used by: api_routes, admin_panel

    user_model
    ├─ depends on: database, validators
    └─ used by: auth, profile, admin

    payment module
    ├─ depends on: user_model, external_api, logger
    └─ used by: checkout, subscription

    Circular dependencies detected:
    ⚠️ module_a → module_b → module_c → module_a

See references/dependency_analysis.md for tools and techniques.
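Once a dependency map exists as data, circular dependencies such as `module_a → module_b → module_c → module_a` can be found with a depth-first search. The sketch below assumes the map is a plain dict from module name to the list of modules it depends on; `find_cycle` is a hypothetical helper, and it reports only the first cycle it encounters.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of module names, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so the path loops back.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

The search is recursive, so a very deep dependency chain could hit Python's recursion limit; an explicit stack would be the safer choice for large graphs.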
Step 5: Identify Code Quality Issues
Detect technical debt, code smells, and improvement opportunities.
Common Quality Issues to Look For:
1. Large Files (God Objects)
    # Find files over 500 lines
    find . -name "*.py" -exec wc -l {} \; | awk '$1 > 500' | sort -rn

    # Find files over 1000 lines (serious issue)
    find . -name "*.java" -exec wc -l {} \; | awk '$1 > 1000' | sort -rn
2. Dead Code
    # Find unused imports (Python - requires tools)
    # Install: pip install autoflake
    find . -name "*.py" -exec autoflake --check {} \;

    # Find TODO/FIXME comments
    grep -rn "TODO\|FIXME\|HACK\|XXX" --include="*.py" --include="*.java" .
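To report marker counts as a metric rather than a list of grep hits, the same four markers can be tallied per file. This is a small sketch; `count_markers` is an illustrative name, and the word-boundary match means identifiers that merely contain a marker (for example `XXXL`) are not counted.

```python
import re
from collections import Counter

# The same markers searched for by the grep command above.
MARKER = re.compile(r"\b(TODO|FIXME|HACK|XXX)\b")


def count_markers(source):
    """Return a Counter of TODO/FIXME/HACK/XXX occurrences in `source`."""
    return Counter(MARKER.findall(source))
```

Summing the counters across files yields the "TODO/FIXME Comments" figure used in the report's metrics summary.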
3. Code Duplication
    # Find duplicate code (requires tool)
    # Install: pip install pylint
    pylint --disable=all --enable=duplicate-code src/

    # Or use PMD for Java
    # pmd cpd --minimum-tokens 100 --files src/
4. Complex Functions
    # Find long functions (crude check - look for large blocks)
    # Python: Look for functions with many lines between def and next def
    # Java: Look for methods with many lines between { and }

    # Use complexity tools for accurate analysis:
    # Python: radon cc src/ -a
    # Java: Use PMD or Checkstyle
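For Python, the "lines between def and next def" check can be made exact with the `ast` module, which records where each function starts and ends. The sketch below is an assumption-laden illustration: `long_functions` is a hypothetical helper, the default of 100 lines is an arbitrary threshold, and `end_lineno` requires Python 3.8 or later. It measures length only, not cyclomatic complexity.

```python
import ast


def long_functions(source, max_lines=100):
    """Return (name, length) for functions longer than `max_lines`, longest first."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Length counts from the def line through the last line of the body.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                results.append((node.name, length))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Length is a cheap first filter; the functions it surfaces are the ones worth running through a complexity tool.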
5. Missing Documentation
    # Find functions without docstrings (Python)
    grep -A 1 "^def " --include="*.py" -r . | grep -v '"""' | grep -v "'''"

    # Find classes without documentation (Java)
    grep -B 1 "^public class\|^class " --include="*.java" -r . | grep -v "/\*\*" | grep -v "//"
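The grep pipeline for Python only sees top-level `def` lines and still prints every one of them, so its output needs manual inspection. A more precise check, sketched here with the standard-library `ast.get_docstring`, also covers methods and classes; `missing_docstrings` is a hypothetical name chosen for this example.

```python
import ast


def missing_docstrings(source):
    """Return sorted (line_number, name) pairs for undocumented functions and classes."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append((node.lineno, node.name))
    return sorted(missing)
```

Dividing the number of hits by the total number of definitions gives the "functions lack docstrings" percentage used in the report.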
6. Outdated Patterns
Look for:
- Python 2 syntax (e.g., print "hello", raw_input())
- Java pre-8 patterns (no lambdas, no Optional)
- Deprecated libraries
- Security vulnerabilities (SQL injection, XSS)
See references/code_quality_checklist.md for comprehensive quality checks.
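The two Python 2 examples named above can be detected with simple regular expressions. This is a heuristic sketch only: `PY2_PATTERNS` and `find_python2_patterns` are illustrative names, and line-based matching will also flag these constructs when they appear inside strings or comments.

```python
import re

# Heuristic patterns for the two Python 2 idioms mentioned above.
PY2_PATTERNS = {
    # "print" followed by whitespace and then something other than "(".
    "print statement": re.compile(r"^\s*print\s+[^(\s]"),
    "raw_input()": re.compile(r"\braw_input\s*\("),
}


def find_python2_patterns(source):
    """Return (line_number, label) pairs for lines that look like Python 2 code."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PY2_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits
```

A file with any hit is a strong signal that the module predates the Python 3 migration and deserves a closer look.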
Step 6: Assess Test Coverage
Identify testing gaps and quality of existing tests.
Find Tests:
    # Python tests
    find . -name "test_*.py" -o -name "*_test.py"
    ls tests/ test/

    # Java tests
    find . -name "*Test.java" -o -name "*Tests.java"
    ls src/test/

    # JavaScript tests
    find . -name "*.test.js" -o -name "*.spec.js" -o -name "*.test.ts"
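Before running any coverage tool, the file lists from the `find` commands above can be compared by name to spot modules that have no test file at all. The sketch assumes the `test_<name>.py` / `<name>_test.py` naming convention shown above; `untested_modules` is a hypothetical helper, and a matching file name says nothing about how thorough the tests are.

```python
from pathlib import PurePosixPath


def untested_modules(source_files, test_files):
    """Return source files with no test_<name>.py or <name>_test.py counterpart."""
    tested = set()
    for test in test_files:
        stem = PurePosixPath(test).stem
        if stem.startswith("test_"):
            tested.add(stem[len("test_"):])
        elif stem.endswith("_test"):
            tested.add(stem[: -len("_test")])
    return [src for src in source_files if PurePosixPath(src).stem not in tested]
```

The result is a quick first pass at the "Untested Critical Code" list, to be refined with real coverage numbers.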
Calculate Test Coverage:
Python:
    # Install coverage tool
    pip install pytest-cov

    # Run tests with coverage
    pytest --cov=src --cov-report=term-missing

    # Generate HTML report
    pytest --cov=src --cov-report=html
    open htmlcov/index.html
Java:
    # Maven with JaCoCo
    mvn clean test jacoco:report

    # View report
    open target/site/jacoco/index.html
JavaScript:
    # Jest with coverage
    npm test -- --coverage

    # View report
    open coverage/lcov-report/index.html
Assess Test Quality:
Quality Checklist:
- [ ] Unit tests exist for core business logic
- [ ] Integration tests cover key workflows
- [ ] Tests are readable and maintainable
- [ ] Tests run quickly (< 10 seconds for unit tests)
- [ ] Mocking is used appropriately
- [ ] Edge cases are tested
- [ ] Tests don't depend on external services (or are mocked)
- [ ] Coverage > 70% for critical modules
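The coverage item in the checklist can be checked mechanically once per-module percentages are known. The sketch below assumes those percentages have already been collected into a dict (for example, copied from a coverage report); `below_threshold` is a hypothetical helper, and the module names in the usage note are placeholders.

```python
def below_threshold(coverage_by_module, threshold=70.0):
    """Return (module, percent) pairs under `threshold`, lowest coverage first."""
    gaps = [
        (module, percent)
        for module, percent in coverage_by_module.items()
        if percent < threshold
    ]
    return sorted(gaps, key=lambda item: item[1])
```

For instance, `below_threshold({"payment/processor.py": 0.0, "models/user.py": 85.0})` returns only the payment module, which is the kind of entry that belongs under "Untested Critical Code".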
Step 7: Generate Summary Report
Create actionable documentation for the team.
Summary Template:
# Legacy Codebase Summary: [Project Name]

## Executive Summary

[2-3 sentence overview of what the codebase does]

**Key Metrics:**
- Lines of Code: [X]
- Number of Files: [Y]
- Primary Language: [Language]
- Test Coverage: [Z%]
- Last Major Update: [Date]

## Architecture Overview

### High-Level Structure

[Include architecture diagram from Step 3]

### Key Components

1. **[Component Name]** (`path/to/component/`)
   - **Purpose:** [What it does]
   - **Entry Point:** [Main file/class]
   - **Dependencies:** [Key dependencies]
   - **Lines of Code:** [X]

2. **[Component Name]** (`path/to/component/`)
   - **Purpose:** [What it does]
   - **Entry Point:** [Main file/class]
   - **Dependencies:** [Key dependencies]
   - **Lines of Code:** [X]

[Repeat for 5-10 key components]

### Technology Stack

**Core Technologies:**
- [Language] [Version]
- [Framework] [Version]
- [Database] [Version]

**Key Dependencies:**
- [Library 1] - [Purpose]
- [Library 2] - [Purpose]
- [Library 3] - [Purpose]

## Entry Points and Workflows

### Main Entry Points

1. **[Entry Point Name]** - `path/to/file.py:function()`
   - **Purpose:** [What it does]
   - **Triggered by:** [User action, cron, API call, etc.]

2. **[Entry Point Name]** - `path/to/file.java:main()`
   - **Purpose:** [What it does]
   - **Triggered by:** [How it's invoked]

### Critical Workflows

**Workflow 1: [Name]** (e.g., User Registration)

1. User submits form → routes/auth.py:register()
2. Validates input → validators/user_validator.py
3. Creates user → models/user.py:create()
4. Sends email → services/email_service.py
5. Returns response

**Workflow 2: [Name]** (e.g., Payment Processing)

[Step-by-step flow]

## Dependency Analysis

### External Dependencies

**Total Dependencies:** [X]

**Outdated Dependencies (require updates):**
- [Library Name] [Current Version] → [Latest Version]
- [Library Name] [Current Version] → [Latest Version]

**Deprecated Dependencies (require replacement):**
- [Library Name] - Deprecated since [Date]
  - **Suggested Replacement:** [New Library]

### Internal Dependencies

**Highly Coupled Modules (>5 dependencies):**
- `module_a` - depends on [X] modules
- `module_b` - depends on [Y] modules

**Circular Dependencies:**
- ⚠️ `auth` → `user` → `auth`
- ⚠️ `order` → `payment` → `order`

## Code Quality Assessment

### Metrics Summary

- **Average File Size:** [X] lines
- **Largest File:** `path/to/file.py` ([X] lines) ⚠️
- **TODO/FIXME Comments:** [X] occurrences
- **Code Duplication:** [Low/Medium/High]

### Quality Issues

**Critical Issues (Fix Immediately):**
1. **Security Vulnerability:** SQL injection in `path/to/file.py:45`
2. **Large File:** `god_class.java` (2,500 lines) - violates SRP
3. **Circular Dependency:** [Details]

**High Priority (Address Soon):**
1. **No Error Handling:** Missing try/catch in payment module
2. **Hardcoded Credentials:** Found in `config/settings.py`
3. **Deprecated API:** Using old authentication library

**Medium Priority (Technical Debt):**
1. **Code Duplication:** Copy-pasted validation logic in 5 files
2. **Missing Documentation:** 60% of functions lack docstrings
3. **Long Methods:** 15 methods exceed 100 lines

**Low Priority (Improvements):**
1. **Outdated Naming:** Inconsistent variable names
2. **Missing Type Hints:** (Python) or generics (Java)
3. **Verbose Code:** Could be simplified with modern patterns

### Code Smells Detected

- **God Objects:** [List large classes/modules]
- **Feature Envy:** [Methods accessing other objects' data frequently]
- **Dead Code:** [Unused functions/classes]
- **Magic Numbers:** [Hardcoded values without constants]

## Test Coverage Analysis

### Coverage Summary

- **Overall Coverage:** [X%]
- **Critical Modules Coverage:**
  - auth module: [Y%]
  - payment module: [Z%]
  - user management: [W%]

### Testing Gaps

**Untested Critical Code:**
1. `payment/processor.py` - 0% coverage ⚠️
2. `auth/security.py` - 30% coverage
3. `api/routes.py` - 45% coverage

**Missing Test Types:**
- [ ] No integration tests for payment flow
- [ ] No end-to-end tests for user journey
- [ ] No performance/load tests

### Test Quality Issues

- **Slow Tests:** 20 tests take >5 seconds each
- **Flaky Tests:** `test_async_operation` fails intermittently
- **Coupled Tests:** Tests depend on database state

## Recommendations

### Immediate Actions (This Sprint)

1. **Fix Security Issues**
   - Patch SQL injection vulnerability in `auth/login.py`
   - Remove hardcoded credentials, use environment variables

2. **Add Critical Tests**
   - Write integration tests for payment processor
   - Add unit tests for authentication logic

3. **Break Circular Dependencies**
   - Refactor `auth` ↔ `user` circular dependency
   - Extract shared code to new `common` module

### Short-Term Improvements (This Quarter)

1. **Reduce Technical Debt**
   - Refactor `god_class.java` into 3-4 focused classes
   - Eliminate code duplication in validation logic
   - Update deprecated dependencies

2. **Improve Documentation**
   - Add docstrings to all public functions
   - Create architecture diagram
   - Document deployment process

3. **Enhance Test Coverage**
   - Achieve 70% coverage for core modules
   - Add integration tests for critical workflows
   - Set up CI/CD with automated testing

### Long-Term Improvements (This Year)

1. **Architectural Refactoring**
   - Extract microservices for payment and notification
   - Implement proper layering (separate business logic from data access)
   - Introduce dependency injection for better testability

2. **Modernization**
   - Upgrade to [Language] [Latest Version]
   - Adopt modern patterns (async/await, type hints, etc.)
   - Migrate from [Old Framework] to [New Framework]

3. **Quality Infrastructure**
   - Set up automated code quality checks (linting, complexity analysis)
   - Implement pre-commit hooks
   - Add performance monitoring

## Quick Reference

### Key Files to Understand First

1. `path/to/main.py` - Application entry point
2. `path/to/config.py` - Configuration
3. `path/to/models/user.py` - Core data model
4. `path/to/api/routes.py` - API endpoints
5. `path/to/services/auth_service.py` - Authentication logic

### Common Commands

```bash
# Start application
[command]

# Run tests
[command]

# Build for production
[command]

# Deploy
[command]
```

### Key Contacts

- Original Authors: [Names/emails if available]
- Current Maintainers: [Names/emails]
- Documentation: [Links]
- Issue Tracker: [URL]

## Appendix

### Glossary

- [Term]: [Definition]
- [Term]: [Definition]

### External Resources

- [Link to original documentation]
- [Link to related projects]
- [Link to framework docs]
Summary Output Examples

Example 1: Small Python Flask App

# Legacy Codebase Summary: Internal Dashboard

## Executive Summary

Internal dashboard for monitoring application metrics, built with Flask. Provides real-time data visualization and alerting for the operations team.

**Key Metrics:**
- Lines of Code: 3,500
- Number of Files: 42
- Primary Language: Python 3.7
- Test Coverage: 45%
- Last Major Update: 18 months ago

## Architecture Overview

Simple Flask application with SQLAlchemy ORM and PostgreSQL database.

    ┌─────────────────┐
    │  Flask Routes   │
    │  (app/routes/)  │
    └────────┬────────┘
             │
             ↓
    ┌─────────────────┐
    │    Services     │
    │  (app/services/)│
    └────────┬────────┘
             │
             ↓
    ┌─────────────────┐
    │     Models      │
    │  (app/models/)  │
    └────────┬────────┘
             │
             ↓
    ┌─────────────────┐
    │  PostgreSQL DB  │
    └─────────────────┘

### Key Components

1. **Metrics Dashboard** (`app/routes/dashboard.py`)
   - Purpose: Display real-time metrics
   - Entry Point: `dashboard_view()`
   - Dependencies: metrics_service, chart_generator
   - Lines of Code: 250

2. **Data Collection** (`app/services/collector.py`)
   - Purpose: Fetch metrics from external APIs
   - Entry Point: `collect_metrics()` (cron job)
   - Dependencies: requests, database models
   - Lines of Code: 180

3. **Alert System** (`app/services/alerts.py`)
   - Purpose: Send notifications when thresholds are exceeded
   - Entry Point: `check_alerts()` (background task)
   - Dependencies: email_service, metrics_service
   - Lines of Code: 150

## Recommendations

### Immediate Actions

1. Update Flask to latest version (security patches)
2. Add tests for alert system (currently 0% coverage)
3. Fix hardcoded database credentials

### Short-Term

1. Increase test coverage to 70%
2. Add API documentation
3. Refactor large dashboard route (250 lines)
Example 2: Large Java Spring Application
# Legacy Codebase Summary: E-Commerce Platform

## Executive Summary

Full-featured e-commerce platform handling product catalog, orders, payments, and customer management. Serves 100K+ daily active users.

**Key Metrics:**
- Lines of Code: 185,000
- Number of Files: 1,240
- Primary Language: Java 8
- Test Coverage: 62%
- Last Major Update: 6 months ago

## Architecture Overview

Layered Spring Boot application with microservice patterns emerging.

[Detailed architecture diagram showing layers]

### Critical Issues Identified

**High Priority:**
1. **Memory Leak:** Order processing service shows increasing heap usage
2. **N+1 Query Problem:** Product listing generates 500+ DB queries
3. **No Monitoring:** Missing APM tools for production

**Modernization Opportunities:**
1. Migrate to Java 17 (LTS)
2. Extract payment service as microservice
3. Implement caching layer (Redis)

## Recommendations

[Detailed phased approach to refactoring]
Best Practices
- Start broad, then narrow - Overview first, details second
- Focus on actionable insights - Prioritize what can be improved
- Use visual aids - Diagrams clarify complex relationships
- Prioritize by risk - Security and stability issues first
- Be specific - Point to exact files and line numbers
- Estimate effort - Help teams plan refactoring work
- Document assumptions - Note what analysis couldn't determine
- Update regularly - Re-analyze as code evolves
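"Prioritize by risk" can be applied mechanically once findings carry a priority label. The sketch below uses the four levels from the summary template (critical, high, medium, low); the `prioritize` helper and the dict shape with `title` and `priority` keys are assumptions made for illustration.

```python
# Lower number means the finding is addressed earlier.
PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def prioritize(findings):
    """Return findings sorted by priority, keeping the original order within a level."""
    return sorted(findings, key=lambda finding: PRIORITY_ORDER[finding["priority"]])
```

Because Python's sort is stable, findings within the same level stay in the order they were recorded, which keeps the report predictable between runs.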
Resources
- references/architecture_patterns.md: Common architectural patterns in legacy systems and how to identify them
- references/dependency_analysis.md: Tools and techniques for analyzing module dependencies and coupling
- references/code_quality_checklist.md: Comprehensive checklist for assessing code quality and technical debt
Quick Reference
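| Task | Command/Approach |
|---|---|
| Count LOC | `find` piped to `xargs wc -l` (Step 1) |
| Find entry points | `grep -r` for main guards, `main` methods, and framework apps (Step 2) |
| Analyze imports | `grep -rh` for `import`/`from` lines (Step 4) |
| Find large files | `find -exec wc -l` filtered with `awk` (Step 5) |
| Test coverage | `pytest --cov`, `mvn clean test jacoco:report`, or `npm test -- --coverage` (Step 6) |
| Find TODOs | `grep -rn` for TODO, FIXME, HACK, XXX (Step 5) |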