Claude-skill-registry cartographer
Maps and documents codebases of any size by orchestrating parallel subagents. Creates docs/CODEBASE_MAP.md with architecture, file purposes, dependencies, and navigation guides. Updates CLAUDE.md with a summary. Use when user says "map this codebase", "cartographer", "/cartographer", "create codebase map", "document the architecture", "understand this codebase", or when onboarding to a new project. Automatically detects if map exists and updates only changed sections.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/cartographer" ~/.claude/skills/majiayu000-claude-skill-registry-cartographer && rm -rf "$T"
skills/data/cartographer/SKILL.md
# Cartographer
Maps codebases of any size using parallel Sonnet subagents with 1M token context windows.
CRITICAL: Opus orchestrates, Sonnet reads. Never have Opus read codebase files directly. Always delegate file reading to Sonnet subagents, even for small codebases. Opus plans the work, spawns subagents, and synthesizes their reports.
## Quick Start
- Run the scanner script to get file tree with token counts
- Analyze the scan output to plan subagent work assignments
- Spawn Sonnet subagents in parallel to read and analyze file groups
- Synthesize subagent reports into `docs/CODEBASE_MAP.md`
- Update `CLAUDE.md` with a summary pointing to the map
## Workflow
### Step 1: Check for Existing Map
First, check if `docs/CODEBASE_MAP.md` already exists:
If it exists:
- Read the `last_mapped` timestamp from the map's frontmatter
- Check for changes since the last map:
  - If git is available, run `git log --oneline --since="<last_mapped>"`
  - If there is no git, run the scanner and compare file counts/paths
- If significant changes are detected, proceed to update mode
- If there are no changes, inform the user that the map is current

If it does not exist: proceed to full mapping.
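The Step 1 check can be scripted. The sketch below is a minimal Python illustration, assuming the frontmatter layout from the Step 6 template and treating any commit since `last_mapped` as a change, since the skill leaves "significant" to judgment. `read_last_mapped` and `commits_since` are hypothetical helpers, not part of the cartographer scripts.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional

# Frontmatter = the block between the first two "---" lines at the top of the map.
FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def read_last_mapped(map_text: str) -> Optional[str]:
    """Return the last_mapped value from the map's frontmatter, or None if absent."""
    match = FRONTMATTER_RE.match(map_text)
    if not match:
        return None
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() == "last_mapped":
            return value.strip()
    return None


def commits_since(last_mapped: str, repo: str = ".") -> Optional[list[str]]:
    """One line per commit since last_mapped, or None when git cannot be used."""
    try:
        result = subprocess.run(
            ["git", "-C", repo, "log", "--oneline", f"--since={last_mapped}"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # no git binary or not a repository: compare scanner output instead
    return [line for line in result.stdout.splitlines() if line]


if __name__ == "__main__":
    map_path = Path("docs/CODEBASE_MAP.md")
    if not map_path.exists():
        print("No map yet: run full mapping")
    else:
        stamp = read_last_mapped(map_path.read_text(encoding="utf-8"))
        commits = commits_since(stamp) if stamp else None
        if commits is None:
            print("git unavailable or no timestamp: fall back to scanner comparison")
        elif commits:
            print(f"{len(commits)} commit(s) since {stamp}: run update mode")
        else:
            print("Map is current")
```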
### Step 2: Scan the Codebase
Run the scanner script to get an overview. Try these in order until one works:
```bash
# Option 1: If uv is available (preferred)
uv run .claude/skills/cartographer/scripts/scan-codebase.py . --format json

# Option 2: Direct execution (uses shebang)
.claude/skills/cartographer/scripts/scan-codebase.py . --format json

# Option 3: Explicit python3
python3 .claude/skills/cartographer/scripts/scan-codebase.py . --format json

# Option 4: Explicit python (some systems)
python .claude/skills/cartographer/scripts/scan-codebase.py . --format json
```
If tiktoken is missing, install it:
```bash
# With uv (if available)
uv pip install tiktoken

# Or standard pip
pip install tiktoken
# or
pip3 install tiktoken
```
The output provides:
- Complete file tree with token counts per file
- Total token budget needed
- Skipped files (binary, too large)
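For intuition about what the scanner reports, here is a rough Python approximation of a token-counting scan. It is not the actual `scan-codebase.py`: the `cl100k_base` encoding, the NUL-byte binary test, the `SKIP_DIRS` set, and the 50,000-token default for `max_tokens` are all assumptions. When `tiktoken` is unavailable, it falls back to an estimate of about four characters per token.

```python
import os
from pathlib import Path

# Assumption: tiktoken's "cl100k_base" encoding; the real scanner may use a different one.
try:
    import tiktoken

    _ENCODING = tiktoken.get_encoding("cl100k_base")
except Exception:  # not installed, or the encoding file could not be loaded
    _ENCODING = None

# Hypothetical skip list; the real scanner's rules may differ.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def count_tokens(text: str) -> int:
    """Count tokens with tiktoken if available, else estimate ~4 characters per token."""
    if _ENCODING is not None:
        return len(_ENCODING.encode(text, disallowed_special=()))
    return (len(text) + 3) // 4


def scan(root: str, max_tokens: int = 50_000) -> dict:
    """Walk root; return per-file token counts, their total, and skipped files with reasons."""
    files: dict[str, int] = {}
    skipped: list[tuple[str, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)  # prune in place
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = Path(os.path.relpath(full, root)).as_posix()
            try:
                data = Path(full).read_bytes()
            except OSError:
                skipped.append((rel, "unreadable"))
                continue
            if b"\0" in data[:8192]:  # NUL byte near the start: treat as binary
                skipped.append((rel, "binary"))
                continue
            tokens = count_tokens(data.decode("utf-8", errors="replace"))
            if tokens > max_tokens:
                skipped.append((rel, "too large"))
                continue
            files[rel] = tokens
    return {"files": files, "total_tokens": sum(files.values()), "skipped": skipped}
```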
### Step 3: Plan Subagent Assignments
Analyze the scan output to divide work among subagents:
Token budget per subagent: ~500,000 tokens (safe margin under Sonnet's 1M limit)
Grouping strategy:
- Group files by directory/module (keeps related code together)
- Balance token counts across groups
- Aim for 3-8 subagents depending on codebase size
For small codebases (<100k tokens): still use a single Sonnet subagent. Opus orchestrates, Sonnet reads; Opus never reads the codebase directly.
Example assignment:
```
Subagent 1: src/api/, src/middleware/ (~450k tokens)
Subagent 2: src/components/, src/hooks/ (~480k tokens)
Subagent 3: src/lib/, src/utils/, tests/ (~420k tokens)
```
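One way to turn the scan output into assignments is greedy load balancing over modules. In this Python sketch (hypothetical helper names; a module is assumed to be the first two directory levels, matching the `src/api/` style of the example), modules are placed largest-first into the currently lightest group, and the group count grows until every group fits the 500,000-token budget.

```python
import math
from collections import defaultdict


def module_key(path: str, depth: int = 2) -> str:
    """Group a file by its first `depth` directory components, e.g. src/api/x.ts -> src/api."""
    dirs = path.split("/")[:-1]
    return "/".join(dirs[:depth]) or "."


def _balance(modules: dict[str, int], n: int) -> list[dict]:
    groups = [{"modules": [], "tokens": 0} for _ in range(n)]
    # Largest modules first, each into the currently lightest group.
    for name, tokens in sorted(modules.items(), key=lambda kv: (-kv[1], kv[0])):
        lightest = min(groups, key=lambda g: g["tokens"])
        lightest["modules"].append(name)
        lightest["tokens"] += tokens
    return groups


def plan_groups(file_tokens: dict[str, int], budget: int = 500_000, min_groups: int = 1) -> list[dict]:
    """Split modules into a small number of balanced groups whose totals fit the budget."""
    modules: dict[str, int] = defaultdict(int)
    for path, tokens in file_tokens.items():
        modules[module_key(path)] += tokens
    n = max(min_groups, math.ceil(sum(modules.values()) / budget), 1)
    while True:
        groups = _balance(modules, n)
        # Stop when every group fits, or when each module already has its own group
        # (a single module over budget then needs splitting by hand).
        if all(g["tokens"] <= budget for g in groups) or n >= len(modules):
            return groups
        n += 1
```

For larger codebases, passing `min_groups=3` follows the 3-8 subagent guideline. A single module that exceeds the budget on its own still ends up over budget and has to be split by subdirectory.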
### Step 4: Spawn Sonnet Subagents in Parallel
Use the Task tool with `subagent_type: "Explore"` and `model: "sonnet"` for each group.
CRITICAL: Spawn all subagents in a SINGLE message with multiple Task tool calls.
Each subagent prompt should:
- List the specific files/directories to read
- Request analysis of:
  - Purpose of each file/module
  - Key exports and public APIs
  - Dependencies (what it imports)
  - Dependents (what imports it, if discoverable)
  - Patterns and conventions used
  - Gotchas or non-obvious behavior
- Request output as structured markdown
Example subagent prompt:
```
You are mapping part of a codebase. Read and analyze these files:
- src/api/routes.ts
- src/api/middleware/auth.ts
- src/api/middleware/rateLimit.ts
[... list all files in this group]

For each file, document:
1. **Purpose**: One-line description
2. **Exports**: Key functions, classes, types exported
3. **Imports**: Notable dependencies
4. **Patterns**: Design patterns or conventions used
5. **Gotchas**: Non-obvious behavior, edge cases, warnings

Also identify:
- How these files connect to each other
- Entry points and data flow
- Any configuration or environment dependencies

Return your analysis as markdown with clear headers per file/module.
```
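Because every subagent gets the same instructions with a different file list, the prompt can be rendered from a template. This Python sketch reuses the example prompt text; `build_prompts` is a hypothetical helper, and each resulting string would be the prompt of one Task tool call, with all calls sent in a single message.

```python
# Template text taken from the example prompt; {file_list} is the only placeholder.
PROMPT_TEMPLATE = """\
You are mapping part of a codebase. Read and analyze these files:
{file_list}

For each file, document:
1. **Purpose**: One-line description
2. **Exports**: Key functions, classes, types exported
3. **Imports**: Notable dependencies
4. **Patterns**: Design patterns or conventions used
5. **Gotchas**: Non-obvious behavior, edge cases, warnings

Also identify:
- How these files connect to each other
- Entry points and data flow
- Any configuration or environment dependencies

Return your analysis as markdown with clear headers per file/module."""


def build_prompts(groups: list[list[str]]) -> list[str]:
    """Render one prompt per file group, listing that group's files as bullets."""
    return [
        PROMPT_TEMPLATE.format(file_list="\n".join(f"- {path}" for path in files))
        for files in groups
    ]
```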
### Step 5: Synthesize Reports
Once all subagents complete, synthesize their outputs:
- Merge all subagent reports
- Deduplicate any overlapping analysis
- Identify cross-cutting concerns (shared patterns, common gotchas)
- Build the architecture diagram showing module relationships
- Extract key navigation paths for common tasks
### Step 6: Write CODEBASE_MAP.md
Create `docs/CODEBASE_MAP.md` using this structure:
````markdown
---
last_mapped: YYYY-MM-DDTHH:MM:SSZ
total_files: N
total_tokens: N
---

# Codebase Map

> Auto-generated by Cartographer. Last mapped: [date]

## System Overview

[Mermaid diagram showing high-level architecture]

```mermaid
graph TB
    subgraph Client
        Web[Web App]
    end
    subgraph API
        Server[API Server]
        Auth[Auth Middleware]
    end
    subgraph Data
        DB[(Database)]
        Cache[(Cache)]
    end
    Web --> Server
    Server --> Auth
    Server --> DB
    Server --> Cache
```

[Adapt the above to match the actual architecture]

## Directory Structure

[Tree with purpose annotations]

## Module Guide

### [Module Name]

**Purpose**: [description]
**Entry point**: [file]
**Key files**:

| File | Purpose | Tokens |
|------|---------|--------|

**Exports**: [key APIs]
**Dependencies**: [what it needs]
**Dependents**: [what needs it]

[Repeat for each module]

## Data Flow

[Mermaid sequence diagrams for key flows]

```mermaid
sequenceDiagram
    participant User
    participant Web
    participant API
    participant DB
    User->>Web: Action
    Web->>API: Request
    API->>DB: Query
    DB-->>API: Result
    API-->>Web: Response
    Web-->>User: Update UI
```

[Create diagrams for: auth flow, main data operations, etc.]

## Conventions

[Naming, patterns, style]

## Gotchas

[Non-obvious behaviors, warnings]

## Navigation Guide

**To add a new API endpoint**: [files to touch]
**To add a new component**: [files to touch]
**To modify auth**: [files to touch]
[etc.]
````
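The update check in Step 1 relies on `last_mapped` being machine-readable. A small Python sketch that renders the template's frontmatter with a UTC timestamp in the `YYYY-MM-DDTHH:MM:SSZ` form; `render_frontmatter` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Optional


def render_frontmatter(total_files: int, total_tokens: int, now: Optional[datetime] = None) -> str:
    """Build the CODEBASE_MAP.md frontmatter block with a UTC last_mapped timestamp."""
    now = now or datetime.now(timezone.utc)
    # Convert to UTC before formatting so the trailing "Z" is accurate
    # (naive datetimes are interpreted as local time by astimezone).
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "---\n"
        f"last_mapped: {stamp}\n"
        f"total_files: {total_files}\n"
        f"total_tokens: {total_tokens}\n"
        "---\n"
    )
```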
### Step 7: Update CLAUDE.md
Add or update the codebase summary in CLAUDE.md:
```markdown
## Codebase Overview

[2-3 sentence summary]

**Stack**: [key technologies]
**Structure**: [high-level layout]

For detailed architecture, see [docs/CODEBASE_MAP.md](docs/CODEBASE_MAP.md).
```
If `AGENTS.md` exists, update it similarly.
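Updating CLAUDE.md should be idempotent, so that re-running Cartographer replaces the old summary instead of appending a second one. A minimal Python sketch, assuming the summary always starts with the `## Codebase Overview` heading and ends at the next level-2 heading; `upsert_section` and `update_file` are hypothetical names.

```python
import re
from pathlib import Path

# Matches the "## Codebase Overview" section up to the next level-2 heading or end of file.
# Limitation: it does not know about fenced code blocks that contain "## " lines.
SECTION_RE = re.compile(r"^## Codebase Overview\n.*?(?=^## |\Z)", re.MULTILINE | re.DOTALL)


def upsert_section(document: str, section: str) -> str:
    """Replace an existing Codebase Overview section in place, or append one."""
    section = section.rstrip("\n") + "\n"
    if SECTION_RE.search(document):
        # A function replacement avoids backslash escapes being interpreted in `section`.
        updated = SECTION_RE.sub(lambda _: section + "\n", document, count=1)
        return updated.rstrip("\n") + "\n"
    if not document.strip():
        return section
    return document.rstrip("\n") + "\n\n" + section


def update_file(path: Path, section: str) -> None:
    """Apply upsert_section to a file, creating it if it does not exist."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    path.write_text(upsert_section(text, section), encoding="utf-8")
```

Calling `update_file(Path("CLAUDE.md"), summary)`, and the same for `AGENTS.md` when it exists, covers both files.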
## Update Mode
When updating an existing map:
- Identify changed files from git or scanner diff
- Spawn subagents only for changed modules
- Merge new analysis with existing map
- Update the `last_mapped` timestamp
- Preserve unchanged sections
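To decide which modules need fresh subagents, changed paths can be matched against the module directories already listed in the map. This Python sketch assumes git is available and that the module directories are taken from the existing Module Guide; it only sees committed changes, and both helper names are hypothetical. Paths outside every known module are returned separately, since they likely belong to a new module that needs its own subagent.

```python
import subprocess


def changed_files_since(last_mapped: str, repo: str = ".") -> list[str]:
    """Paths touched by commits since last_mapped (uncommitted edits are not included)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={last_mapped}", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sorted({line.strip() for line in out.splitlines() if line.strip()})


def modules_to_remap(changed: list[str], modules: list[str]) -> tuple[set[str], set[str]]:
    """Split changed paths into mapped modules to re-analyze and paths no module covers."""
    hit: set[str] = set()
    unmapped: set[str] = set()
    for path in changed:
        owners = [m for m in modules if path.startswith(m.rstrip("/") + "/")]
        if owners:
            hit.add(max(owners, key=len))  # the most specific module owns the file
        else:
            unmapped.add(path)
    return hit, unmapped
```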
## Token Budget Reference
| Model | Context Window | Safe Budget per Subagent |
|---|---|---|
| Sonnet | 1,000,000 | 500,000 |
| Opus | 200,000 | 100,000 |
| Haiku | 200,000 | 100,000 |
Always use Sonnet subagents for maximum file coverage.
## Troubleshooting
Scanner fails with tiktoken error:
```bash
pip install tiktoken
# or
pip3 install tiktoken
# or with uv:
uv pip install tiktoken
```
Python not found: Try `python3`, `python`, or use `uv run`, which handles Python automatically.
Codebase too large even for subagents:
- Increase number of subagents
- Focus on src/ directories, skip vendored code
- Use the `--max-tokens` flag to skip huge files
Git not available:
- Fall back to file count/path comparison
- Store file list hash in map frontmatter for change detection
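For the no-git fallback, the stored hash only needs to change when the set of files changes. A minimal Python sketch using SHA-256 over the sorted relative paths; `file_list_hash` is a hypothetical helper, and the frontmatter key it is stored under is an assumption.

```python
import hashlib


def file_list_hash(paths: list[str]) -> str:
    """SHA-256 of the sorted, de-duplicated relative paths, one per line."""
    digest = hashlib.sha256()
    for path in sorted(set(paths)):  # sorting makes the hash independent of walk order
        digest.update(path.encode("utf-8") + b"\n")
    return digest.hexdigest()
```

Because only paths are hashed, edits inside existing files go unnoticed; hashing `path:token_count` pairs from the scanner output would also catch most content edits.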