# Agent-almanac analyze-codebase-workflow

```bash
git clone https://github.com/pjt222/agent-almanac
```

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/pjt222/agent-almanac "$T" && mkdir -p ~/.claude/skills && cp -r "$T/i18n/caveman-ultra/skills/analyze-codebase-workflow" ~/.claude/skills/pjt222-agent-almanac-analyze-codebase-workflow-622285 && rm -rf "$T"
```

`i18n/caveman-ultra/skills/analyze-codebase-workflow/SKILL.md`

# Analyze Codebase Workflow
Survey repo → auto-detect data flows, file I/O, script deps → structured annotation plan for manual refinement.
## Use When
- Onboard unfamiliar codebase → understand data flow
- Start putior integration, no PUT annotations
- Audit existing data pipeline pre-doc
- Prep annotation plan before `annotate-source-files`
## In
- Required: Path to repo/src dir
- Optional: Subdirs focus (default: entire repo)
- Optional: Langs include/exclude (default: all detected)
- Optional: Scope: inputs only, outputs only, both (default: both + deps)
## Do

### Step 1: Survey Repo Structure
Identify src files + langs → what putior can analyze.
```r
library(putior)

# List all supported languages and their extensions
list_supported_languages()
list_supported_languages(detection_only = TRUE)  # Only languages with auto-detection

# Get supported extensions
exts <- get_supported_extensions()
```
File listing → repo composition:
```bash
# Count files by extension in the target directory
find /path/to/repo -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn | head -20
```
→ File extensions in repo + counts. Map against `get_supported_extensions()` → coverage.
If err: No files match supported → putior can't auto-detect. Check if lang supported but non-standard ext.
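Extensionless files escape the extension count above. Quick side check, plain `find` sketch (`REPO` var is illustrative, default `.`):

```bash
# Stand-in: point REPO at the target repo
REPO=${REPO:-.}
# Files with no extension: the extension count misses these, but putior
# matches some (Dockerfile, Makefile) by filename instead
find "$REPO" -type f ! -name '*.*' ! -path '*/.git/*' | sort
```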
### Step 2: Check Detection Coverage
Per detected lang → verify auto-detect pattern available.
```r
# Check which languages have auto-detection patterns (18 languages, 902 patterns)
detection_langs <- list_supported_languages(detection_only = TRUE)
cat("Languages with auto-detection:\n")
print(detection_langs)

# Get pattern counts for specific languages found in the repo
for (lang in c("r", "python", "javascript", "sql", "dockerfile", "makefile")) {
  patterns <- get_detection_patterns(lang)
  cat(sprintf(
    "%s: %d input, %d output, %d dependency patterns\n",
    lang,
    length(patterns$input), length(patterns$output), length(patterns$dependency)
  ))
}
```
→ Pattern counts printed. R 124, Python 159, JS 71, etc.
If err: No patterns → supports manual only, not auto. Plan manual annotations.
### Step 3: Run Auto-Detection

Execute `put_auto()` → discover workflow elements.
```r
# Full auto-detection
workflow <- put_auto("./src/",
  detect_inputs = TRUE,
  detect_outputs = TRUE,
  detect_dependencies = TRUE
)

# Exclude build scripts and test helpers from scanning
workflow <- put_auto("./src/",
  detect_inputs = TRUE,
  detect_outputs = TRUE,
  detect_dependencies = TRUE,
  exclude = c("build-", "test_helper")
)

# View detected workflow nodes
print(workflow)

# Check node count
cat(sprintf("Detected %d workflow nodes\n", nrow(workflow)))
```
Large repos → analyze subdirs incrementally:
```r
# Analyze specific subdirectories
etl_workflow <- put_auto("./src/etl/")
api_workflow <- put_auto("./src/api/")
```
→ Df w/ `id`, `label`, `input`, `output`, `source_file` cols. Row = detected step.
If err: Empty → src may lack recognizable I/O patterns. Try `workflow <- put_auto("./src/", log_level = "DEBUG")` → see scanned + matched.
### Step 4: Initial Diagram
Visualize auto-detected → assess coverage + gaps.
```r
# Generate diagram from auto-detected workflow
cat(put_diagram(workflow, theme = "github"))

# With source file info for traceability
cat(put_diagram(workflow, show_source_info = TRUE))

# Save to file for review
writeLines(put_diagram(workflow, theme = "github"), "workflow-auto.md")
```
→ Mermaid flowchart, detected nodes + data flow edges. Meaningful fn/file labels.
If err: Disconnected nodes → auto-detect found I/O but couldn't infer connections. Normal — matching output → input filenames. Annotation plan next step fills.
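Manual trace fills such gaps. Hedged sketch w/ plain `grep` (not a putior feature; `./src` + `data.csv` are stand-ins from the plan example):

```bash
SRC=${SRC:-./src}               # stand-in source dir
DATAFILE=${DATAFILE:-data.csv}  # data file whose producer/consumer you want
# Every script mentioning the filename: candidate producer + consumer nodes
grep -rn "$DATAFILE" "$SRC" || echo "no mentions of $DATAFILE"
```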
### Step 5: Annotation Plan
Generate plan → what found + what needs manual.
```r
# Generate annotation suggestions
put_generate("./src/", style = "single")

# For multiline style (more readable for complex workflows)
put_generate("./src/", style = "multiline")

# Copy suggestions to clipboard for easy pasting
put_generate("./src/", output = "clipboard")
```
Doc plan w/ coverage assessment:
```markdown
## Annotation Plan

### Auto-Detected (no manual work needed)
- `src/etl/extract.R` — 3 inputs, 2 outputs detected
- `src/etl/transform.py` — 1 input, 1 output detected

### Needs Manual Annotation
- `src/api/handler.js` — Language supported but no I/O patterns matched
- `src/config/setup.sh` — Only 12 shell patterns; complex logic missed

### Not Supported
- `src/legacy/process.f90` — Fortran not in detection languages

### Recommended Connections
- extract.R output `data.csv` → transform.py input `data.csv` (auto-linked)
- transform.py output `clean.parquet` → load.R input (needs annotation)
```
→ Clear plan: auto-detected vs manual, specific recs per file.
If err: `put_generate()` no out → verify path correct + has supported src files.
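Fast sanity check before deeper debugging, plain `find` sketch (extension subset illustrative; full list via `get_supported_extensions()` in R; `./src` is a stand-in):

```bash
SRC=${SRC:-./src}   # stand-in path
# Any files with commonly supported extensions? Empty output here explains
# empty put_generate() results. (Subset shown; see get_supported_extensions().)
find "$SRC" -type f \( -name '*.R' -o -name '*.r' -o -name '*.py' \
  -o -name '*.js' -o -name '*.sql' -o -name '*.sh' \) 2>/dev/null | head -20
```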
## Check

- `put_auto()` no err on target
- Detected workflow has ≥1 node (unless no recognizable I/O)
- `put_diagram()` produces valid Mermaid
- `put_generate()` produces suggestions for detected files
- Annotation plan doc created w/ coverage assessment
## Traps

- Scan too broad: `put_auto(".")` → includes `node_modules/`, `.git/`, `venv/`. Target specific src dirs.
- Expect full coverage: Auto-detect finds I/O + lib calls, not business logic. 40-60% typical; rest manual.
- Ignore deps: `detect_dependencies = TRUE` catches `source()`, `import`, `require()` → links scripts. Disable → lose cross-file connections.
- Lang mismatch: Non-standard ext (`.R` vs `.r`, `.jsx` vs `.js`) may not detect. Use `get_comment_prefix()`. Extensionless `Dockerfile`, `Makefile` supported via filename match.
- Large repos: 100+ src files → analyze by module/dir → diagrams readable.
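Module split can start from raw file counts. Plain `find`/`wc` sketch (`./src` is a stand-in):

```bash
SRC=${SRC:-./src}   # stand-in path
# File count per top-level dir: analyze the biggest modules first
find "$SRC" -mindepth 1 -maxdepth 1 -type d | while read -r d; do
  printf '%6d %s\n' "$(find "$d" -type f | wc -l)" "$d"
done | sort -rn
```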
→

- `install-putior` — prereq
- `annotate-source-files` — next: add manual
- `generate-workflow-diagram` — final after annotation
- `configure-putior-mcp` — MCP tools for interactive