Claude-skill-registry file-categorization
Reusable logic for categorizing files as Command, Agent, Skill, or Documentation based on structure and content analysis
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/file-categorization" ~/.claude/skills/majiayu000-claude-skill-registry-file-categorization && rm -rf "$T"
skills/data/file-categorization/SKILL.md
File Categorization Skill
When to Use This Skill
- Processing files in integration pipelines
- Scanning directories for file organization
- Auto-routing files to appropriate locations
- Generating file inventory reports
- Validating repository structure
What This Skill Does
Analyzes file structure and content to accurately categorize files into:
- Commands - Slash command definitions
- Agents - Agent configuration files
- Skills - Reusable workflow automation
- Documentation - General markdown documentation
- Other - Uncategorized files requiring manual review
Categorization Logic
Step 1: Filename Pattern Matching
Commands:
- Filename matches *-command.md or *command.md
- Located in .claude/commands/ directory
- Filename uses verb-noun pattern (e.g., integration-scan.md)
Agents:
- Filename matches *-agent.md or *agent.md
- Located in agents-templates/ directory
- Contains role-based names (architect, builder, validator, etc.)
Skills:
- Filename is SKILL.md or *-SKILL.md or *-skill.md
- Located in skills/*/ directories
- Contains workflow automation content
Documentation:
- Standard .md files
- Located in docs/ directory
- Contains reference or tutorial content
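The filename and directory rules in Step 1 can be sketched as a small lookup table. This is a minimal Python illustration, not the registry's implementation: `RULES` and `match_by_path` are hypothetical names, and the verb-noun and role-name heuristics are left out because the text does not define them precisely.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Optional

# (category, filename globs, directory globs), transcribed from Step 1.
# The table layout and the check order are assumptions of this sketch.
RULES = [
    ("Command", ["*-command.md", "*command.md"],
     ["*/.claude/commands", "*/.claude/commands/*"]),
    ("Agent", ["*-agent.md", "*agent.md"],
     ["*/agents-templates", "*/agents-templates/*"]),
    ("Skill", ["SKILL.md", "*-SKILL.md", "*-skill.md"], ["*/skills/*"]),
    ("Documentation", [], ["*/docs", "*/docs/*"]),
]


def match_by_path(file_path: str) -> Optional[str]:
    """Return the first category whose filename or directory glob matches."""
    path = PurePosixPath(file_path)
    # Prefix with "/" so "*/dir" also matches a directory at the path root.
    parent = "/" + path.parent.as_posix().lstrip("/")
    for category, name_globs, dir_globs in RULES:
        if any(fnmatchcase(path.name, g) for g in name_globs):
            return category
        if any(fnmatchcase(parent, g) for g in dir_globs):
            return category
    return None
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform, which matters for telling SKILL.md apart from other names.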
Step 2: Frontmatter Analysis
Read the YAML frontmatter (if present) to identify:
Command Indicators:
    ---
    description: "..."
    allowed-tools: [...]
    author: "..."
    version: "X.Y"
    ---
Skill Indicators:
    ---
    name: skill-name
    description: "..."
    ---
Agent Indicators (less structured, more prose):
    ## Agent Identity
    **Role**: [Agent Role]
    **Version**: X.Y.Z
    **Purpose**: [Purpose description]
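One way to read these indicators programmatically is sketched below. It assumes frontmatter is a leading block delimited by `---` lines and only collects top-level `key:` names with a regular expression. It is not a YAML parser, and `frontmatter_keys` and `category_from_frontmatter` are hypothetical helper names.

```python
import re
from typing import Optional, Set

# Leading block between two '---' lines; a simplification, not full YAML.
FRONTMATTER_RE = re.compile(r"---[ \t]*\n(.*?)\n---[ \t]*(?:\n|$)", re.DOTALL)


def frontmatter_keys(content: str) -> Optional[Set[str]]:
    """Return top-level frontmatter keys, or None if no block is present."""
    match = FRONTMATTER_RE.match(content)
    if match is None:
        return None
    # Only unindented "key:" lines are treated as top-level keys.
    return set(re.findall(r"^([A-Za-z0-9_-]+)[ \t]*:", match.group(1), re.MULTILINE))


def category_from_frontmatter(content: str) -> Optional[str]:
    """Apply the Step 2 indicators; None means 'no signal from this step'."""
    keys = frontmatter_keys(content) or set()
    if "allowed-tools" in keys and "version" in keys:
        return "Command"
    if "name" in keys and "allowed-tools" not in keys:
        return "Skill"
    # Agents use a prose header rather than YAML keys.
    if "## Agent Identity" in content:
        return "Agent"
    return None
```

A production version would more likely hand the block to a real YAML library so that syntax errors can be reported. The key scan here is only enough to show the decision rule.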
Step 3: Content Structure Analysis
Commands have:
- Workflow sections with numbered steps
- Bash command examples (prefixed with !)
- allowed-tools restrictions
- Usage examples
Agents have:
- Core Responsibilities section
- Allowed Tools and Permissions section
- Workflow Patterns section
- Context Management section
Skills have:
- "When to Use" section
- "What This Skill Does" section
- Step-by-step process descriptions
- Examples with real data
Documentation has:
- Standard markdown structure
- Tutorial or reference content
- No executable workflows
- Educational purpose
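The structural checks in Step 3 could be approximated by collecting markdown headings and counting how many expected sections appear. The sketch below covers only the Agent and Skill section names listed above, plus the !-prefixed command lines. `EXPECTED_SECTIONS`, `section_hits`, and `bang_command_lines` are hypothetical names.

```python
import re
from typing import Dict, List

# Section names taken from Step 3; matching is case-insensitive and by substring.
EXPECTED_SECTIONS = {
    "Agent": ["Core Responsibilities", "Allowed Tools and Permissions",
              "Workflow Patterns", "Context Management"],
    "Skill": ["When to Use", "What This Skill Does"],
}


def headings(content: str) -> List[str]:
    """Text of every ATX heading (a line starting with one to six '#')."""
    pattern = r"^#{1,6}[ \t]+(.+?)[ \t]*$"
    return [m.group(1) for m in re.finditer(pattern, content, re.MULTILINE)]


def section_hits(content: str) -> Dict[str, int]:
    """Count, per category, how many expected sections occur as headings."""
    found = [h.lower() for h in headings(content)]
    return {
        category: sum(any(s.lower() in h for h in found) for s in sections)
        for category, sections in EXPECTED_SECTIONS.items()
    }


def bang_command_lines(content: str) -> List[str]:
    """Lines that start with '!' followed by a letter (e.g. '!git status').

    Requiring a letter avoids counting markdown images such as '![alt](x.png)'.
    """
    return [ln for ln in content.splitlines() if re.match(r"\s*![A-Za-z]", ln)]
```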
Step 4: Keyword Detection
Scan content for category-specific keywords:
Command Keywords:
- !bash, !git, !npm, etc. (shell commands)
- "allowed-tools"
- "Usage:", "Workflow:", "Steps:"
- Command-line patterns
Agent Keywords:
- "Core Responsibilities"
- "Workflow Patterns"
- "Context Management"
- "Orchestrator", "Sub-Agent"
- "Handoff", "Delegation"
Skill Keywords:
- "When to Use"
- "What This Skill Does"
- "Skill" in self-references
- Reusable workflow language
Documentation Keywords:
- "Introduction", "Overview", "Guide"
- "Tutorial", "Reference", "Best Practices"
- Educational/explanatory language
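Keyword detection can be illustrated with a simple count of how many listed keywords occur in the content. The scoring scheme is an assumption of this sketch (one point per keyword, case-insensitive, ties treated as no decision). The text only says to scan for the keywords, and `KEYWORDS`, `keyword_scores`, and `best_keyword_category` are hypothetical names.

```python
from typing import Dict, Optional

# Literal keywords from Step 4; descriptive entries such as
# "command-line patterns" are omitted because they are not literal strings.
KEYWORDS = {
    "Command": ["!bash", "!git", "!npm", "allowed-tools",
                "Usage:", "Workflow:", "Steps:"],
    "Agent": ["Core Responsibilities", "Workflow Patterns", "Context Management",
              "Orchestrator", "Sub-Agent", "Handoff", "Delegation"],
    "Skill": ["When to Use", "What This Skill Does"],
    "Documentation": ["Introduction", "Overview", "Guide",
                      "Tutorial", "Reference", "Best Practices"],
}


def keyword_scores(content: str) -> Dict[str, int]:
    """Number of distinct keywords from each category found in the content."""
    lowered = content.lower()
    return {
        category: sum(1 for kw in keywords if kw.lower() in lowered)
        for category, keywords in KEYWORDS.items()
    }


def best_keyword_category(content: str) -> Optional[str]:
    """Category with the strictly highest score, or None on zero hits or a tie."""
    scores = keyword_scores(content)
    best = max(scores, key=scores.get)
    top = scores[best]
    if top == 0 or list(scores.values()).count(top) > 1:
        return None
    return best
```

Because words like "Guide" or "Reference" appear in many files, keyword hits are probably best treated as a weak signal that supports the earlier steps and does not override them.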
Categorization Algorithm
    function categorizeFile(filePath, content):
        category = null
        confidence = null
        reasoning = null

        // Phase 1: Filename and location
        if filename matches command patterns OR in .claude/commands/:
            category = "Command"
            confidence = "High"
        else if filename == "SKILL.md" OR in skills/*/:
            category = "Skill"
            confidence = "High"
        else if in agents-templates/:
            category = "Agent"
            confidence = "High"
        else if in docs/:
            category = "Documentation"
            confidence = "Medium"

        // Phase 2: Frontmatter analysis (refine)
        frontmatter = extractYAML(content)
        if frontmatter contains "allowed-tools" AND "version":
            category = "Command"
            confidence = "High"
        else if frontmatter contains "name" (no allowed-tools):
            category = "Skill"
            confidence = "High"

        // Phase 3: Content structure (if still uncertain)
        if confidence != "High":
            if content contains "## Agent Identity":
                category = "Agent"
                confidence = "High"
            else if content contains "## When to Use":
                category = "Skill"
                confidence = "Medium"
            else if content contains "!bash" OR "!git":
                category = "Command"
                confidence = "Medium"

        // Phase 4: Fallback
        if category == null:
            category = "Other"
            confidence = "Low"
            reasoning = "Unable to determine category, manual review needed"

        // Phases 1-3 should also record which indicator matched in reasoning
        return {category, confidence, reasoning}
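The four phases above can be made runnable. The Python version below is one possible reading of the pseudocode, not the registry's own code: `categorize_file` and `_frontmatter_keys` are hypothetical names, frontmatter is scanned with a regular expression instead of a YAML parser, and "command patterns" is interpreted as the glob *command.md.

```python
import re
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def _frontmatter_keys(content):
    """Top-level keys of a leading '---' block (simplified scan, not YAML)."""
    m = re.match(r"---[ \t]*\n(.*?)\n---[ \t]*(?:\n|$)", content, re.DOTALL)
    if m is None:
        return set()
    return set(re.findall(r"^([A-Za-z0-9_-]+)[ \t]*:", m.group(1), re.MULTILINE))


def categorize_file(file_path, content):
    path = PurePosixPath(file_path)
    parent = "/" + path.parent.as_posix() + "/"  # e.g. "/skills/data/x/"
    category = confidence = None
    reasons = []

    # Phase 1: filename and location
    if fnmatchcase(path.name, "*command.md") or "/.claude/commands/" in parent:
        category, confidence = "Command", "High"
        reasons.append("filename or location matches command pattern")
    elif path.name == "SKILL.md" or fnmatchcase(parent, "*/skills/*/"):
        category, confidence = "Skill", "High"
        reasons.append("filename or location matches skill pattern")
    elif "/agents-templates/" in parent:
        category, confidence = "Agent", "High"
        reasons.append("located in agents-templates/")
    elif "/docs/" in parent:
        category, confidence = "Documentation", "Medium"
        reasons.append("located in docs/")

    # Phase 2: frontmatter analysis (refines or overrides Phase 1)
    keys = _frontmatter_keys(content)
    if "allowed-tools" in keys and "version" in keys:
        category, confidence = "Command", "High"
        reasons.append("frontmatter has 'allowed-tools' and 'version'")
    elif "name" in keys and "allowed-tools" not in keys:
        category, confidence = "Skill", "High"
        reasons.append("frontmatter has 'name' field")

    # Phase 3: content structure, only if still uncertain
    if confidence != "High":
        if "## Agent Identity" in content:
            category, confidence = "Agent", "High"
            reasons.append("content has 'Agent Identity' section")
        elif "## When to Use" in content:
            category, confidence = "Skill", "Medium"
            reasons.append("content has 'When to Use' section")
        elif "!bash" in content or "!git" in content:
            category, confidence = "Command", "Medium"
            reasons.append("content has shell command markers")

    # Phase 4: fallback
    if category is None:
        category, confidence = "Other", "Low"
        reasons = ["Unable to determine category, manual review needed"]

    return {"category": category, "confidence": confidence,
            "reasoning": "; ".join(reasons)}
```

As in the pseudocode, Phase 2 can override a Phase 1 result, so a file named like a command but carrying skill frontmatter ends up as a Skill. Whether that precedence is desirable is a design choice that the conflict handling later in this document addresses.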
Output Format
For each categorized file, return:
    ### [Filename]
    - **Category**: [Command|Agent|Skill|Documentation|Other]
    - **Confidence**: [High|Medium|Low]
    - **Reasoning**: [Why this category was assigned]
    - **Frontmatter**: [✅ Valid | ⚠️ Malformed | ❌ Missing]
    - **Required Fields**: [List of found/missing fields]
    - **Recommended Location**: [Target directory path]
Example Usage
Example 1: Categorizing Integration File
Input:
    File: USING-GIT-WORKTREES-SKILL.md
    Content:
    ---
    name: using-git-worktrees
    description: Creates isolated git worktrees...
    ---
    # Using Git Worktrees
    ## When to Use
    ...
Output:
    ### USING-GIT-WORKTREES-SKILL.md
    - **Category**: Skill
    - **Confidence**: High
    - **Reasoning**: Filename matches skill pattern, frontmatter has 'name' field, content has "When to Use" section
    - **Frontmatter**: ✅ Valid
    - **Required Fields**: name ✅, description ✅
    - **Recommended Location**: skills/using-git-worktrees/SKILL.md
Example 2: Categorizing Command File
Input:
    File: integration-scan.md
    Content:
    ---
    description: "Scan and categorize incoming files"
    allowed-tools: ["Read", "Bash(find)"]
    author: "Claude Command and Control"
    version: "1.0"
    ---
    # Integration Scan
    ## Purpose
    ...
Output:
    ### integration-scan.md
    - **Category**: Command
    - **Confidence**: High
    - **Reasoning**: Filename uses verb-noun pattern, frontmatter has 'allowed-tools' and 'version'
    - **Frontmatter**: ✅ Valid
    - **Required Fields**: description ✅, allowed-tools ✅, author ✅, version ✅
    - **Recommended Location**: .claude/commands/integration-scan.md
Example 3: Uncategorizable File
Input:
    File: notes.md
    Content:
    # Random Notes
    Some thoughts about the project...
Output:
    ### notes.md
    - **Category**: Other
    - **Confidence**: Low
    - **Reasoning**: No frontmatter, no structural indicators, generic content
    - **Frontmatter**: ❌ Missing
    - **Required Fields**: N/A
    - **Recommended Location**: Manual review required
Integration with Commands
Used By
- /integration-scan - Primary categorization logic
- /integration-process - Determines target directory
- /integration-validate - Validates category-specific structure
Usage Pattern
    # In integration-scan command
    For each file in /INTEGRATION/incoming:
    1. Read file content
    2. Use file-categorization skill
    3. Extract category and confidence
    4. Include in scan report
    5. Mark for processing if High confidence
    6. Flag for review if Medium/Low confidence
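The scan loop above might look like the following in Python. This is a sketch under assumptions: `scan_incoming` is a hypothetical name, the categorizer is passed in as any callable that returns a dict with a `confidence` key, and the report is a plain dict with `process` and `review` buckets.

```python
from pathlib import Path
from typing import Callable, Dict, List


def scan_incoming(incoming_dir: str,
                  categorize: Callable[[str, str], dict]) -> Dict[str, List[dict]]:
    """Categorize every file under incoming_dir and split results by confidence."""
    report: Dict[str, List[dict]] = {"process": [], "review": []}
    for path in sorted(Path(incoming_dir).rglob("*")):
        if not path.is_file():
            continue
        # 1. Read file content (undecodable bytes are replaced, not fatal).
        content = path.read_text(encoding="utf-8", errors="replace")
        # 2-3. Run the categorizer and keep its category and confidence.
        result = categorize(str(path), content)
        # 4. Include the file in the scan report.
        entry = {"file": path.name, **result}
        # 5-6. High confidence is processed; Medium/Low is flagged for review.
        bucket = "process" if result.get("confidence") == "High" else "review"
        report[bucket].append(entry)
    return report
```

Passing the categorizer as a parameter keeps the loop independent of any particular categorization logic, which also makes it easy to test with a stub.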
Category-Specific Validation Rules
Commands
- ✅ MUST have: description, allowed-tools, author, version
- ✅ SHOULD have: workflow steps, usage examples
- ⚠️ Check: Tool permissions not overly broad
Agents
- ✅ MUST have: Agent Identity, Core Responsibilities, Allowed Tools
- ✅ SHOULD have: Workflow Patterns, Context Management
- ⚠️ Check: Role clearly defined
Skills
- ✅ MUST have: name, description, "When to Use"
- ✅ SHOULD have: Examples, step-by-step process
- ⚠️ Check: Examples use real data (not placeholders)
Documentation
- ✅ MUST have: Clear title, structured content
- ✅ SHOULD have: Table of contents, cross-references
- ⚠️ Check: No executable workflows (should be in Command/Skill)
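The MUST-have rules lend themselves to a table-driven check. The sketch below covers only the items that can be tested mechanically (frontmatter fields and named sections). `REQUIRED_FIELDS`, `REQUIRED_SECTIONS`, and `missing_requirements` are hypothetical names, and sections are detected by a case-insensitive substring search, which is a simplification.

```python
from typing import Iterable, List

# MUST-have frontmatter fields per category, from the rules above.
REQUIRED_FIELDS = {
    "Command": ["description", "allowed-tools", "author", "version"],
    "Skill": ["name", "description"],
}

# MUST-have sections per category, matched against the file body.
REQUIRED_SECTIONS = {
    "Agent": ["Agent Identity", "Core Responsibilities", "Allowed Tools"],
    "Skill": ["When to Use"],
}


def missing_requirements(category: str,
                         frontmatter_keys: Iterable[str],
                         content: str) -> List[str]:
    """List the MUST-have fields and sections that the file lacks."""
    keys = set(frontmatter_keys)
    body = content.lower()
    missing = [f for f in REQUIRED_FIELDS.get(category, []) if f not in keys]
    missing += [s for s in REQUIRED_SECTIONS.get(category, [])
                if s.lower() not in body]
    return missing
```

The SHOULD-have items and the warning checks, such as overly broad tool permissions or placeholder examples, need human judgment or more specific rules, so they are left out here.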
Error Handling
Malformed Frontmatter
    Issue: YAML syntax error
    Action: Note in categorization output
    Category: "Other" with reason "Invalid frontmatter"
    Recommendation: Fix YAML before processing
Conflicting Indicators
    Issue: Filename says "command" but structure says "skill"
    Action: Confidence = "Medium"
    Reasoning: "Filename and content indicators conflict"
    Recommendation: Manual review
Missing Content
    Issue: File is empty or too short (<100 chars)
    Action: Category = "Other"
    Confidence: "Low"
    Reasoning: "Insufficient content for categorization"
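These three cases could run as guards before and after the main categorization. The sketch below makes several assumptions that the text does not settle. Malformed frontmatter is detected only as an opening `---` with no closing line, since a real YAML parser would be needed to catch other syntax errors. The confidence for that case is set to "Low". On a conflict, the content-derived category is kept. `precheck` and `resolve_conflict` are hypothetical names.

```python
from typing import Optional

MIN_CONTENT_CHARS = 100  # threshold from the "Missing Content" rule


def precheck(content: str) -> Optional[dict]:
    """Return an early 'Other' result for unusable files, else None."""
    stripped = content.strip()
    if len(stripped) < MIN_CONTENT_CHARS:
        return {"category": "Other", "confidence": "Low",
                "reasoning": "Insufficient content for categorization"}
    lines = [ln.strip() for ln in stripped.splitlines()]
    # Assumption: an unterminated '---' block stands in for "YAML syntax error".
    if lines[0] == "---" and "---" not in lines[1:]:
        return {"category": "Other", "confidence": "Low",
                "reasoning": "Invalid frontmatter"}
    return None


def resolve_conflict(filename_category: Optional[str],
                     content_category: Optional[str]) -> Optional[dict]:
    """Downgrade to Medium when filename and content indicators disagree."""
    if filename_category and content_category \
            and filename_category != content_category:
        # Assumption: keep the content-derived category; the text only
        # specifies the confidence and the reasoning string.
        return {"category": content_category, "confidence": "Medium",
                "reasoning": "Filename and content indicators conflict"}
    return None
```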
Testing Recommendations
Test with:
- Typical files - Standard commands, agents, skills
- Edge cases - Mixed indicators, missing frontmatter
- Malformed files - Syntax errors, incomplete content
- Ambiguous files - Could fit multiple categories
Expected accuracy:
- High confidence: >95% correct
- Medium confidence: >80% correct
- Low confidence: Requires manual review
Version History
1.0 (2025-11-23)
- Initial file categorization skill
- Four-phase categorization algorithm
- Integration with scan/process commands
- Comprehensive validation rules
Skill Status: Production Ready
Accuracy Target: >95% for High confidence categorizations
Dependencies: None (standalone logic)