Claude-skill-registry file-categorization

Reusable logic for categorizing files as Command, Agent, Skill, or Documentation based on structure and content analysis

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/file-categorization" ~/.claude/skills/majiayu000-claude-skill-registry-file-categorization && rm -rf "$T"

manifest: skills/data/file-categorization/SKILL.md

File Categorization Skill

When to Use This Skill

Processing files in integration pipelines
Scanning directories for file organization
Auto-routing files to appropriate locations
Generating file inventory reports
Validating repository structure

What This Skill Does

Analyzes file structure and content to accurately categorize files into:

Commands - Slash command definitions
Agents - Agent configuration files
Skills - Reusable workflow automation
Documentation - General markdown documentation
Other - Uncategorized files requiring manual review

Categorization Logic

Step 1: Filename Pattern Matching

Commands:

Filename matches `*-command.md` or `*command.md`
Located in `.claude/commands/` directory
Filename uses verb-noun pattern (e.g., `integration-scan.md`)

Agents:

Filename matches `*-agent.md` or `*agent.md`
Located in `agents-templates/` directory
Contains role-based names (architect, builder, validator, etc.)

Skills:

Filename is `SKILL.md`, `*-SKILL.md`, or `*-skill.md`
Located in `skills/*/` directories
Contains workflow automation content

Documentation:

Standard `.md` files
Located in `docs/` directory
Contains reference or tutorial content
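
The filename and directory rules in Step 1 can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's actual implementation: the table layout and the names `FILENAME_PATTERNS`, `DIRECTORY_HINTS`, and `match_by_name_and_location` are assumptions, and the softer checks (verb-noun names, role-based names, content type) are left out.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Glob patterns and directories are taken from Step 1; the table layout and
# function names are illustrative assumptions.
FILENAME_PATTERNS = {
    "Command": ("*-command.md", "*command.md"),
    "Agent": ("*-agent.md", "*agent.md"),
    "Skill": ("SKILL.md", "*-SKILL.md", "*-skill.md"),
}
DIRECTORY_HINTS = {
    "Command": (".claude", "commands"),
    "Agent": ("agents-templates",),
    "Skill": ("skills",),  # approximates skills/*/ as "anywhere under skills/"
    "Documentation": ("docs",),
}


def _has_dir_run(parts, run):
    """True if the directory names in `run` appear consecutively in `parts`."""
    n = len(run)
    return any(parts[i:i + n] == run for i in range(len(parts) - n + 1))


def match_by_name_and_location(file_path):
    """Return the first category whose filename or directory rule matches, else None."""
    path = PurePosixPath(file_path)
    # Filename patterns are checked first (case-sensitive on every platform).
    for category, patterns in FILENAME_PATTERNS.items():
        if any(fnmatchcase(path.name, p) for p in patterns):
            return category
    # Fall back to the directory the file lives in.
    for category, run in DIRECTORY_HINTS.items():
        if _has_dir_run(path.parent.parts, run):
            return category
    return None
```

Filename patterns are tried before directories here, so a file named `SKILL.md` dropped into `docs/` would still come back as a Skill; whether that ordering is desirable is a design choice the source does not settle.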

Step 2: Frontmatter Analysis

Read the YAML frontmatter (if present) to identify:

Command Indicators:

---
description: "..."
allowed-tools: [...]
author: "..."
version: "X.Y"
---

Skill Indicators:

---
name: skill-name
description: "..."
---

Agent Indicators (less structured, more prose):

## Agent Identity
**Role**: [Agent Role]
**Version**: X.Y.Z
**Purpose**: [Purpose description]
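
As a rough sketch of the frontmatter check, the snippet below collects the top-level keys of a leading `---` block and maps them to a category using the indicators listed above. It deliberately avoids a YAML library and only recognizes simple `key: value` lines, so it is an approximation; the function names are hypothetical.

```python
def extract_frontmatter_keys(content):
    """Return the set of top-level keys in a leading '---' block.

    Returns None when there is no frontmatter or the block is never closed.
    This is a line-based approximation, not a real YAML parser.
    """
    lines = content.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Skip blank lines, comments, and indented (nested) lines.
        if not line or line[0].isspace() or line.startswith("#"):
            continue
        if ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return None  # opening delimiter without a closing one


def frontmatter_category(keys):
    """Map frontmatter keys to a category using the indicators above."""
    if not keys:
        return None
    if {"allowed-tools", "version"} <= keys:
        return "Command"
    if "name" in keys:
        return "Skill"
    return None
```

Agent files are not covered here because their indicator (`## Agent Identity`) lives in the body rather than in frontmatter.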

Step 3: Content Structure Analysis

Commands have:

Workflow sections with numbered steps
Bash command examples (prefixed with `!`)
`allowed-tools` restrictions
Usage examples

Agents have:

Core Responsibilities section
Allowed Tools and Permissions section
Workflow Patterns section
Context Management section

Skills have:

"When to Use" section
"What This Skill Does" section
Step-by-step process descriptions
Examples with real data

Documentation has:

Standard markdown structure
Tutorial or reference content
No executable workflows
Educational purpose
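
One way to approximate the structure checks above is to collect markdown headings and count how many of the expected sections appear. The sketch below assumes headings use `#` markers and matches section names by prefix (so "When to Use This Skill" counts as "When to Use"); the names `EXPECTED_SECTIONS` and `structure_signals` are illustrative.

```python
import re

# Section names come from Step 3; matching by heading prefix is an assumption.
EXPECTED_SECTIONS = {
    "Agent": ("core responsibilities", "allowed tools and permissions",
              "workflow patterns", "context management"),
    "Skill": ("when to use", "what this skill does"),
}
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
NUMBERED_STEP_RE = re.compile(r"^\s*\d+\.\s+\S", re.MULTILINE)
BANG_COMMAND_RE = re.compile(r"^\s*!\w+", re.MULTILINE)


def heading_titles(content):
    """Return all markdown heading texts, lowercased."""
    return [m.group(1).lower() for m in HEADING_RE.finditer(content)]


def structure_signals(content):
    """Count how many structural indicators of each category are present."""
    titles = heading_titles(content)
    signals = {
        category: sum(any(t.startswith(s) for t in titles) for s in sections)
        for category, sections in EXPECTED_SECTIONS.items()
    }
    # Commands: numbered workflow steps and '!'-prefixed shell lines.
    signals["Command"] = (int(bool(NUMBERED_STEP_RE.search(content)))
                          + int(bool(BANG_COMMAND_RE.search(content))))
    return signals
```

A file with all-zero signals has no executable workflow structure, which is consistent with the Documentation profile, though that alone does not prove it is documentation.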

Step 4: Keyword Detection

Scan content for category-specific keywords:

Command Keywords:

`!bash`, `!git`, `!npm`, etc. (shell commands)
"allowed-tools"
"Usage:", "Workflow:", "Steps:"
Command-line patterns

Agent Keywords:

"Core Responsibilities"
"Workflow Patterns"
"Context Management"
"Orchestrator", "Sub-Agent"
"Handoff", "Delegation"

Skill Keywords:

"When to Use"
"What This Skill Does"
"Skill" in self-references
Reusable workflow language

Documentation Keywords:

"Introduction", "Overview", "Guide"
"Tutorial", "Reference", "Best Practices"
Educational/explanatory language
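
The keyword scan could look something like this. The keyword lists are copied from Step 4; the case-insensitive substring matching, the regular expression for `!`-prefixed commands, and the name `keyword_hits` are assumptions made for illustration. The looser criteria ("command-line patterns", "reusable workflow language") are not modeled.

```python
import re

# Keyword lists are copied from Step 4; case-insensitive substring matching
# is an assumption of this sketch.
KEYWORDS = {
    "Command": ["allowed-tools", "Usage:", "Workflow:", "Steps:"],
    "Agent": ["Core Responsibilities", "Workflow Patterns", "Context Management",
              "Orchestrator", "Sub-Agent", "Handoff", "Delegation"],
    "Skill": ["When to Use", "What This Skill Does"],
    "Documentation": ["Introduction", "Overview", "Guide",
                      "Tutorial", "Reference", "Best Practices"],
}
# Matches "!bash", "!git", "!npm" when not preceded by a word character or "!".
SHELL_BANG_RE = re.compile(r"(?<![\w!])!(?:bash|git|npm)\b")


def keyword_hits(content):
    """Return, per category, the keywords found in the content."""
    lowered = content.lower()
    hits = {
        category: [kw for kw in keywords if kw.lower() in lowered]
        for category, keywords in KEYWORDS.items()
    }
    hits["Command"] += SHELL_BANG_RE.findall(content)
    return hits
```

Substring matching is coarse: a word like "Guide" appears in many non-documentation files, so these hits are better treated as weak evidence than as a decision on their own.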

Categorization Algorithm

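function categorizeFile(filePath, content):
  // Start with no decision so the Phase 4 fallback can detect it
  category = null
  confidence = null
  reasoning = null

  // Phase 1: Filename and location
  if filename matches command patterns OR in .claude/commands/:
    category = "Command"
    confidence = "High"
    reasoning = "Filename or location matches command pattern"

  else if filename matches skill patterns OR in skills/*/:
    category = "Skill"
    confidence = "High"
    reasoning = "Filename or location matches skill pattern"

  else if filename matches agent patterns OR in agents-templates/:
    category = "Agent"
    confidence = "High"
    reasoning = "Filename or location matches agent pattern"

  else if in docs/:
    category = "Documentation"
    confidence = "Medium"
    reasoning = "Located in docs/"

  // Phase 2: Frontmatter analysis (refine)
  frontmatter = extractYAML(content)
  if frontmatter contains "allowed-tools" AND "version":
    category = "Command"
    confidence = "High"
    reasoning = "Frontmatter has allowed-tools and version"

  else if frontmatter contains "name" (no allowed-tools):
    category = "Skill"
    confidence = "High"
    reasoning = "Frontmatter has name field"

  // Phase 3: Content structure (if still uncertain)
  if confidence != "High":
    if content contains "## Agent Identity":
      category = "Agent"
      confidence = "High"
      reasoning = "Content has Agent Identity section"

    else if content contains "## When to Use":
      category = "Skill"
      confidence = "Medium"
      reasoning = "Content has When to Use section"

    else if content contains "!bash" OR "!git":
      category = "Command"
      confidence = "Medium"
      reasoning = "Content has shell command markers"

  // Phase 4: Fallback
  if category == null:
    category = "Other"
    confidence = "Low"
    reasoning = "Unable to determine category, manual review needed"

  return {category, confidence, reasoning}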
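
For readers who want something runnable, here is a hedged Python rendering of the four phases. It is a sketch under assumptions rather than the skill's reference implementation: it initializes `category` explicitly, uses the Step 1 filename patterns in Phase 1, reads frontmatter with a simple line-based scan instead of a YAML parser, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def _frontmatter_keys(content):
    """Top-level keys of a leading '---' block (line-based, not real YAML)."""
    lines = content.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        if line[:1].strip() and ":" in line:  # unindented "key: value" line
            keys.add(line.split(":", 1)[0].strip())
    return set()  # block never closed


def _in_dir(path, *run):
    """True if the directory names in `run` appear consecutively in the path."""
    parts = path.parent.parts
    return any(parts[i:i + len(run)] == run
               for i in range(len(parts) - len(run) + 1))


def _name_matches(name, *patterns):
    return any(fnmatchcase(name, p) for p in patterns)


def categorize_file(file_path, content):
    path = PurePosixPath(file_path)
    category, confidence, reasoning = None, None, []

    # Phase 1: filename and location
    if _name_matches(path.name, "*-command.md", "*command.md") or _in_dir(path, ".claude", "commands"):
        category, confidence = "Command", "High"
        reasoning.append("filename or location matches command pattern")
    elif _name_matches(path.name, "SKILL.md", "*-SKILL.md", "*-skill.md") or _in_dir(path, "skills"):
        category, confidence = "Skill", "High"
        reasoning.append("filename or location matches skill pattern")
    elif _name_matches(path.name, "*-agent.md", "*agent.md") or _in_dir(path, "agents-templates"):
        category, confidence = "Agent", "High"
        reasoning.append("filename or location matches agent pattern")
    elif _in_dir(path, "docs"):
        category, confidence = "Documentation", "Medium"
        reasoning.append("located in docs/")

    # Phase 2: frontmatter analysis (refines Phase 1)
    keys = _frontmatter_keys(content)
    if {"allowed-tools", "version"} <= keys:
        category, confidence = "Command", "High"
        reasoning.append("frontmatter has allowed-tools and version")
    elif "name" in keys and "allowed-tools" not in keys:
        category, confidence = "Skill", "High"
        reasoning.append("frontmatter has name field")

    # Phase 3: content structure, only if still uncertain
    if confidence != "High":
        if "## Agent Identity" in content:
            category, confidence = "Agent", "High"
            reasoning.append("content has Agent Identity section")
        elif "## When to Use" in content:
            category, confidence = "Skill", "Medium"
            reasoning.append("content has When to Use section")
        elif "!bash" in content or "!git" in content:
            category, confidence = "Command", "Medium"
            reasoning.append("content has shell command markers")

    # Phase 4: fallback
    if category is None:
        category, confidence = "Other", "Low"
        reasoning.append("unable to determine category, manual review needed")

    return {"category": category, "confidence": confidence,
            "reasoning": "; ".join(reasoning)}
```

As in the algorithm above, Phase 2 overrides Phase 1 without lowering confidence, so conflicting filename and frontmatter signals are not flagged here; that case is described under Error Handling.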

Output Format

For each categorized file, return:

### [Filename]
- **Category**: [Command|Agent|Skill|Documentation|Other]
- **Confidence**: [High|Medium|Low]
- **Reasoning**: [Why this category was assigned]
- **Frontmatter**: [✅ Valid | ⚠️ Malformed | ❌ Missing]
- **Required Fields**: [List of found/missing fields]
- **Recommended Location**: [Target directory path]
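
A small formatter for this template might look like the following. The parameter names, the `FRONTMATTER_LABELS` keys, and the use of ❌ for a missing required field are assumptions; the source only shows ✅ for fields that were found.

```python
# Labels follow the template above; the dictionary keys are hypothetical.
FRONTMATTER_LABELS = {
    "valid": "✅ Valid",
    "malformed": "⚠️ Malformed",
    "missing": "❌ Missing",
}


def render_entry(filename, category, confidence, reasoning,
                 frontmatter="missing", required_fields=None,
                 recommended_location="Manual review required"):
    """Render one report entry; required_fields maps field name -> found (bool)."""
    if required_fields:
        fields = ", ".join(
            f"{name} {'✅' if found else '❌'}"  # ❌ for missing is an assumption
            for name, found in required_fields.items()
        )
    else:
        fields = "N/A"
    return "\n".join([
        f"### {filename}",
        f"- **Category**: {category}",
        f"- **Confidence**: {confidence}",
        f"- **Reasoning**: {reasoning}",
        f"- **Frontmatter**: {FRONTMATTER_LABELS[frontmatter]}",
        f"- **Required Fields**: {fields}",
        f"- **Recommended Location**: {recommended_location}",
    ])
```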

Example Usage

Example 1: Categorizing Integration File

Input:

File: USING-GIT-WORKTREES-SKILL.md
Content:
---
name: using-git-worktrees
description: Creates isolated git worktrees...
---

# Using Git Worktrees

## When to Use
...

Output:

### USING-GIT-WORKTREES-SKILL.md
- **Category**: Skill
- **Confidence**: High
- **Reasoning**: Filename matches skill pattern, frontmatter has 'name' field, content has "When to Use" section
- **Frontmatter**: ✅ Valid
- **Required Fields**: name ✅, description ✅
- **Recommended Location**: skills/using-git-worktrees/SKILL.md

Example 2: Categorizing Command File

Input:

File: integration-scan.md
Content:
---
description: "Scan and categorize incoming files"
allowed-tools: ["Read", "Bash(find)"]
author: "Claude Command and Control"
version: "1.0"
---

# Integration Scan

## Purpose
...

Output:

### integration-scan.md
- **Category**: Command
- **Confidence**: High
- **Reasoning**: Filename uses verb-noun pattern, frontmatter has 'allowed-tools' and 'version'
- **Frontmatter**: ✅ Valid
- **Required Fields**: description ✅, allowed-tools ✅, author ✅, version ✅
- **Recommended Location**: .claude/commands/integration-scan.md

Example 3: Uncategorizable File

Input:

File: notes.md
Content:
# Random Notes

Some thoughts about the project...

Output:

### notes.md
- **Category**: Other
- **Confidence**: Low
- **Reasoning**: No frontmatter, no structural indicators, generic content
- **Frontmatter**: ❌ Missing
- **Required Fields**: N/A
- **Recommended Location**: Manual review required

Integration with Commands

Used By

`/integration-scan` - Primary categorization logic
`/integration-process` - Determines target directory
`/integration-validate` - Validates category-specific structure

Usage Pattern

# In integration-scan command

For each file in /INTEGRATION/incoming:
  1. Read file content
  2. Use file-categorization skill
  3. Extract category and confidence
  4. Include in scan report
  5. Mark for processing if High confidence
  6. Flag for review if Medium/Low confidence
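
The loop above could be driven by a helper along these lines. It is only a sketch: `scan_incoming` is a hypothetical name, the `categorize` argument stands in for whatever implements the categorization logic (any callable returning a dict with a `confidence` key), and scanning only the top level of the directory is an assumption.

```python
from pathlib import Path


def scan_incoming(incoming_dir, categorize):
    """Categorize each file in a directory and split results by confidence.

    `categorize` is any callable taking (file_path, content) and returning a
    dict that includes a "confidence" key.
    """
    report = {"process": [], "review": []}
    for path in sorted(Path(incoming_dir).iterdir()):  # top level only
        if not path.is_file():
            continue
        content = path.read_text(encoding="utf-8", errors="replace")
        result = categorize(str(path), content)
        entry = {"file": path.name, **result}
        # High confidence goes straight to processing; the rest is flagged.
        bucket = "process" if result["confidence"] == "High" else "review"
        report[bucket].append(entry)
    return report
```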

Category-Specific Validation Rules

Commands

✅ MUST have: description, allowed-tools, author, version
✅ SHOULD have: workflow steps, usage examples
⚠️ Check: Tool permissions not overly broad

Agents

✅ MUST have: Agent Identity, Core Responsibilities, Allowed Tools
✅ SHOULD have: Workflow Patterns, Context Management
⚠️ Check: Role clearly defined

Skills

✅ MUST have: name, description, "When to Use"
✅ SHOULD have: Examples, step-by-step process
⚠️ Check: Examples use real data (not placeholders)

Documentation

✅ MUST have: Clear title, structured content
✅ SHOULD have: Table of contents, cross-references
⚠️ Check: No executable workflows (should be in Command/Skill)
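
The MUST rules for Commands, Agents, and Skills lend themselves to a table-driven check, sketched below. The `REQUIRED` structure and `missing_requirements` are illustrative names, section presence is approximated by a case-insensitive substring search, and the Documentation rules and all SHOULD/Check items are omitted because they need human judgment.

```python
# MUST-have items from the rules above, split into frontmatter keys and
# body sections. The split and the names are assumptions of this sketch.
REQUIRED = {
    "Command": {"frontmatter": ["description", "allowed-tools", "author", "version"],
                "sections": []},
    "Agent": {"frontmatter": [],
              "sections": ["Agent Identity", "Core Responsibilities", "Allowed Tools"]},
    "Skill": {"frontmatter": ["name", "description"],
              "sections": ["When to Use"]},
}


def missing_requirements(category, frontmatter_keys, content):
    """Return the MUST-have items that are absent for the given category."""
    rules = REQUIRED.get(category, {"frontmatter": [], "sections": []})
    missing = [key for key in rules["frontmatter"] if key not in frontmatter_keys]
    lowered = content.lower()
    missing += [s for s in rules["sections"] if s.lower() not in lowered]
    return missing
```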

Error Handling

Malformed Frontmatter

Issue: YAML syntax error
Action: Note in categorization output
Category: "Other" with reason "Invalid frontmatter"
Recommendation: Fix YAML before processing

Conflicting Indicators

Issue: Filename says "command" but structure says "skill"
Action: Confidence = "Medium"
Reasoning: "Filename and content indicators conflict"
Recommendation: Manual review

Missing Content

Issue: File is empty or too short (<100 chars)
Action: Category = "Other"
Confidence: "Low"
Reasoning: "Insufficient content for categorization"
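
These three cases can be handled as guards around the main categorization, for example as below. This is a sketch with stated assumptions: malformed frontmatter is detected only as an unclosed `---` block (real YAML validation would need a parser), the "Low" confidence for that case is not specified in the source, and the function names are hypothetical.

```python
MIN_CONTENT_CHARS = 100  # threshold from "Missing Content" above


def precheck(content):
    """Return an early 'Other' result for unusable input, else None."""
    if len(content.strip()) < MIN_CONTENT_CHARS:
        return {"category": "Other", "confidence": "Low",
                "reasoning": "Insufficient content for categorization"}
    lines = content.splitlines()
    # Only an unclosed '---' block is detected here; other YAML errors are not.
    if lines[0].strip() == "---" and not any(l.strip() == "---" for l in lines[1:]):
        return {"category": "Other", "confidence": "Low",  # confidence is assumed
                "reasoning": "Invalid frontmatter"}
    return None


def conflict_adjustment(filename_category, content_category):
    """Downgrade confidence when filename and content disagree, else None."""
    if filename_category and content_category and filename_category != content_category:
        return {"confidence": "Medium",
                "reasoning": "Filename and content indicators conflict",
                "recommendation": "Manual review"}
    return None
```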

Testing Recommendations

Test with:

Typical files - Standard commands, agents, skills
Edge cases - Mixed indicators, missing frontmatter
Malformed files - Syntax errors, incomplete content
Ambiguous files - Could fit multiple categories

Expected accuracy:

High confidence: >95% correct
Medium confidence: >80% correct
Low confidence: Requires manual review
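
To compare a labeled test set against these targets, per-confidence accuracy can be computed as in this sketch. The input format (tuples of confidence, predicted category, expected category) and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Targets from the list above, expressed as fractions.
TARGETS = {"High": 0.95, "Medium": 0.80}


def accuracy_by_confidence(results):
    """results: iterable of (confidence, predicted_category, expected_category)."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for confidence, predicted, expected in results:
        totals[confidence] += 1
        correct[confidence] += predicted == expected  # True counts as 1
    return {level: correct[level] / totals[level] for level in totals}


def meets_targets(accuracy):
    """True if every measured level with a target strictly exceeds it."""
    return all(accuracy[level] > target
               for level, target in TARGETS.items() if level in accuracy)
```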

Version History

1.0 (2025-11-23)

Initial file categorization skill
Four-phase categorization algorithm
Integration with scan/process commands
Comprehensive validation rules

Skill Status: Production Ready
Accuracy Target: >95% for High confidence categorizations
Dependencies: None (standalone logic)