Claude-skill-registry agent-architecture-analysis

Perform 12-Factor Agents compliance analysis on any codebase. Use when evaluating agent architecture, reviewing LLM-powered systems, or auditing agentic applications against the 12-Factor methodology.

Install

Clone the upstream repo:
git clone https://github.com/majiayu000/claude-skill-registry

Claude Code (install into ~/.claude/skills/):
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/agent-architecture-analysis" ~/.claude/skills/majiayu000-claude-skill-registry-agent-architecture-analysis && rm -rf "$T"

Manifest: skills/data/agent-architecture-analysis/SKILL.md
Safety: automated scan (low risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
  • eval/exec/Function constructor
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.

12-Factor Agents Compliance Analysis

Reference: 12-Factor Agents

Input Parameters

| Parameter | Description | Required |
|-----------|-------------|----------|
| docs_path | Path to documentation directory (for existing analyses) | Optional |
| codebase_path | Root path of the codebase to analyze | Required |

Analysis Framework

Factor 1: Natural Language to Tool Calls

Principle: Convert natural language inputs into structured, deterministic tool calls using schema-validated outputs.

Search Patterns:

# Look for Pydantic schemas
grep -r "class.*BaseModel" --include="*.py"
grep -r "TaskDAG\|TaskResponse\|ToolCall" --include="*.py"

# Look for JSON schema generation
grep -r "model_json_schema\|json_schema" --include="*.py"

# Look for structured output generation
grep -r "output_type\|response_model" --include="*.py"

File Patterns:

**/agents/*.py, **/schemas/*.py, **/models/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | All LLM outputs use Pydantic/dataclass schemas with validators |
| Partial | Some outputs typed, but dict returns or unvalidated strings exist |
| Weak | LLM returns raw strings parsed manually or with regex |

Anti-patterns:

  • json.loads(llm_response) without schema validation
  • output.split() or regex parsing of LLM responses
  • dict[str, Any] return types from agents
  • No validation between LLM output and handler execution
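Below is a minimal sketch of the Strong pattern. It uses stdlib dataclasses in place of Pydantic so it runs without dependencies. The CreateIssue tool, its fields and parse_tool_call are hypothetical names chosen for illustration. The point is that the LLM's JSON becomes a validated, typed object before any handler sees it, so it is never passed along as a bare dict.

```python
import json
from dataclasses import dataclass

ALLOWED_FIELDS = {"title", "priority"}


@dataclass(frozen=True)
class CreateIssue:
    """Hypothetical tool-call schema; a Pydantic model would play this role."""
    title: str
    priority: int  # 1 (most urgent) to 4 (least urgent)

    def __post_init__(self) -> None:
        # Validate inside the schema so no handler ever sees bad data.
        if not isinstance(self.title, str) or not self.title.strip():
            raise ValueError("title must be a non-empty string")
        if isinstance(self.priority, bool) or not isinstance(self.priority, int):
            raise ValueError("priority must be an integer")
        if not 1 <= self.priority <= 4:
            raise ValueError("priority must be between 1 and 4")


def parse_tool_call(raw: str) -> CreateIssue:
    """Turn raw LLM output into a typed, validated tool call instead of a dict."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unknown = set(data) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    missing = ALLOWED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return CreateIssue(**data)
```

Every failure mode raises ValueError. The caller can therefore catch one exception type and feed the message back to the model (see Factor 9) instead of guessing at the output.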

Factor 2: Own Your Prompts

Principle: Treat prompts as first-class code you control, version, and iterate on.

Search Patterns:

# Look for embedded prompts
grep -r "SYSTEM_PROMPT\|system_prompt" --include="*.py"
grep -r '""".*You are' --include="*.py"

# Look for template systems
grep -r "jinja\|Jinja\|render_template" --include="*.py"
find . -name "*.jinja2" -o -name "*.j2"

# Look for prompt directories
find . -type d -name "prompts"

File Patterns:

**/prompts/**, **/templates/**, **/agents/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Prompts in separate files, templated (Jinja2), versioned |
| Partial | Prompts as module constants, some parameterization |
| Weak | Prompts hardcoded inline in functions, f-strings only |

Anti-patterns:

  • f"You are a {role}..." inline in agent methods
  • Prompts mixed with business logic
  • No way to iterate on prompts without code changes
  • No prompt versioning or A/B testing capability
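Below is a minimal sketch of named, versioned prompts kept out of agent methods. It uses the standard library's string.Template as a dependency-free stand-in for Jinja2. The PROMPTS registry, its keys and render_prompt are hypothetical. A real project would load each template from its own file under a prompts/ directory.

```python
from string import Template

# Hypothetical registry. In a real project each template would live in its own
# file (for example prompts/triage/v2.txt) under version control.
PROMPTS: dict[tuple[str, str], Template] = {
    ("triage", "v1"): Template("You are a triage assistant. Classify: $issue"),
    ("triage", "v2"): Template(
        "You are a $role.\n"
        "Classify the issue below as bug, feature or question.\n"
        "Issue: $issue"
    ),
}


def render_prompt(name: str, version: str, **params: str) -> str:
    """Render a named, versioned prompt. Unknown prompts and missing params raise KeyError."""
    template = PROMPTS[(name, version)]
    # substitute() fails loudly on a missing placeholder; safe_substitute() would not.
    return template.substitute(**params)
```

Moving a caller from v1 to v2 becomes a change to a string or a config value rather than to agent code. That is the property the A/B-testing anti-pattern above is about.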

Factor 3: Own Your Context Window

Principle: Control how history, state, and tool results are formatted for the LLM.

Search Patterns:

# Look for context/message management
grep -r "AgentMessage\|ChatMessage\|messages" --include="*.py"
grep -r "context_window\|context_compiler" --include="*.py"

# Look for custom serialization
grep -r "to_xml\|to_context\|serialize" --include="*.py"

# Look for token management
grep -r "token_count\|max_tokens\|truncate" --include="*.py"

File Patterns:

**/context/*.py, **/state/*.py, **/core/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Custom context format, token optimization, typed events, compaction |
| Partial | Basic message history with some structure |
| Weak | Raw message accumulation, standard OpenAI format only |

Anti-patterns:

  • Unbounded message accumulation
  • Large artifacts embedded inline (diffs, files)
  • No agent-specific context filtering
  • Same context for all agent types
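Below is a minimal sketch of an owned context format. It assumes a hypothetical Event type and uses a crude word count as a stand-in for a real tokenizer. Events are rendered as compact XML-style blocks. Once the token budget is used up, the oldest events are dropped first. A real implementation might summarize dropped events rather than discard them.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Event:
    kind: str  # e.g. "user_message", "tool_call", "tool_result", "error"
    data: str


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def render_context(events: list[Event], budget: int) -> str:
    """Render events as compact XML-style blocks, keeping the newest that fit."""
    blocks: list[str] = []
    used = 0
    for event in reversed(events):  # newest first, so old events are dropped first
        block = f"<{event.kind}>{escape(event.data)}</{event.kind}>"
        cost = approx_tokens(block)
        if used + cost > budget:
            break  # keep a contiguous window of the most recent events
        blocks.append(block)
        used += cost
    return "\n".join(reversed(blocks))  # restore chronological order
```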

Factor 4: Tools Are Structured Outputs

Principle: Tools produce schema-validated JSON that triggers deterministic code, not magic function calls.

Search Patterns:

# Look for tool/response schemas
grep -r "class.*Response.*BaseModel" --include="*.py"
grep -r "ToolResult\|ToolOutput" --include="*.py"

# Look for deterministic handlers
grep -r "def handle_\|def execute_" --include="*.py"

# Look for validation layer
grep -r "model_validate\|parse_obj" --include="*.py"

File Patterns:

**/tools/*.py, **/handlers/*.py, **/agents/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | All tool outputs schema-validated, handlers type-safe |
| Partial | Most tools typed, some loose dict returns |
| Weak | Tools return arbitrary dicts, no validation layer |

Anti-patterns:

  • Tool handlers that directly execute LLM output
  • eval() or exec() on LLM-generated code
  • No separation between decision (LLM) and execution (code)
  • Magic method dispatch based on string matching
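Below is a minimal sketch of keeping decision and execution separate. The LLM's output is only a tool name plus arguments. An explicit registry maps that name to a typed argument class and an ordinary handler function, with no eval() and no getattr-style string dispatch. The tool names, dataclasses and placeholder handlers are hypothetical. A dataclass checks only field names; a Pydantic model would also validate field types.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ListFiles:
    directory: str


@dataclass(frozen=True)
class ReadFile:
    path: str


def list_files(args: ListFiles) -> str:
    return f"listing {args.directory}"  # placeholder for real, deterministic code


def read_file(args: ReadFile) -> str:
    return f"reading {args.path}"  # placeholder for real, deterministic code


# Explicit registry: the LLM only *selects* a tool; code decides what runs.
TOOLS: dict[str, tuple[type, Callable[[Any], str]]] = {
    "list_files": (ListFiles, list_files),
    "read_file": (ReadFile, read_file),
}


def execute(tool_call: dict[str, Any]) -> str:
    name = tool_call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    schema, handler = TOOLS[name]
    args = schema(**tool_call.get("arguments", {}))  # wrong or missing fields raise TypeError
    return handler(args)
```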

Factor 5: Unify Execution State

Principle: Merge execution state (step, retries) with business state (messages, results).

Search Patterns:

# Look for state models
grep -r "ExecutionState\|WorkflowState\|Thread" --include="*.py"

# Look for dual state systems
grep -r "checkpoint\|MemorySaver" --include="*.py"
grep -r "sqlite\|database\|repository" --include="*.py"

# Look for state reconstruction
grep -r "load_state\|restore\|reconstruct" --include="*.py"

File Patterns:

**/state/*.py, **/models/*.py, **/database/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Single serializable state object with all execution metadata |
| Partial | State exists but split across systems (memory + DB) |
| Weak | Execution state scattered, requires multiple queries to reconstruct |

Anti-patterns:

  • Retry count stored separately from task state
  • Error history in logs but not in state
  • LangGraph checkpoints + separate database storage
  • No unified event thread
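Below is a minimal sketch of a single serializable state object. It assumes a hypothetical Thread with a flat list of typed events. Execution metadata such as the consecutive-error count is derived from the event log, so it does not live in a separate store. The whole thread round-trips through JSON.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Thread:
    """Hypothetical unified state: business events and execution metadata in one log."""
    task_id: str
    events: list[dict] = field(default_factory=list)

    @property
    def consecutive_errors(self) -> int:
        # Derived from the event log, never stored separately from it.
        count = 0
        for event in reversed(self.events):
            if event["type"] != "error":
                break
            count += 1
        return count

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Thread":
        return cls(**json.loads(raw))
```

With this shape, "reconstructing state" is a single from_json call. Retry counts and error history cannot drift out of sync with the task, because both come from the same event list.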

Factor 6: Launch/Pause/Resume

Principle: Agents support simple APIs for launching, pausing at any point, and resuming.

Search Patterns:

# Look for REST endpoints
grep -r "@router.post\|@app.post" --include="*.py"
grep -r "start_workflow\|pause\|resume" --include="*.py"

# Look for interrupt mechanisms
grep -r "interrupt_before\|interrupt_after" --include="*.py"

# Look for webhook handlers
grep -r "webhook\|callback" --include="*.py"

File Patterns:

**/routes/*.py, **/api/*.py, **/orchestrator/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | REST API + webhook resume, pause at any point including mid-tool |
| Partial | Launch/pause/resume exists but only at coarse-grained points |
| Weak | CLI-only launch, no pause/resume capability |

Anti-patterns:

  • Blocking input() or confirm() calls
  • No way to resume after process restart
  • Approval only at plan level, not per-tool
  • No webhook-based resume from external systems
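Below is a minimal sketch of pausing and resuming at tool granularity. It assumes hypothetical plain-dict state with step, log and approved keys, and a rule that any deploy step needs approval. The state is JSON, so the process that resumes it (for example a webhook handler) does not have to be the one that paused it. Note that run() updates the dict it is given in place.

```python
import json


def run(state: dict, plan: list[str]) -> dict:
    """Advance from state["step"] until the plan is done or a step needs approval."""
    while state["step"] < len(plan):
        action = plan[state["step"]]
        if action.startswith("deploy") and not state.get("approved"):
            state["status"] = "paused"  # stop here; an external call resumes later
            return state
        state["log"].append(f"done: {action}")
        state["step"] += 1
        state["approved"] = False  # an approval covers only one step, not the whole plan
    state["status"] = "completed"
    return state


def resume(saved: str, plan: list[str], approved: bool) -> dict:
    """Entry point a webhook or REST handler could call with the persisted state."""
    state = json.loads(saved)
    state["approved"] = approved
    return run(state, plan)
```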

Factor 7: Contact Humans with Tools

Principle: Human contact is a tool call with question, options, and urgency.

Search Patterns:

# Look for human input mechanisms
grep -r "typer.confirm\|input(\|prompt(" --include="*.py"
grep -r "request_human_input\|human_contact" --include="*.py"

# Look for approval patterns
grep -r "approval\|approve\|reject" --include="*.py"

# Look for structured question formats
grep -r "question.*options\|HumanInputRequest" --include="*.py"

File Patterns:

**/agents/*.py, **/tools/*.py, **/orchestrator/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | request_human_input tool with question/options/urgency/format |
| Partial | Approval gates exist but hardcoded in graph structure |
| Weak | Blocking CLI prompts, no tool-based human contact |

Anti-patterns:

  • typer.confirm() in agent code
  • Human contact hardcoded at specific graph nodes
  • No way for agents to ask clarifying questions
  • Single response format (yes/no only)
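Below is a minimal sketch of human contact as a structured tool call. RequestHumanInput and handle_human_response are hypothetical names. The four fields mirror the question/options/urgency/format criterion above. The agent emits the request and stops. Whichever channel collects the answer (Slack, email, a web UI) feeds it back as an event, and nothing ever blocks on input().

```python
from dataclasses import dataclass

URGENCIES = ("low", "medium", "high")
FORMATS = ("free_text", "yes_no", "multiple_choice")


@dataclass(frozen=True)
class RequestHumanInput:
    """Hypothetical tool the model calls instead of the code blocking on input()."""
    question: str
    options: tuple[str, ...] = ()
    urgency: str = "medium"
    response_format: str = "free_text"

    def __post_init__(self) -> None:
        if self.urgency not in URGENCIES:
            raise ValueError(f"urgency must be one of {URGENCIES}")
        if self.response_format not in FORMATS:
            raise ValueError(f"response_format must be one of {FORMATS}")
        if self.response_format == "multiple_choice" and len(self.options) < 2:
            raise ValueError("multiple_choice needs at least two options")


def handle_human_response(request: RequestHumanInput, answer: str) -> dict:
    """Validate a reply from any channel and turn it into a context event."""
    if request.response_format == "multiple_choice" and answer not in request.options:
        raise ValueError(f"answer must be one of {request.options}")
    if request.response_format == "yes_no" and answer not in ("yes", "no"):
        raise ValueError("answer must be 'yes' or 'no'")
    return {"type": "human_response", "question": request.question, "answer": answer}
```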

Factor 8: Own Your Control Flow

Principle: Custom control flow, not framework defaults. Full control over routing, retries, compaction.

Search Patterns:

# Look for routing logic
grep -r "add_conditional_edges\|route_\|should_continue" --include="*.py"

# Look for custom loops
grep -r "while True\|for.*in.*range" --include="*.py" | grep -v test

# Look for execution mode control
grep -r "execution_mode\|agentic\|structured" --include="*.py"

File Patterns:

**/orchestrator/*.py, **/graph/*.py, **/core/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Custom routing functions, conditional edges, execution mode control |
| Partial | Framework control flow with some customization |
| Weak | Default framework loop with no custom routing |

Anti-patterns:

  • Single path through graph with no branching
  • No distinction between tool types (all treated same)
  • Framework-default error handling only
  • No rate limiting or resource management
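Below is a minimal sketch of an owned agent loop. It assumes a hypothetical split of tools into three kinds. Synchronous tools run and the loop continues. Approval tools break out of the loop so the workflow can be resumed later. A terminal "done" ends the run. The next_step parameter stands in for the LLM call, and max_steps bounds the loop instead of relying on a framework default.

```python
from typing import Callable

# Hypothetical tool classification; a real agent would derive this from its tool registry.
SYNC_TOOLS = {"read_file", "run_tests"}
APPROVAL_TOOLS = {"deploy"}


def agent_loop(next_step: Callable[[list[str]], str], max_steps: int = 10) -> tuple[str, list[str]]:
    """Own the loop: routing depends on the kind of tool the model picked."""
    context: list[str] = []
    for _ in range(max_steps):
        tool = next_step(context)  # stands in for an LLM call returning a tool name
        if tool == "done":
            return "completed", context
        if tool in APPROVAL_TOOLS:
            return "awaiting_approval", context  # break out; resume later
        if tool in SYNC_TOOLS:
            context.append(f"ran {tool}")  # execute and feed the result back
            continue
        context.append(f"error: unknown tool {tool}")  # let the model self-correct
    return "step_limit_reached", context
```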

Factor 9: Compact Errors into Context

Principle: Errors in context enable self-healing. Track consecutive errors, escalate after threshold.

Search Patterns:

# Look for error handling
grep -r "except.*Exception\|error_history\|consecutive_errors" --include="*.py"

# Look for retry logic
grep -r "retry\|backoff\|max_attempts" --include="*.py"

# Look for escalation
grep -r "escalate\|human_escalation" --include="*.py"

File Patterns:

**/agents/*.py, **/orchestrator/*.py, **/core/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Errors in context, retry with threshold, automatic escalation |
| Partial | Errors logged and returned, no automatic retry loop |
| Weak | Errors logged only, not fed back to LLM, task fails immediately |

Anti-patterns:

  • logger.error() without adding to context
  • No retry mechanism (fail immediately)
  • No consecutive error tracking
  • No escalation to humans after repeated failures
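Below is a minimal sketch of error compaction around one tool-calling step. Each failure is appended to the context as a one-line summary so the next attempt can see it. After max_consecutive_errors failures, the step escalates instead of retrying forever. The attempt parameter stands in for an LLM call plus tool execution. The default threshold of 3 is an arbitrary assumption.

```python
from typing import Callable


def run_with_error_context(
    attempt: Callable[[list[str]], str],
    max_consecutive_errors: int = 3,
) -> tuple[str, list[str]]:
    """Feed each error back into context; escalate after a threshold."""
    context: list[str] = []
    consecutive_errors = 0
    while True:
        try:
            result = attempt(context)  # the attempt can read earlier errors from context
        except Exception as exc:  # broad on purpose: any failure becomes context
            consecutive_errors += 1
            # Compact the error to one line instead of dumping a full traceback.
            context.append(f"error ({type(exc).__name__}): {exc}")
            if consecutive_errors >= max_consecutive_errors:
                return "escalated_to_human", context
            continue
        return result, context
```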

Factor 10: Small, Focused Agents

Principle: Each agent has a narrow responsibility and runs for at most 3-10 steps.

Search Patterns:

# Look for agent classes
grep -r "class.*Agent\|class.*Architect\|class.*Developer" --include="*.py"

# Look for step definitions
grep -r "steps\|tasks" --include="*.py" | head -20

# Count function/method definitions per agent file (prints file:count)
grep -cE "^[[:space:]]*(async[[:space:]]+)?def " agents/*.py 2>/dev/null

File Patterns:

**/agents/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | 3+ specialized agents, each with single responsibility, step limits |
| Partial | Multiple agents but some have broad scope |
| Weak | Single "god" agent that handles everything |

Anti-patterns:

  • Single agent with 20+ tools
  • Agent with unbounded step count
  • Mixed responsibilities (planning + execution + review)
  • No step or time limits on agent execution
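During the deep dive, it can help to record each agent as a small summary and check it mechanically against the anti-patterns above. Below is a hypothetical helper for that. AgentSpec and focus_findings are illustrative names. The 20-tool and 10-step thresholds come from this section and should be tuned per codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical summary of one agent, filled in while reading its code."""
    name: str
    tools: tuple[str, ...]
    max_steps: Optional[int]  # None means no step limit was found


def focus_findings(agent: AgentSpec, tool_limit: int = 20, step_limit: int = 10) -> list[str]:
    """Return Factor 10 anti-patterns detected for one agent (empty list means none)."""
    findings = []
    if len(agent.tools) >= tool_limit:
        findings.append(f"{agent.name}: {len(agent.tools)} tools, likely a god agent")
    if agent.max_steps is None:
        findings.append(f"{agent.name}: unbounded step count")
    elif agent.max_steps > step_limit:
        findings.append(f"{agent.name}: step limit {agent.max_steps} exceeds {step_limit}")
    return findings
```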

Factor 11: Trigger from Anywhere

Principle: Workflows triggerable from CLI, REST, WebSocket, Slack, webhooks, etc.

Search Patterns:

# Look for entry points
grep -r "@cli.command\|@router.post\|@app.post" --include="*.py"

# Look for WebSocket support
grep -r "WebSocket\|websocket" --include="*.py"

# Look for external integrations
grep -r "slack\|discord\|webhook" --include="*.py" -i

File Patterns:

**/routes/*.py, **/cli/*.py, **/main.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | CLI + REST + WebSocket + webhooks + chat integrations |
| Partial | CLI + REST API available |
| Weak | CLI only, no programmatic access |

Anti-patterns:

  • Only an if __name__ == "__main__" entry point
  • No REST API for external systems
  • No event streaming for real-time updates
  • Trigger logic tightly coupled to execution
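Below is a minimal sketch of decoupling triggers from execution. Every entry point is a thin adapter over one start_workflow function. Adding a Slack or WebSocket trigger then means adding another adapter, not another code path. The function names and payload shape are hypothetical. Only argparse and json from the standard library are used; a REST framework route would call from_webhook with the request body.

```python
import argparse
import json


def start_workflow(issue_id: str, source: str) -> dict:
    """Single entry point shared by every trigger (CLI, REST, Slack, webhooks)."""
    return {"issue_id": issue_id, "source": source, "status": "launched"}


def from_cli(argv: list[str]) -> dict:
    parser = argparse.ArgumentParser(prog="agent")
    parser.add_argument("issue_id")
    args = parser.parse_args(argv)
    return start_workflow(args.issue_id, source="cli")


def from_webhook(body: bytes) -> dict:
    # A REST route or Slack handler would call this with the raw request body.
    payload = json.loads(body)
    return start_workflow(str(payload["issue_id"]), source="webhook")
```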

Factor 12: Stateless Reducer

Principle: Agents as pure functions: (state, input) -> (state, output). No side effects in agent logic.

Search Patterns:

# Look for state mutation patterns
grep -rE "\.status = |state\.[A-Za-z_]+ = " --include="*.py"

# Look for immutable updates
grep -r "model_copy\|\.copy(\|with_" --include="*.py"

# Look for side effects in agents
grep -r "write_file\|subprocess\|requests\." agents/*.py 2>/dev/null

File Patterns:

**/agents/*.py, **/nodes/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Immutable state updates, side effects isolated to tools/handlers |
| Partial | Mostly immutable, some in-place mutations |
| Weak | State mutated in place, side effects mixed with agent logic |

Anti-patterns:

  • state.field = new_value (mutation)
  • File writes inside agent methods
  • HTTP calls inside agent decision logic
  • Shared mutable state between agents
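Below is a minimal sketch of an agent step as a stateless reducer. It assumes a hypothetical AgentState and string events. The function returns a new frozen state plus an optional output that names a side effect. A separate handler (not shown) performs that side effect. dataclasses.replace builds the new state without mutating the old one.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AgentState:
    """Hypothetical immutable agent state; frozen=True forbids in-place mutation."""
    status: str = "pending"
    steps: int = 0
    history: tuple[str, ...] = ()


def agent_reducer(state: AgentState, event: str) -> tuple[AgentState, Optional[str]]:
    """Pure (state, input) -> (state, output): no I/O and no mutation.

    The output names a side effect for a separate handler to perform.
    """
    history = state.history + (event,)
    if event == "approved":
        return replace(state, status="deploying", history=history), "deploy"
    if event == "deployed":
        return replace(state, status="done", history=history), None
    return replace(state, status="running", steps=state.steps + 1, history=history), None
```

Because the function is pure, replaying the same events from the same starting state always yields the same result. That is what makes replay-based debugging and resume straightforward.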

Factor 13: Pre-fetch Context

Principle: Fetch likely-needed data upfront rather than mid-workflow.

Search Patterns:

# Look for context pre-fetching
grep -r "pre_fetch\|prefetch\|fetch_context" --include="*.py"

# Look for RAG/embedding systems
grep -r "embedding\|vector\|semantic_search" --include="*.py"

# Look for related file discovery
grep -r "related_tests\|similar_\|find_relevant" --include="*.py"

File Patterns:

**/context/*.py, **/retrieval/*.py, **/rag/*.py

Compliance Criteria:

| Level | Criteria |
|-------|----------|
| Strong | Automatic pre-fetch of related tests, files, docs before planning |
| Partial | Manual context passing, design doc support |
| Weak | No pre-fetching, LLM must request all context via tools |

Anti-patterns:

  • Architect starts with issue only, no codebase context
  • No semantic search for similar past work
  • Related tests/files discovered only during execution
  • No RAG or document retrieval system
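Below is a minimal sketch of deterministic pre-fetching. It assumes pytest's test_<module>.py naming convention and a hypothetical list of repository paths. Related tests are found with plain code and placed in the first context before the planner runs. The LLM does not have to discover them through tool calls. A RAG or semantic-search step could add further entries in the same way.

```python
from pathlib import PurePosixPath


def related_tests(changed: list[str], repo_files: list[str]) -> list[str]:
    """Find test files for changed modules, assuming pytest's test_<module>.py naming."""
    wanted = {f"test_{PurePosixPath(path).stem}.py" for path in changed}
    return sorted(path for path in repo_files if PurePosixPath(path).name in wanted)


def build_initial_context(issue: str, changed: list[str], repo_files: list[str]) -> str:
    """Assemble the first prompt's context deterministically, before any LLM call."""
    lines = [f"<issue>{issue}</issue>"]
    lines += [f"<related_test>{path}</related_test>" for path in related_tests(changed, repo_files)]
    return "\n".join(lines)
```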

Output Format

Executive Summary Table

| Factor | Status | Notes |
|--------|--------|-------|
| 1. Natural Language -> Tool Calls | **Strong/Partial/Weak** | [Key finding] |
| 2. Own Your Prompts | **Strong/Partial/Weak** | [Key finding] |
| ... | ... | ... |
| 13. Pre-fetch Context | **Strong/Partial/Weak** | [Key finding] |

**Overall**: X Strong, Y Partial, Z Weak
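Once each factor has a level, the summary table and the Overall line can be assembled mechanically. Below is a minimal sketch. render_summary and its (factor, level, note) input shape are hypothetical.

```python
from collections import Counter

LEVELS = ("Strong", "Partial", "Weak")


def render_summary(results: list[tuple[str, str, str]]) -> str:
    """Build the executive summary table from (factor, level, note) triples."""
    for _, level, _ in results:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
    lines = ["| Factor | Status | Notes |", "|--------|--------|-------|"]
    lines += [f"| {factor} | **{level}** | {note} |" for factor, level, note in results]
    counts = Counter(level for _, level, _ in results)  # missing levels count as 0
    lines += ["", "**Overall**: " + ", ".join(f"{counts[lvl]} {lvl}" for lvl in LEVELS)]
    return "\n".join(lines)
```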

Per-Factor Analysis

For each factor, provide:

  1. Current Implementation

    • Evidence with file:line references
    • Code snippets showing patterns
  2. Compliance Level

    • Strong/Partial/Weak with justification
  3. Gaps

    • What's missing vs. 12-Factor ideal
  4. Recommendations

    • Actionable improvements with code examples

Analysis Workflow

  1. Initial Scan

    • Run search patterns for all factors
    • Identify key files for each factor
    • Note any existing compliance documentation
  2. Deep Dive (per factor)

    • Read identified files
    • Evaluate against compliance criteria
    • Document evidence with file paths
  3. Gap Analysis

    • Compare current vs. 12-Factor ideal
    • Identify anti-patterns present
    • Prioritize by impact
  4. Recommendations

    • Provide actionable improvements
    • Include before/after code examples
    • Reference roadmap if exists
  5. Summary

    • Compile executive summary table
    • Highlight strengths and critical gaps
    • Suggest priority order for improvements

Quick Reference: Compliance Scoring

| Score | Meaning | Action |
|-------|---------|--------|
| Strong | Fully implements principle | Maintain, minor optimizations |
| Partial | Some implementation, significant gaps | Planned improvements |
| Weak | Minimal or no implementation | High priority for roadmap |

When to Use This Skill

  • Evaluating new LLM-powered systems
  • Reviewing agent architecture decisions
  • Auditing production agentic applications
  • Planning improvements to existing agents
  • Comparing frameworks or implementations