Claude-skill-registry full-stack-debugger

This skill should be used when debugging full-stack issues that span UI, backend, and database layers. It provides a systematic workflow to detect errors, analyze root causes, apply fixes iteratively, and verify solutions through automated server restarts and browser-based testing. Ideal for scenarios like failing schedulers, import errors, database issues, or API payload problems where issues originate in backend code but manifest in the UI.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/full-stack-debugger" ~/.claude/skills/majiayu000-claude-skill-registry-full-stack-debugger && rm -rf "$T"
manifest: skills/data/full-stack-debugger/SKILL.md

Full Stack Debugger

Overview

The Full Stack Debugger enables systematic debugging of issues across the entire application stack (UI/Frontend, Backend/API, Database/State). It combines browser testing, log analysis, code examination, and automated server restart/verification to iteratively identify and fix issues one at a time until the system is fully operational.

This skill follows a six-phase workflow: Detection → Analysis → Fix → Restart → Verification → Iteration. Each pass through the cycle resolves one issue encountered during development and testing.

When to Use This Skill

Trigger this skill when observing:

  • Error states in the UI (dashboard, buttons failing, status showing errors)
  • Repeated failures in backend logs (task execution failures, import errors, database errors)
  • Unexpected database state (rows showing failed status when they should succeed)
  • API endpoints returning errors or unexpected responses
  • Services failing to initialize or process tasks
  • Cascading failures across multiple components

Debugging Workflow

Phase 1: Detection

Detect errors from multiple sources:

Browser UI Detection:

  • Navigate to the affected page/feature in the browser
  • Check for error messages, red warning states, or disabled functionality
  • Read console error messages using DevTools
  • Note the specific UI state and what action triggered the error

Backend Log Detection:

  • Query recent error logs using `tail -200 /path/to/logs/errors.log`
  • Search for error patterns related to the issue using `grep`
  • Note error timestamps, error messages, and stack traces
  • Look for repeated errors (indicates systemic issue)
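
The log checks above can be sketched in Python as follows. This is a minimal illustration, not part of the skill: the helper names `tail` and `count_repeated_errors` are hypothetical, and the assumed log line shape ("timestamp LEVEL message") may not match the real log format.

```python
from collections import Counter
from pathlib import Path
import re

# Assumed log line shape: "<timestamp> <LEVEL> <message>". Adjust to the real format.
ERROR_LINE = re.compile(r"\b(ERROR|CRITICAL)\b[:\s-]*(?P<message>.*)")


def tail(path, n=200):
    """Return the last n lines of a text file (reads the whole file, so small logs only)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return text.splitlines()[-n:]


def count_repeated_errors(lines, min_count=2):
    """Return {message: count} for error messages seen at least min_count times."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            counts[match.group("message").strip()] += 1
    return {msg: n for msg, n in counts.items() if n >= min_count}
```

A message that appears more than once in the last 200 lines is a candidate systemic issue; a message that appears once may be a one-off.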

Database State Detection:

  • Query the database directly using sqlite3
  • Check status of recent tasks, transactions, or records
  • Look for failed, incomplete, or error states
  • Note which records are affected and what their states are
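
A database state check along these lines might look like the sketch below. It assumes a SQLite table with `id` and `status` columns; the `queue_tasks` table name comes from the example in this document, but the column names and the `failed_tasks` helper are assumptions for illustration.

```python
import sqlite3


def failed_tasks(conn, table="queue_tasks", limit=20):
    """Return the most recent rows whose status is 'failed'.

    The id and status column names are assumptions; adapt them to the real schema.
    """
    # Table names cannot be bound as SQL parameters, so validate before interpolating.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    query = f"SELECT id, status FROM {table} WHERE status = ? ORDER BY id DESC LIMIT ?"
    return conn.execute(query, ("failed", limit)).fetchall()
```

Usage would be something like `failed_tasks(sqlite3.connect("/path/to/state.db"))`, where the database path is a placeholder.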

Example: When debugging a scheduler failure:

  1. Navigate to System Health dashboard
  2. Observe scheduler showing "0 done" or "X failed"
  3. Check `/logs/errors.log` for error messages
  4. Query the `queue_tasks` table to see failed task records

Phase 2: Analysis

Analyze root causes by reading code and logs:

Code Analysis:

  • Read the error file/module indicated in error stack traces
  • Check imports - look for missing `from X import Y` statements
  • Check class names - verify instantiation matches actual class names
  • Look for syntax errors - unmatched quotes, unclosed parentheses
  • Check function signatures - ensure payloads match expected parameters
  • Read reference documentation (`references/common_errors.md`) for error patterns

Log Analysis:

  • Extract error messages from logs
  • Look for patterns like `'optional'` (missing import), `unterminated string` (syntax error), `'attribute'` (wrong class name)
  • Trace error propagation backward to find the originating issue
  • Check timestamps - multiple errors at same time indicate batch failure
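
The pattern-to-cause mapping in the log analysis list can be expressed as a small lookup. This is a sketch only: the three hints are the ones listed above, matching is a case-insensitive substring test, and the `classify_error` name is hypothetical.

```python
# Substring -> likely cause, taken from the log patterns listed in this section.
LOG_HINTS = [
    ("optional", "missing import"),
    ("unterminated string", "syntax error"),
    ("attribute", "wrong class name"),
]


def classify_error(message):
    """Return the first matching likely cause, or None if no hint matches."""
    lowered = message.lower()
    for needle, cause in LOG_HINTS:
        if needle in lowered:
            return cause
    return None
```

A `None` result means the message needs manual analysis; the hints are heuristics, not a diagnosis.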

API/Payload Analysis:

  • Check what payload the API is sending to task handlers
  • Read the task handler code to see what fields it expects
  • Compare actual payload vs expected payload
  • Look for missing required fields
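
Comparing the actual payload against the fields a handler expects reduces to a set difference. In the sketch below, `compare_payload` is a hypothetical helper and the field names in the usage note are invented for illustration; the real required fields come from reading the task handler code.

```python
def compare_payload(payload, required):
    """Report required fields missing from payload and fields the handler does not expect."""
    present = set(payload)
    expected = set(required)
    return {
        "missing": sorted(expected - present),
        "unexpected": sorted(present - expected),
    }
```

For example, `compare_payload({"symbol": "AAPL"}, ["symbol", "agent_name"])` reports `agent_name` as missing.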

Example: When debugging "name 'Optional' is not defined":

  1. Find the file mentioned in the error (`analysis_executor.py`)
  2. Read the imports section
  3. Notice `Optional` is used but not imported
  4. Check line 14: `from typing import Dict, List, Any` is missing `Optional`
  5. Fix: Add `Optional` to the import statement
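
The manual check in this example can also be approximated with Python's `ast` module. The sketch below is an illustration under stated limits: `unimported_names` is a hypothetical helper, it only checks the candidate names passed in, and it does not see names used inside string annotations.

```python
import ast


def unimported_names(source, candidates=("Optional", "Dict", "List", "Any")):
    """Return candidate names that are read in source but never imported or assigned."""
    tree = ast.parse(source)
    bound, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import x as y" binds "y".
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:
                bound.add(node.id)
    return sorted(name for name in candidates if name in used and name not in bound)
```

Running it on a module that imports `Dict, List, Any` but annotates a parameter with `Optional[...]` returns `["Optional"]`.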

Phase 3: Fix (One Issue at a Time)

Apply fixes one issue per iteration:

Before Fixing:

  • Verify this is the first/next issue to fix
  • Read the relevant code section carefully
  • Use the fix patterns from `references/fix_templates.md`

Common Fix Patterns:

  • Missing imports: Add to import statement (e.g., `from typing import Optional`)
  • Wrong class name: Update import and instantiation to match actual class
  • Missing docstring quotes: Add opening `"""` to docstring
  • Wrong payload fields: Add missing required fields to payload dictionary
  • Syntax errors: Fix unmatched quotes, parentheses, brackets

After Fixing:

  • Read back the changed code to verify syntax
  • Check the edit was correct (line numbers, indentation)
  • Only fix ONE issue, even if multiple exist - don't cascade fixes
  • Document what was changed in a clear comment

Example Fix:

# BEFORE
from typing import Dict, List, Any

# AFTER
from typing import Dict, List, Any, Optional

Phase 4: Restart (Automated)

Restart the backend server after each fix:

# Kill existing processes
lsof -ti:8000 | xargs kill -9 2>/dev/null

# Clear Python bytecode cache
find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null
find . -type f -name "*.pyc" -delete 2>/dev/null

# Restart backend
sleep 3 && python -m src.main --command web > /tmp/backend_restart.log 2>&1 &
sleep 10  # Wait for startup

# Verify health
curl -m 5 http://localhost:8000/api/health
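
A fixed `sleep 10` can be too short on a slow start and wasteful on a fast one. As an alternative, the health endpoint can be polled until it reports healthy. The sketch below assumes the endpoint returns a JSON object with a `status` field, as the verification phase describes; the `fetch_status` and `wait_for_healthy` helpers are hypothetical.

```python
import json
import time
import urllib.request


def fetch_status(url, timeout=5):
    """Return the 'status' field of the JSON health response, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = json.load(response)
    except (OSError, ValueError):
        # OSError covers URLError, refused connections, and timeouts;
        # ValueError covers malformed JSON.
        return None
    return body.get("status") if isinstance(body, dict) else None


def wait_for_healthy(url, attempts=10, delay=1.0, fetch=fetch_status, sleep=time.sleep):
    """Poll until fetch(url) returns 'healthy'. Return True on success, False otherwise."""
    for attempt in range(attempts):
        if fetch(url) == "healthy":
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_for_healthy("http://localhost:8000/api/health")` after the restart command returns as soon as the backend is up. The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server.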

Phase 5: Verification

Verify the fix worked through multiple checks:

Health Check:

  • Call the `/api/health` endpoint
  • Verify `"status": "healthy"`
  • If still failing, check logs for new errors

Browser Verification:

  • Navigate to the affected UI page
  • Trigger the action that previously failed
  • Verify the error is gone
  • Check for new errors in console

Database Verification:

  • Query the affected records/tasks
  • Verify status changed from failed/error to success/completed
  • Check that metrics updated (e.g., scheduler shows "1 done" instead of "0 done")

Log Verification:

  • Check recent logs for the same error
  • Verify no new errors appeared
  • Look for success messages or "completed" status
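
For log verification, only entries written after the fix matter, since older errors remain in the file. The sketch below filters by timestamp. It assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, which may not match the real log format, and `errors_since` is a hypothetical helper.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format; match it to the real logs


def errors_since(lines, since, needle="ERROR"):
    """Return lines containing needle whose leading timestamp is at or after since."""
    recent = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines without a parsable timestamp, such as traceback bodies
        if stamp >= since and needle in line:
            recent.append(line)
    return recent
```

An empty result for the period after the restart suggests the original error has not recurred; it does not by itself prove the fix worked, so the UI and database checks still apply.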

Example:

  • Scheduler should show "1 done" instead of "0 done"
  • Task record should show status="completed" instead of "failed"
  • No error messages in logs
  • WebSocket shows healthy status in UI

Phase 6: Iteration

If issues remain, repeat the cycle:

  1. Continue if more issues exist:

    • Check logs for remaining errors
    • If errors remain, return to Phase 2 (Analysis)
    • Fix the next issue (Phase 3)
    • Restart (Phase 4)
    • Verify (Phase 5)
  2. Stop when all issues fixed:

    • All schedulers show completed execution counts
    • UI shows no error states
    • Logs show no error patterns
    • Tasks/records show success status
    • Full verification complete
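
The overall cycle can be summarized as a loop that handles one issue per pass. This is a conceptual sketch only: `debug_loop` and its four callables are hypothetical stand-ins for the detection, fix, restart, and verification phases, which in practice involve tools and judgment rather than plain functions.

```python
def debug_loop(detect, fix, restart, verify, max_iterations=10):
    """Fix one issue per iteration until detect() reports none; return the issues fixed."""
    fixed = []
    for _ in range(max_iterations):
        issues = detect()
        if not issues:
            return fixed
        issue = issues[0]  # one issue at a time, even if several exist
        fix(issue)
        restart()  # always restart before verifying
        if not verify(issue):
            raise RuntimeError(f"fix did not resolve: {issue}")
        fixed.append(issue)
    if detect():
        raise RuntimeError("issues remain after max_iterations")
    return fixed
```

The iteration cap is a design choice added here so that a fix that keeps producing new errors stops the loop instead of running indefinitely.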

Common Error Patterns

See `references/common_errors.md` for patterns to recognize:

  • Python syntax errors (unterminated strings, missing quotes)
  • Import errors (`name 'X' is not defined`, `cannot import name 'Y'`)
  • Class/attribute errors (`'dict' object has no attribute 'symbol'`)
  • Type errors (passing wrong data type)
  • Payload/configuration errors (missing required fields)

Fix Templates

See `references/fix_templates.md` for ready-to-use fix patterns:

  • How to add missing imports
  • How to fix class name mismatches
  • How to fix docstring syntax
  • How to add missing payload fields
  • How to fix type errors

Tools Used

  • Playwright Browser Tools: Navigate UI, verify changes
  • Read/Grep Tools: Examine code and logs
  • Bash: Server restart, cache clearing, health checks
  • Edit Tool: Apply code fixes
  • Database Queries: Verify task/record state

MCP Tools Integration

Use the robo-trader-dev MCP tools to reduce token usage during debugging. The table below lists the stated savings per tool (85% to 98%):

| Task | MCP Tool | Token Savings | Usage |
| --- | --- | --- | --- |
| Analyze error logs | `mcp__robo-trader-dev__analyze_logs` | 98% | Pattern detection with time windows |
| System health check | `mcp__robo-trader-dev__check_system_health` | 97% | Database, queues, API, disk status |
| Diagnose DB locks | `mcp__robo-trader-dev__diagnose_database_locks` | 95% | Correlate logs with code patterns |
| Queue monitoring | `mcp__robo-trader-dev__queue_status` | 96% | Real-time queue backlog analysis |
| Coordinator status | `mcp__robo-trader-dev__coordinator_status` | 94% | Init status, error details |
| Error pattern fix | `mcp__robo-trader-dev__suggest_fix` | 90% | Known pattern matching with examples |
| Read code files | `mcp__robo-trader-dev__smart_file_read` | 85% | Progressive context (summary/targeted/full) |
| Find related files | `mcp__robo-trader-dev__find_related_files` | 88% | Import/git/similarity analysis |

Example debugging workflow:

# 1. Detect errors (MCP instead of tail/grep)
mcp__robo-trader-dev__analyze_logs(patterns=["ERROR", "TIMEOUT"], time_window="1h")

# 2. Check system health (MCP instead of curl loops)
mcp__robo-trader-dev__check_system_health(components=["database", "queues", "api_endpoints"])

# 3. Diagnose specific issue (MCP instead of sqlite3 + code reading)
mcp__robo-trader-dev__diagnose_database_locks(time_window="24h", include_code_references=True)

# 4. Get fix suggestions (MCP instead of manual pattern matching)
mcp__robo-trader-dev__suggest_fix(error_message="name 'Optional' is not defined", context_file="src/services/analyzer.py")

Integration with robo-trader architecture:

  • Queue operations: Use `queue_status` to monitor PORTFOLIO_SYNC, DATA_FETCHER, AI_ANALYSIS
  • Coordinator debugging: Use `coordinator_status` for BroadcastCoordinator, AIChatCoordinator init issues
  • Database access: Use `query_portfolio` or `diagnose_database_locks` instead of direct sqlite3 connections

Key Principles

  1. One issue at a time - Fix one problem per iteration to prevent cascading failures
  2. Verify immediately - Always restart and verify after each fix
  3. Multi-layer detection - Check UI, logs, and database for clues
  4. Iterative refinement - Continue until all issues resolved
  5. Automated restart - Always use clean restart (kill + cache clear + restart)
  6. Browser verification - Always test in actual UI, not just logs