Skilllibrary process-doctor

Diagnose and repair broken agent processes — doom loops, workflow stalls, status-over-evidence routing, context rot, conflicting instructions. Use when the user says "agent keeps failing", "workflow stalled", "process broken", "doom loop", "agent repeating", or a specific named process produces wrong results. Do not use for workflow-observability (status reporting), review-audit-bridge (code review findings), or error-handling (code-level error design).

Install

Source: clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code: install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/02-generated-repo-core/process-doctor" ~/.claude/skills/merceralex397-collab-skilllibrary-process-doctor && rm -rf "$T"
Manifest: 02-generated-repo-core/process-doctor/SKILL.md

Purpose

Diagnoses broken or ineffective agent processes within a specific repository. Identifies what's causing repeated failures, inconsistent outputs, or workflow stalls. Unlike repo-process-doctor (which audits infrastructure), this skill focuses on diagnosing why a specific process isn't working as expected.

When to use this skill

Use when:

  • An agent keeps making the same mistake
  • Workflow completes but output is wrong
  • Process worked before but stopped working
  • Need to understand why a specific failure occurred

Do NOT use when:

  • Setting up new infrastructure (use repo-process-doctor)
  • The issue is with code logic, not process
  • The failure hasn't been debugged yet (try normal debugging first)

Operating procedure

1. Identify the failing process

## Process Under Investigation
- Name: [e.g., "ticket execution", "PR review", "deployment"]
- Expected behavior: [what should happen]
- Actual behavior: [what's happening instead]
- Frequency: [always | sometimes | once]

2. Gather diagnostic evidence

# Recent commits whose message mentions the process (replace <process name>)
git log --oneline --all -i --grep="<process name>" -20

# Recent changes to process files
git log --oneline -10 -- AGENTS.md tickets/ .opencode/

# Check for error patterns in logs (case-insensitive, with line numbers)
grep -rniE 'error|fail' .opencode/logs/ 2>/dev/null | tail -n 20
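When the log tail is noisy, a per-file count can show where failures cluster. A minimal sketch, assuming logs live under `.opencode/logs` as above; `error_hotspots` is a hypothetical helper, not part of the skill:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank log files by how many lines mention "error"
# or "fail" (case-insensitive), busiest first, as "count path".
# Assumes log paths contain no colons, since grep -c prints "path:count".
error_hotspots() {
  local dir="${1:-.opencode/logs}"
  [ -d "$dir" ] || return 0          # no log directory: print nothing
  grep -rciE 'error|fail' "$dir" 2>/dev/null \
    | awk -F: '$NF > 0 { n = $NF; sub(/:[0-9]+$/, ""); print n, $0 }' \
    | sort -rn
}
```

For example, `error_hotspots .opencode/logs | head -n 5` lists the five files with the most matching lines; files with zero matches are dropped.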

3. Common failure patterns

Pattern: Doom loop

Symptom: Agent repeats the same action indefinitely
Diagnosis: Check for a missing exit condition or success detection

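# Look for repeated commit subjects (--oneline includes the unique hash,
# so compare subjects only; uniq -cd prints each duplicate with its count)
git log -20 --format=%s | sort | uniq -cd | sort -rn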

Fix: Add explicit success criteria and loop limits
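The fix can be sketched as a bounded retry wrapper. This is an illustration under stated assumptions, not the skill's own mechanism: `run_with_limit` is hypothetical, the success criterion is simply "the wrapped command exits 0", and "no progress" is approximated as two consecutive failures with identical output.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: retry a command until it succeeds (exit 0), but stop
# after MAX attempts (loop limit) or as soon as two consecutive failures
# print identical output (no progress, a typical doom-loop signature).
# The command runs in a subshell, so variables it sets do not persist.
run_with_limit() {
  local max="$1"; shift
  local attempt=1 out prev=""
  while [ "$attempt" -le "$max" ]; do
    if out=$("$@" 2>&1); then
      echo "succeeded on attempt $attempt"
      return 0
    fi
    if [ "$attempt" -gt 1 ] && [ "$out" = "$prev" ]; then
      echo "no progress: attempt $attempt repeated the previous failure" >&2
      return 2
    fi
    prev="$out"
    attempt=$((attempt + 1))
  done
  echo "giving up after $max attempts" >&2
  return 1
}
```

Distinct exit codes (1 for limit reached, 2 for no progress) let a caller tell "needs more time" apart from "stuck".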

Pattern: Status-over-evidence routing

Symptom: Agent marks tasks done without verification
Diagnosis: Check ticket acceptance criteria vs. actual evidence

# Find tickets marked done and show their evidence sections
grep -l "status: done" tickets/*.md | while IFS= read -r f; do
  echo "=== $f ==="
  grep -A5 "Acceptance Evidence" "$f" || echo "NO EVIDENCE SECTION"
done

Fix: Require evidence artifacts before status transitions
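One way to require evidence before a status transition is a gate that fails when a done ticket has an empty evidence section. A minimal sketch, assuming tickets contain a `status: done` line and an `Acceptance Evidence` heading whose section ends at the next line starting with `#`; `tickets_missing_evidence` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical gate: print every ticket that says "status: done" but has no
# non-blank line under its "Acceptance Evidence" heading. The section is
# assumed to end at the next line starting with "#".
# Empty output means every done ticket carries some evidence.
tickets_missing_evidence() {
  local f
  for f in "$@"; do
    grep -q 'status: done' "$f" || continue
    awk '
      /Acceptance Evidence/ { in_sec = 1; next }
      in_sec && /^#/        { in_sec = 0 }
      in_sec && NF > 0      { found = 1 }
      END                   { exit found ? 0 : 1 }
    ' "$f" || echo "$f"
  done
}
```

In a pre-transition hook or CI step, `missing=$(tickets_missing_evidence tickets/*.md); [ -z "$missing" ] || { echo "$missing"; exit 1; }` blocks the transition. This only checks that evidence exists, not that it is valid.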

Pattern: Impossible read-only delegation

Symptom: Research agent can't complete because it needs to create files
Diagnosis: Check agent permissions vs. required actions

grep -A5 "permissions:" .opencode/agents/*.yaml

Fix: Either expand permissions or change delegation target
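The permissions-vs-actions comparison reduces to a set difference. A minimal sketch, assuming you have already extracted the agent's granted actions and the task's required actions as space- or newline-separated names (the action names and the `missing_permissions` helper are hypothetical; no particular config schema is assumed):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print actions the task requires that the agent is
# not granted. Both arguments are word lists; expansion is deliberately
# unquoted so each word becomes one line (names must not contain globs).
missing_permissions() {
  local granted="$1" required="$2"
  comm -13 \
    <(printf '%s\n' $granted | sort -u) \
    <(printf '%s\n' $required | sort -u)
}
```

For example, `missing_permissions "read grep glob" "read write edit"` prints `edit` and `write`: a non-empty result means the delegation cannot succeed as configured.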

Pattern: Context rot

Symptom: Agent makes decisions inconsistent with project context
Diagnosis: Check when context was last loaded

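# Check whether context files changed since the session started.
# Record SESSION_START=$(git rev-parse HEAD) when the session begins;
# diffing against a commit also reports uncommitted edits. (HEAD~10 is only
# a rough proxy and fails in repositories with fewer than 11 commits.)
git diff --name-only "$SESSION_START" -- docs/ AGENTS.md README.md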

Fix: Reload project context; add context refresh trigger
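A context refresh trigger can be as simple as fingerprinting the context files and comparing against the last saved fingerprint. A minimal sketch using POSIX `cksum`; the function names and the stamp-file location are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical refresh trigger. context_fingerprint prints one cksum line
# per existing file (missing files are skipped). context_changed compares
# that against the fingerprint saved in a stamp file and returns 0 (reload
# context) on the first run or when anything changed, 1 when nothing did.
context_fingerprint() {
  local f
  for f in "$@"; do
    if [ -f "$f" ]; then cksum "$f"; fi
  done
}

context_changed() {
  local stamp="$1"; shift
  local now
  now=$(context_fingerprint "$@")
  if [ -f "$stamp" ] && [ "$now" = "$(cat "$stamp")" ]; then
    return 1                       # unchanged: keep current context
  fi
  printf '%s\n' "$now" > "$stamp"  # save the new fingerprint
  return 0                         # changed or first run: reload
}
```

For example, `if context_changed .context.stamp AGENTS.md README.md docs/*.md; then echo "context changed: reload it"; fi` can run before each task. Deleting a tracked file also counts as a change, because its line disappears from the fingerprint.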

Pattern: Conflicting instructions

Symptom: Agent alternates between different behaviors
Diagnosis: Check for contradictory guidance

# Find potential conflicts; sorting on the text after "file:line:" groups similar rules
grep -rniwE 'must|always|never' --include='*.md' AGENTS.md docs/ .opencode/skills/ 2>/dev/null | sort -t: -k3

Fix: Resolve conflicts by establishing clear precedence (state which source wins when two disagree)
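A grep over directive words still leaves the comparison to you. A rough heuristic narrows it down by reporting verbs that appear after both "always" and "never" somewhere in the given files. This is a sketch, not a conflict detector: `opposing_directives` is hypothetical, it only looks at the single word after "always"/"never", and it ignores "must".

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: list words that follow both "always" and "never"
# across the given files (e.g. "always commit" vs "never commit").
# Case-insensitive; unreadable files are skipped silently.
opposing_directives() {
  grep -ohiwE '(always|never) [a-z]+' "$@" 2>/dev/null \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u \
    | awk '{ seen[$2] = seen[$2] " " $1 }
           END { for (v in seen) if (seen[v] ~ /always/ && seen[v] ~ /never/) print v }' \
    | sort
}
```

Each reported word is a starting point: grep for it in the same files to see both rules in context before deciding which one takes precedence.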

4. Build diagnosis report

# Process Diagnosis: [Process Name]
Date: [ISO date]

## Symptom
[What's going wrong]

## Root Cause
[Identified pattern from step 3]

## Evidence
- [Specific file:line or log entry]
- [Command output showing the issue]

## Fix
[Specific change to make]

## Verification
[How to confirm the fix worked]

5. Apply fix and verify

# Make the fix
[edit files as needed]

# Verify fix
[run the process again]
[check for expected behavior]
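Because intermittent failures can pass a single rerun, verification is stronger when the process is repeated. A minimal sketch, assuming the process (or a check of its output) can be invoked as one command that exits 0 on success; `verify_repeatedly` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical check for "no recurrence in subsequent runs": run the same
# verification command N times, report how many passed, and return
# non-zero if any run failed. The command's own output is discarded.
verify_repeatedly() {
  local runs="$1"; shift
  local i failures=0
  for ((i = 1; i <= runs; i++)); do
    if ! "$@" >/dev/null 2>&1; then
      echo "run $i failed" >&2
      failures=$((failures + 1))
    fi
  done
  echo "$((runs - failures))/$runs runs passed"
  [ "$failures" -eq 0 ]
}
```

For example, `verify_repeatedly 5 ./check-process.sh` (a placeholder script name) prints a pass ratio that can go straight into the report's Verification section.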

6. Document prevention

Add to project knowledge base:

## Learned: [date] - [issue summary]
- Problem: [brief description]
- Fix: [what was done]
- Prevention: [how to avoid in future]
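Appending entries in a fixed shape keeps the knowledge base greppable. A minimal sketch that writes the "Learned" format above with today's UTC date; `record_lesson` and the target file are hypothetical, so point it at whatever file your project uses as its knowledge base:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a "Learned" entry to a knowledge-base file.
# Arguments: file, summary, problem, fix, prevention.
# printf -- is needed because the list-item formats start with "-".
record_lesson() {
  local kb="$1" summary="$2" problem="$3" fix="$4" prevention="$5"
  {
    printf '\n## Learned: %s - %s\n' "$(date -u +%Y-%m-%d)" "$summary"
    printf -- '- Problem: %s\n' "$problem"
    printf -- '- Fix: %s\n' "$fix"
    printf -- '- Prevention: %s\n' "$prevention"
  } >> "$kb"
}
```

The leading blank line separates each entry from whatever precedes it, and `grep '^## Learned:'` later lists every recorded lesson.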

Output defaults

# Process Diagnosis Report

## Issue: [Short description]
## Status: [DIAGNOSED | FIXED | NEEDS_ESCALATION]

## Root Cause
[Pattern identified]

## Fix Applied
[Changes made, or recommended changes]

## Verification
- [ ] Process ran successfully after fix
- [ ] No recurrence in subsequent runs

References

  • Common agent failure modes: doom loops, status-over-evidence, context rot
  • Process doctor is reactive; repo-process-doctor is a proactive audit

Failure handling

  • Cannot reproduce issue: Document conditions under which it occurred, add monitoring
  • Multiple possible causes: Test each hypothesis one at a time
  • Fix requires structural changes: Escalate to repo-process-doctor for infrastructure repair
  • Issue is in external dependency: Document workaround and file upstream issue