Implicit feedback scoring, confidence decay, and anti-pattern detection. Use when understanding how the swarm plugin learns from outcomes, implementing learning loops, or debugging why patterns are being promoted or deprecated. Unique to opencode-swarm-plugin.
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/learning-systems" ~/.claude/skills/majiayu000-claude-skill-registry-learning-systems && rm -rf "$T"
Learning Systems
The swarm plugin learns from task outcomes to improve decomposition quality over time. Three interconnected systems track pattern effectiveness: implicit feedback scoring, confidence decay, and pattern maturity progression.
Implicit Feedback Scoring
Convert task outcomes into learning signals without explicit user feedback.
What Gets Scored
Duration signals:
- Fast (<5 min) = helpful (1.0)
- Medium (5-30 min) = neutral (0.6)
- Slow (>30 min) = harmful (0.2)
Error signals:
- 0 errors = helpful (1.0)
- 1-2 errors = neutral (0.6)
- 3+ errors = harmful (0.2)
Retry signals:
- 0 retries = helpful (1.0)
- 1 retry = neutral (0.7)
- 2+ retries = harmful (0.3)
Success signal:
- Success = 1.0 (40% weight)
- Failure = 0.0
Weighted Score Calculation
```typescript
rawScore = success * 0.4 + duration * 0.2 + errors * 0.2 + retries * 0.2;
```
Thresholds:
- rawScore >= 0.7 → helpful
- rawScore <= 0.4 → harmful
- 0.4 < rawScore < 0.7 → neutral
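Put together, the signal tables and thresholds above can be sketched as a single scoring function. This is a sketch only: boundary handling at exactly 5 and 30 minutes is an assumption, and the `scoreImplicitFeedback` name mirrors the workflow example later in this skill.

```typescript
// Weighted implicit-feedback scorer. Signal values and weights come from
// the tables above; handling of the exact 5/30-minute boundaries is assumed.
interface OutcomeSignals {
  duration_ms: number;
  error_count: number;
  retry_count: number;
  success: boolean;
}

type FeedbackType = "helpful" | "neutral" | "harmful";

function scoreImplicitFeedback(s: OutcomeSignals): { rawScore: number; type: FeedbackType } {
  const duration = s.duration_ms < 300_000 ? 1.0 : s.duration_ms <= 1_800_000 ? 0.6 : 0.2;
  const errors = s.error_count === 0 ? 1.0 : s.error_count <= 2 ? 0.6 : 0.2;
  const retries = s.retry_count === 0 ? 1.0 : s.retry_count === 1 ? 0.7 : 0.3;
  const success = s.success ? 1.0 : 0.0;

  const rawScore = success * 0.4 + duration * 0.2 + errors * 0.2 + retries * 0.2;
  const type: FeedbackType =
    rawScore >= 0.7 ? "helpful" : rawScore <= 0.4 ? "harmful" : "neutral";
  return { rawScore, type };
}
```

A fast, clean, successful run scores 1.0 (helpful); a slow failure with repeated errors and retries lands well below 0.4 (harmful).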
Recording Outcomes
Call `swarm_record_outcome` after subtask completion:

```typescript
swarm_record_outcome({
  bead_id: "bd-123.1",
  duration_ms: 180000, // 3 minutes
  error_count: 0,
  retry_count: 0,
  success: true,
  files_touched: ["src/auth.ts"],
  strategy: "file-based",
});
```
Fields tracked:
- `bead_id`: subtask identifier
- `duration_ms`: time from start to completion
- `error_count`: errors encountered (from ErrorAccumulator)
- `retry_count`: number of retry attempts
- `success`: whether the subtask completed successfully
- `files_touched`: modified file paths
- `strategy`: decomposition strategy used (optional)
- `failure_mode`: classification if success=false (optional)
- `failure_details`: error context (optional)
Confidence Decay
Evaluation criterion weights fade unless revalidated, which prevents stale patterns from dominating future decompositions.
Half-Life Formula
```typescript
decayed_value = raw_value * 0.5 ** (age_days / 90);
```
Decay timeline:
- Day 0: 100% weight
- Day 90: 50% weight
- Day 180: 25% weight
- Day 270: 12.5% weight
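The half-life formula above translates directly into a small helper (a sketch assuming the 90-day half-life and ISO 8601 timestamps):

```typescript
// Half-life decay factor for a feedback event, per the formula above.
// The 90-day half-life matches the default configuration in this skill.
const HALF_LIFE_DAYS = 90;
const MS_PER_DAY = 86_400_000;

function decayFactor(eventTimestamp: string, now: Date = new Date()): number {
  const ageDays = (now.getTime() - new Date(eventTimestamp).getTime()) / MS_PER_DAY;
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}
```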
Criterion Weight Calculation
Aggregate decayed feedback events:
```typescript
helpfulSum = sum(helpful_events.map((e) => e.raw_value * decay(e.timestamp)));
harmfulSum = sum(harmful_events.map((e) => e.raw_value * decay(e.timestamp)));
weight = max(0.1, helpfulSum / (helpfulSum + harmfulSum));
```
Weight floor: a minimum of 0.1 prevents a criterion from being zeroed out completely.
Revalidation
Recording new feedback resets decay timer for that criterion:
```typescript
{
  criterion: "type_safe",
  weight: 0.85,
  helpful_count: 12,
  harmful_count: 3,
  last_validated: "2024-12-12T00:00:00Z", // Reset on new feedback
  half_life_days: 90,
}
```
When Criteria Get Deprecated
```typescript
total = helpful_count + harmful_count;
harmfulRatio = harmful_count / total;
if (total >= 3 && harmfulRatio > 0.3) {
  // Deprecate criterion - reduce impact to 0
}
```
Pattern Maturity States
Patterns progress through lifecycle based on feedback accumulation:
candidate → established → proven (or deprecated)
State Transitions
candidate (initial state):
- Total feedback < 3 events
- Not enough data to judge
- Multiplier: 0.5x
established:
- Total feedback >= 3 events
- Has track record but not proven
- Multiplier: 1.0x
proven:
- Decayed helpful >= 5 AND
- Harmful ratio < 15%
- Multiplier: 1.5x
deprecated:
- Harmful ratio > 30% AND
- Total feedback >= 3 events
- Multiplier: 0x (excluded)
Decay Applied to State Calculation
State determination uses decayed counts, not raw counts:
```typescript
const { decayedHelpful, decayedHarmful } = calculateDecayedCounts(feedbackEvents);
const total = decayedHelpful + decayedHarmful;
const harmfulRatio = decayedHarmful / total;
// State logic applies to decayed values
```
Old feedback matters less. Pattern must maintain recent positive signal to stay proven.
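The transition rules can be condensed into one function. This sketch applies the thresholds listed above to already-decayed counts; the real implementation may structure the logic differently.

```typescript
type MaturityState = "candidate" | "established" | "proven" | "deprecated";

// State determination from decayed feedback counts, per the thresholds above:
// <3 total = candidate; >30% harmful = deprecated;
// >=5 helpful and <15% harmful = proven; otherwise established.
function determineState(decayedHelpful: number, decayedHarmful: number): MaturityState {
  const total = decayedHelpful + decayedHarmful;
  if (total < 3) return "candidate";
  const harmfulRatio = decayedHarmful / total;
  if (harmfulRatio > 0.3) return "deprecated";
  if (decayedHelpful >= 5 && harmfulRatio < 0.15) return "proven";
  return "established";
}
```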
Manual State Changes
Promote to proven:
```typescript
promotePattern(maturity); // External validation confirms effectiveness
```
Deprecate:
```typescript
deprecatePattern(maturity, "Causes file conflicts in 80% of cases");
```
Deprecated patterns cannot be promoted; they must be reset first.
Multipliers in Decomposition
Apply maturity multiplier to pattern scores:
```typescript
const multipliers = {
  candidate: 0.5,
  established: 1.0,
  proven: 1.5,
  deprecated: 0,
};
pattern_score = base_score * multipliers[maturity.state];
```
Proven patterns get a 50% boost; deprecated patterns are excluded entirely.
Anti-Pattern Inversion
Patterns that keep failing are automatically inverted into anti-patterns once their failure rate reaches 60%.
Inversion Threshold
```typescript
const total = pattern.success_count + pattern.failure_count;
if (total >= 3 && pattern.failure_count / total >= 0.6) {
  invertToAntiPattern(pattern, reason);
}
```
- Minimum observations: 3 total (prevents hasty inversion)
- Failure ratio: 60% (e.g. 3 failures in 5 attempts)
Inversion Process
Original pattern:
```typescript
{
  id: "pattern-123",
  content: "Split by file type",
  kind: "pattern",
  is_negative: false,
  success_count: 2,
  failure_count: 5,
}
```
Inverted anti-pattern:
```typescript
{
  id: "anti-pattern-123",
  content: "AVOID: Split by file type. Failed 5/7 times (71% failure rate)",
  kind: "anti_pattern",
  is_negative: true,
  success_count: 2,
  failure_count: 5,
  reason: "Failed 5/7 times (71% failure rate)",
}
```
Recording Observations
Track pattern outcomes to accumulate success/failure counts:
```typescript
recordPatternObservation(
  pattern,
  success: true, // or false
  beadId: "bd-123.1",
)
// Returns: {
//   pattern: updatedPattern,
//   inversion?: {
//     original: pattern,
//     inverted: antiPattern,
//     reason: "Failed 5/7 times (71% failure rate)",
//   }
// }
```
Pattern Extraction
Auto-detect strategies from decomposition descriptions:
```typescript
extractPatternsFromDescription(
  "We'll split by file type, one file per subtask",
);
// Returns: ["Split by file type", "One file per subtask"]
```
Detected strategies:
- Split by file type
- Split by component
- Split by layer (UI/logic/data)
- Split by feature
- One file per subtask
- Handle shared types first
- Separate API routes
- Tests alongside implementation
- Tests in separate subtask
- Maximize parallelization
- Sequential execution order
- Respect dependency chain
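Detection can be as simple as keyword matching over the description. The sketch below covers only a few of the strategies above, and the real implementation's matching rules may differ.

```typescript
// Abbreviated keyword table; a full implementation would cover all
// twelve strategies listed above.
const STRATEGY_KEYWORDS: Array<[RegExp, string]> = [
  [/split by file type/i, "Split by file type"],
  [/split by component/i, "Split by component"],
  [/one file per subtask/i, "One file per subtask"],
  [/shared types first/i, "Handle shared types first"],
];

function extractPatternsFromDescription(description: string): string[] {
  return STRATEGY_KEYWORDS
    .filter(([re]) => re.test(description))
    .map(([, name]) => name);
}
```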
Using Anti-Patterns in Prompts
Format for decomposition prompt inclusion:
```typescript
formatAntiPatternsForPrompt(patterns);
```
Output:
```
## Anti-Patterns to Avoid

Based on past failures, avoid these decomposition strategies:

- AVOID: Split by file type. Failed 12/15 times (80% failure rate)
- AVOID: One file per subtask. Failed 8/10 times (80% failure rate)
```
Error Accumulator
Track errors during subtask execution for retry prompts and outcome scoring.
Error Types
```typescript
type ErrorType =
  | "validation"   // Schema/type errors
  | "timeout"      // Task exceeded time limit
  | "conflict"     // File reservation conflicts
  | "tool_failure" // Tool invocation failed
  | "unknown";     // Unclassified
```
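The workflow example later in this skill calls a `classifyError` helper. A plausible sketch maps thrown errors onto this union by message inspection; the keyword choices here are assumptions, not the actual implementation.

```typescript
type ErrorType = "validation" | "timeout" | "conflict" | "tool_failure" | "unknown";

// Hypothetical message-based classifier; keyword choices are illustrative.
function classifyError(error: Error): ErrorType {
  const msg = error.message.toLowerCase();
  if (msg.includes("timeout") || msg.includes("timed out")) return "timeout";
  if (msg.includes("validation") || msg.includes("schema") || msg.includes("type")) return "validation";
  if (msg.includes("conflict") || msg.includes("reservation")) return "conflict";
  if (msg.includes("tool")) return "tool_failure";
  return "unknown";
}
```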
Recording Errors
```typescript
errorAccumulator.recordError(
  beadId: "bd-123.1",
  errorType: "validation",
  message: "Type error in src/auth.ts",
  options: {
    stack_trace: "...",
    tool_name: "typecheck",
    context: "After adding OAuth types",
  }
)
```
Generating Error Context
Format accumulated errors for retry prompts:
```typescript
const context = await errorAccumulator.getErrorContext(
  beadId: "bd-123.1",
  includeResolved: false,
)
```
Output:
```
## Previous Errors

The following errors were encountered during execution:

### validation (2 errors)

- **Type error in src/auth.ts**
  - Context: After adding OAuth types
  - Tool: typecheck
  - Time: 12/12/2024, 10:30 AM
- **Missing import in src/session.ts**
  - Tool: typecheck
  - Time: 12/12/2024, 10:35 AM

**Action Required**: Address these errors before proceeding. Consider:
- What caused each error?
- How can you prevent similar errors?
- Are there patterns across error types?
```
Resolving Errors
Mark errors resolved after fixing:
```typescript
await errorAccumulator.resolveError(errorId);
```
Resolved errors are excluded from the retry context by default.
Error Statistics
Get error counts for outcome tracking:
```typescript
const stats = await errorAccumulator.getErrorStats("bd-123.1")
// Returns: {
//   total: 5,
//   unresolved: 2,
//   by_type: {
//     validation: 3,
//     timeout: 1,
//     tool_failure: 1,
//   }
// }
```
Use `total` for `error_count` in outcome signals.
Using the Learning System
Integration Points
1. During decomposition (swarm_plan_prompt):
- Query CASS for similar tasks
- Load pattern maturity records
- Include proven patterns in prompt
- Exclude deprecated patterns
2. During execution:
- ErrorAccumulator tracks errors
- Record retry attempts
- Track duration from start to completion
3. After completion (swarm_complete):
- Record outcome signals
- Score implicit feedback
- Update pattern observations
- Check for anti-pattern inversions
- Update maturity states
Full Workflow Example
```typescript
// 1. Decomposition phase
const cass_results = cass_search({ query: "user authentication", limit: 5 });
const patterns = loadPatterns(); // Get maturity records
const prompt = swarm_plan_prompt({
  task: "Add OAuth",
  context: formatPatternsWithMaturityForPrompt(patterns),
  query_cass: true,
});

// 2. Execution phase
const errorAccumulator = new ErrorAccumulator();
const startTime = Date.now();
try {
  // Work happens...
  await implement_subtask();
} catch (error) {
  await errorAccumulator.recordError(
    bead_id,
    classifyError(error),
    error.message,
  );
  retryCount++;
}

// 3. Completion phase
const duration = Date.now() - startTime;
const errorStats = await errorAccumulator.getErrorStats(bead_id);
swarm_record_outcome({
  bead_id,
  duration_ms: duration,
  error_count: errorStats.total,
  retry_count: retryCount,
  success: true,
  files_touched: modifiedFiles,
  strategy: "file-based",
});

// 4. Learning updates
const scored = scoreImplicitFeedback({
  bead_id,
  duration_ms: duration,
  error_count: errorStats.total,
  retry_count: retryCount,
  success: true,
  timestamp: new Date().toISOString(),
  strategy: "file-based",
});

// Update patterns
for (const pattern of extractedPatterns) {
  const { pattern: updated, inversion } = recordPatternObservation(
    pattern,
    scored.type === "helpful",
    bead_id,
  );
  if (inversion) {
    console.log(`Pattern inverted: ${inversion.reason}`);
    storeAntiPattern(inversion.inverted);
  }
}
```
Configuration Tuning
Adjust thresholds based on project characteristics:
```typescript
const learningConfig = {
  halfLifeDays: 90, // Decay speed
  minFeedbackForAdjustment: 3, // Min observations for weight adjustment
  maxHarmfulRatio: 0.3, // Max harmful % before deprecating criterion
  fastCompletionThresholdMs: 300000, // 5 min = fast
  slowCompletionThresholdMs: 1800000, // 30 min = slow
  maxErrorsForHelpful: 2, // Max errors before marking harmful
};

const antiPatternConfig = {
  minObservations: 3, // Min before inversion
  failureRatioThreshold: 0.6, // 60% failure triggers inversion
  antiPatternPrefix: "AVOID: ",
};

const maturityConfig = {
  minFeedback: 3, // Min for leaving candidate state
  minHelpful: 5, // Decayed helpful threshold for proven
  maxHarmful: 0.15, // Max 15% harmful for proven
  deprecationThreshold: 0.3, // 30% harmful triggers deprecation
  halfLifeDays: 90,
};
```
Debugging Pattern Issues
Why is pattern not proven?
Check decayed counts:
```typescript
const feedback = await getFeedback(patternId);
const { decayedHelpful, decayedHarmful } = calculateDecayedCounts(feedback);
console.log({ decayedHelpful, decayedHarmful });
// Need: decayedHelpful >= 5 AND harmfulRatio < 0.15
```
Why was pattern inverted?
Check observation counts:
```typescript
const total = pattern.success_count + pattern.failure_count;
const failureRatio = pattern.failure_count / total;
console.log({ total, failureRatio });
// Inverts if: total >= 3 AND failureRatio >= 0.6
```
Why is criterion weight low?
Check feedback events:
```typescript
const events = await getFeedbackByCriterion("type_safe");
const weight = calculateCriterionWeight(events);
console.log(weight);
// Shows: helpful vs harmful counts, last_validated date
```
Storage Interfaces
FeedbackStorage
Persist feedback events for criterion weight calculation:
```typescript
interface FeedbackStorage {
  store(event: FeedbackEvent): Promise<void>;
  getByCriterion(criterion: string): Promise<FeedbackEvent[]>;
  getByBead(beadId: string): Promise<FeedbackEvent[]>;
  getAll(): Promise<FeedbackEvent[]>;
}
```
ErrorStorage
Persist errors for retry prompts:
```typescript
interface ErrorStorage {
  store(entry: ErrorEntry): Promise<void>;
  getByBead(beadId: string): Promise<ErrorEntry[]>;
  getUnresolvedByBead(beadId: string): Promise<ErrorEntry[]>;
  markResolved(id: string): Promise<void>;
  getAll(): Promise<ErrorEntry[]>;
}
```
PatternStorage
Persist decomposition patterns:
```typescript
interface PatternStorage {
  store(pattern: DecompositionPattern): Promise<void>;
  get(id: string): Promise<DecompositionPattern | null>;
  getAll(): Promise<DecompositionPattern[]>;
  getAntiPatterns(): Promise<DecompositionPattern[]>;
  getByTag(tag: string): Promise<DecompositionPattern[]>;
  findByContent(content: string): Promise<DecompositionPattern[]>;
}
```
MaturityStorage
Persist pattern maturity records:
```typescript
interface MaturityStorage {
  store(maturity: PatternMaturity): Promise<void>;
  get(patternId: string): Promise<PatternMaturity | null>;
  getAll(): Promise<PatternMaturity[]>;
  getByState(state: MaturityState): Promise<PatternMaturity[]>;
  storeFeedback(feedback: MaturityFeedback): Promise<void>;
  getFeedback(patternId: string): Promise<MaturityFeedback[]>;
}
```
In-memory implementations provided for testing. Production should use persistent storage (file-based JSONL or SQLite).
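For reference, an in-memory `FeedbackStorage` might look like the sketch below. It is test-oriented only, and the `FeedbackEvent` shape is assumed from the examples earlier in this skill.

```typescript
// Assumed event shape, inferred from the criterion/decay examples above.
interface FeedbackEvent {
  criterion: string;
  bead_id: string;
  raw_value: number;
  type: "helpful" | "neutral" | "harmful";
  timestamp: string; // ISO 8601, used for decay
}

interface FeedbackStorage {
  store(event: FeedbackEvent): Promise<void>;
  getByCriterion(criterion: string): Promise<FeedbackEvent[]>;
  getByBead(beadId: string): Promise<FeedbackEvent[]>;
  getAll(): Promise<FeedbackEvent[]>;
}

// Simple array-backed implementation, suitable for unit tests only.
class InMemoryFeedbackStorage implements FeedbackStorage {
  private events: FeedbackEvent[] = [];

  async store(event: FeedbackEvent): Promise<void> {
    this.events.push(event);
  }
  async getByCriterion(criterion: string): Promise<FeedbackEvent[]> {
    return this.events.filter((e) => e.criterion === criterion);
  }
  async getByBead(beadId: string): Promise<FeedbackEvent[]> {
    return this.events.filter((e) => e.bead_id === beadId);
  }
  async getAll(): Promise<FeedbackEvent[]> {
    return [...this.events];
  }
}
```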