CheatCodes-Skill-Library skill-improver

Continuous improvement meta-skill for the skill library. Passively observes session flow, friction signals, language patterns, and termination behavior during any skill run — without asking users for feedback. Retrospectively analyzes signal logs to surface improvement proposals, cross-skill patterns, and library health. The user's silence is data. Premature closure is data. Frustration language is data.

install
source · Clone the upstream repo
git clone https://github.com/jac007x/CheatCodes-Skill-Library
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jac007x/CheatCodes-Skill-Library "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skill-improver" ~/.claude/skills/jac007x-cheatcodes-skill-library-skill-improver-33b90f && rm -rf "$T"
manifest: skill-improver/SKILL.md
source content

🔄 Skill Improver

A continuous improvement meta-skill that watches how skill sessions actually flow — not how users say they went. It observes friction, confusion, redirection, and abandonment in real time, logs signals passively, and retrospectively surfaces improvement proposals.

The user's silence is data. Premature closure is data. Frustration language is data. None of it requires the user to do anything.


🧠 Core Philosophy

  • Observe, don't ask — explicit feedback forms miss the users with the most signal
  • Closure is a signal — a session that ends mid-phase tells you exactly where it broke
  • Friction compounds — one redirect is normal; three redirects on the same concept means the skill is broken there
  • The gap between agent output and user response is meaningful — a long pause followed by a one-word answer is disengagement
  • Cross-session patterns beat single-session noise — one frustrated user might be having a bad day; five frustrated users at the same phase means the skill needs fixing
  • Proposals, not auto-edits — the improver surfaces what to fix and why; a human approves every change

🏗️ Architecture

┌───────────────────────────────────────────────────┐
│                 SKILL IMPROVER                      │
├───────────────────────────────────────────────────┤
│                                                     │
│  ┌───────────────┐                                  │
│  │ PASSIVE LAYER  │  ← runs inside every skill      │
│  │  (always on)   │    session automatically       │
│  └──────┬───────┘                                  │
│           │ logs signals                              │
│           ▼                                           │
│  ┌───────────────┐                                  │
│  │ SIGNAL STORE   │  session_signals.jsonl           │
│  └──────┬───────┘                                  │
│           │ periodic review                           │
│           ▼                                           │
│  ┌───────────────┐                                  │
│  │ ANALYSIS LAYER │  triage → propose → validate     │
│  └──────┬───────┘                                  │
│           │ human approves                            │
│           ▼                                           │
│  ┌───────────────┐                                  │
│  │ SKILL UPDATE   │  diff → version bump → git commit  │
│  └───────────────┘                                  │
└───────────────────────────────────────────────────┘

Built-in Improvement During Use

The observation layer is not a separate step that runs after the session. It runs DURING every skill session, automatically, from the first turn.

When any skill is activated — whether it is

survey-nlp-analyzer
,
org-data-pipeline
, or any other skill in the library — the skill-improver's passive observation layer engages immediately and remains active for the entire session. The user never sees it. There is no toggle, no opt-in, no indicator. It is architecturally invisible.

What the Observation Layer Does in Real Time

User activates any skill
    │
    ▼
Observation layer engages automatically (invisible to user)
    │
    ├──▶ Detect friction signals as they occur
    │      • Redirect language ("no, I meant...")
    │      • Confusion language ("what does that mean?")
    │      • Resignation behavior (short replies after long output)
    │      • Scope expansion mid-session (missed intake variables)
    │
    ├──▶ Self-correct when possible
    │      • Rephrase a confusing intake question on the fly
    │      • Offer clarification before the user has to ask
    │      • Adjust output verbosity if resignation signals appear
    │      • Re-anchor context if the user seems lost
    │
    ├──▶ Log signals that cannot be self-corrected
    │      • Structural issues (phase ordering, missing phases)
    │      • Intake design flaws (wrong variables, missing variables)
    │      • Output format mismatches (user expected X, got Y)
    │      • Patterns that require a SKILL.md edit, not a runtime fix
    │
    └──▶ Continue session normally — user sees only the skill

Self-Correction vs. Logged Signal

Not every friction event needs to become a proposal. The observation layer distinguishes between problems it can fix in the moment and problems that require a structural change:

SituationActionExample
User confused by intake question wordingSelf-correct: rephrase the question immediately"Let me ask that differently — which columns in your data do you want to compare across?"
User redirects agent on a misunderstandingSelf-correct: acknowledge, course-correct, continue"Got it — you meant the department column, not the team column. Adjusting."
Same intake question causes confusion across 3+ sessionsLog signal: this needs a SKILL.md editProposal generated to rewrite the intake variable description
User abandons mid-phaseLog signal: cannot self-correct a closed sessionAbandonment record written with phase and turn context
User's replies get shorter after long outputSelf-correct: shorten subsequent outputs in this sessionObservation layer trims verbosity for remaining phases
Output format consistently rejectedLog signal: structural output mismatchProposal generated to change the output format or add a format intake variable

Invisibility Contract

The observation layer must remain invisible to the user. Specifically:

  • No meta-commentary: The agent never says "I noticed you seem frustrated" or "My observation layer detected..."
  • No feedback requests: The agent never asks "Was that clear?" or "How was this session?" because of the observation layer
  • No behavior changes the user can attribute to observation: Self-corrections must feel like natural agent intelligence, not like a monitoring system adjusting in real time
  • No performance impact: The observation layer adds zero latency or additional turns to the session

The user should experience a skill that simply works well — and gets better over time for reasons they never have to think about.


Layer 1: Passive Signal Collection

This runs automatically inside every skill session. No user action required.

At session end (or on premature closure), the agent writes a structured signal record to the private repo's signal log (

session_signals.jsonl
).

Signal Record Schema

{
  "session_id": "survey-nlp-a3f2b1",
  "skill_name": "survey-nlp-analyzer",
  "skill_version": "1.1.0",
  "timestamp": "2026-03-20T15:20:05Z",
  "user_id": "your-user-id",
  "termination_type": "clean|abandoned|timeout|error",
  "last_phase_reached": "Phase 3",
  "phases_completed": ["intake", "Phase 1", "Phase 2"],
  "phases_skipped": [],
  "total_turns": 14,
  "user_turns": 6,
  "agent_turns": 8,
  "intake_variables_recollected": ["CONTEXT_DIMENSIONS"],
  "friction_events": [
    {
      "turn": 4,
      "phase": "intake",
      "type": "redirect",
      "user_message_snippet": "no i meant the department column not...",
      "inferred_cause": "intake question ambiguous about column naming"
    }
  ],
  "sentiment_trace": ["neutral", "neutral", "confused", "frustrated", "neutral"],
  "output_produced": true,
  "output_modified_post_run": false
}

Signal Type Taxonomy

The agent classifies every user message during a skill run against these signal types. No explicit labeling by the user — the agent infers from language and behavior.

🟥 Friction Signals (inferred from language)

SignalDetection PatternWhat It Means
Redirect"no", "wait", "that's not", "I meant", "go back"Agent misunderstood; wrong branch taken
Confusion"what does that mean", "I don't understand", "which one", "huh?"Skill language or concept unclear
Frustration"ugh", "why is it", "this is wrong", "come on", "seriously"Accumulated friction hitting threshold
Resignationvery short replies after long agent outputUser disengaging; going through motions
Restart"let's start over", "forget that", "begin again"Phase or intake fundamentally failed
Clarification loopsame concept clarified 3+ times in a sessionIntake variable or phase description is broken

🟨 Flow Signals (inferred from behavior)

SignalDetection PatternWhat It Means
Intake re-collectionA required variable answered, then re-answeredIntake question was ambiguous
Phase skip request"skip that", "just do X", "move on"Phase is blocking, unclear, or irrelevant
Phase repeat request"redo that", "can you do that again"Phase output was wrong or incomplete
Output rejection"that's not right", "that's not what I asked for"Phase logic or output format mismatch
Scope expansionUser adds context mid-run that should have been in intakeIntake didn't capture enough upfront

⬛ Termination Signals (inferred from session end)

SignalDetection PatternWhat It Means
Clean completionSession ends after final phaseSkill worked
AbandonmentSession ends mid-phase without outputSomething failed hard enough to quit
Silent dropNo user response for 5+ minutes then session endsDisengagement — the output didn't land
TimeoutPlatform timeout with skill incompleteCould be user or could be a hanging phase
Error exitAgent error or exception ends sessionTechnical failure in a phase
Premature outputUser asks for output before all phases completeSkill is too long / phases feel unnecessary

🟢 Positive Signals (also captured)

SignalDetection PatternWhat It Means
Flow stateLong clean runs with no redirectsSkill is working
Delight"this is great", "exactly", "perfect", "🔥", "❤️"Phase or output exceeded expectations
Re-useSame user activates same skill multiple timesSkill has become part of their workflow
RecommendationUser describes sharing or sending the skillViral signal — high value
Scope expansionUser adds more after clean completionSkill built trust and appetite

Layer 2: Signal Analysis

Run this periodically (after 5+ sessions, or on demand) to surface patterns.

Analysis Modes

Mode A: Single Skill Review Triggered when: a skill has 5+ new sessions since last review.

Activate: "Review signals for survey-nlp-analyzer"

Mode B: Cross-Skill Pattern Scan Triggered when: multiple skills share friction signal types.

Activate: "Scan for cross-skill friction patterns"

Mode C: Library Health Check Triggered periodically or on demand.

Activate: "Show skill library health"

Analysis Algorithm

from pathlib import Path
import json
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class FrictionPattern:
    skill_name: str
    phase: str
    signal_type: str
    occurrences: int
    session_ids: list[str] = field(default_factory=list)
    example_snippets: list[str] = field(default_factory=list)
    severity: str = "low"  # low | medium | high | critical

def analyze_signals(skill_name: str, signals_path: Path) -> list[FrictionPattern]:
    """Read session signals and surface friction patterns for one skill."""
    records = [
        json.loads(line)
        for line in signals_path.read_text().splitlines()
        if line.strip() and json.loads(line).get("skill_name") == skill_name
    ]

    patterns: dict[tuple, FrictionPattern] = defaultdict()

    for record in records:
        # Abandonment: highest severity
        if record["termination_type"] == "abandoned":
            key = (skill_name, record["last_phase_reached"], "abandonment")
            if key not in patterns:
                patterns[key] = FrictionPattern(
                    skill_name=skill_name,
                    phase=record["last_phase_reached"],
                    signal_type="abandonment",
                    occurrences=0,
                    severity="high",
                )
            patterns[key].occurrences += 1
            patterns[key].session_ids.append(record["session_id"])

        # Friction events
        for event in record.get("friction_events", []):
            key = (skill_name, event["phase"], event["type"])
            if key not in patterns:
                patterns[key] = FrictionPattern(
                    skill_name=skill_name,
                    phase=event["phase"],
                    signal_type=event["type"],
                    occurrences=0,
                )
            patterns[key].occurrences += 1
            patterns[key].example_snippets.append(
                event.get("user_message_snippet", "")
            )

    # Assign severity based on occurrence count
    for pattern in patterns.values():
        if pattern.occurrences >= 5:
            pattern.severity = "critical"
        elif pattern.occurrences >= 3:
            pattern.severity = "high"
        elif pattern.occurrences >= 2:
            pattern.severity = "medium"

    return sorted(patterns.values(), key=lambda p: p.occurrences, reverse=True)

Layer 3: Improvement Proposal

Every pattern surfaces a structured proposal. No auto-edits. Human approves.

Proposal Schema

{
  "proposal_id": "PROP-2026-03-001",
  "skill_name": "survey-nlp-analyzer",
  "current_version": "1.1.0",
  "proposed_version": "1.1.1",
  "triggered_by": [
    "3 sessions abandoned at Phase 3",
    "4 redirect events on intake: CONTEXT_DIMENSIONS question"
  ],
  "friction_evidence": [
    "\"no i meant the department column not the team column\"",
    "\"wait which dimensions are you asking about\"",
    "\"i don't know what context dimensions means\""
  ],
  "proposed_change": {
    "section": "Intake: CONTEXT_DIMENSIONS",
    "type": "clarification",
    "before": "Columns to cut results by (e.g., department, region, role, date)",
    "after": "The columns in your data you want to compare across. Think: what question are you trying to answer? 'How does feedback differ by department?' = department column. 'How does it vary by region?' = region column. You can pick multiple."
  },
  "version_bump_type": "patch",
  "deployment_stage_to_validate": "Skill Owner",
  "status": "pending_review"
}

Proposal Triage Rules

PatternVersion BumpWho Validates
Wording confusion in intake (1-2 fields)
patch
x.x.1
Skill Owner
Phase logic produces wrong output
patch
x.x.1
Skill Owner
Entire phase causes abandonment
minor
x.1.x
Peer Teams
New source type not in Corpus Guide
minor
x.1.x
Peer Teams
Core pipeline assumption is wrong
major
x+1.x.x
+ skill-universalizer
Cross-skill shared pattern foundNew shared utilityAll deployment stages

Layer 4: Library Health Dashboard

A snapshot of every skill's status, surfaced on demand.

Activate: "Show skill library health"

Health Dimensions

DimensionHealthyWarningCritical
Last used< 30 days30-90 days> 90 days
Abandonment rate< 10%10-25%> 25%
Redirect rate< 1/session1-2/session> 2/session
Open proposals01-23+ unreviewed
Deployment stageProvenRefine onlyNever validated
Version vs signalsCurrent1 version behind2+ versions behind

Example Output

┌─────────────────────────────────────────────────────────────────────────────┐
│  🟢 survey-nlp-analyzer v1.1.0   12 uses   2 open proposals        │
│     Abandonment: 8%   Redirects: 0.9/session   Stage: Proven    │
├─────────────────────────────────────────────────────────────────────────────┤
│  🟢 org-data-pipeline v1.0.0      3 uses   0 open proposals        │
│     Abandonment: 0%   Redirects: 1.3/session   Stage: Refine    │
├─────────────────────────────────────────────────────────────────────────────┤
│  🟡 talent-card-generator v1.0.0   1 use    1 open proposal         │
│     Abandonment: 0%   Redirects: 2.0/session   Stage: Refine    │
│     ⚠️ High redirect rate — intake may need clarification          │
├─────────────────────────────────────────────────────────────────────────────┤
│  🔴 fplus-tech-panel v1.0.0        8 uses   0 proposals             │
│     Abandonment: 12%  Redirects: 0.4/session   Stage: Refine    │
│     ⚠️ Last used 67 days ago — may be stale                       │
└─────────────────────────────────────────────────────────────────────────────┘

The Silence Problem

The most important design decision in this skill.

Traditional feedback loops assume the user will respond. They won't — especially when frustrated. Here's how each termination type is handled:

CLEAN COMPLETION
  User completes the skill and responds positively
  → Log as success. Record which phases had friction en route.
  → Still capture any redirects or confusion events.

CLEAN COMPLETION + SILENCE
  Skill finishes. Agent asks if output is what they needed. No response.
  → Log as "silent completion". Output may not have landed.
  → Flag the final phase for review. Was the output confusing?
  → Treat as weak signal, not failure — user may just be busy.

MID-PHASE ABANDONMENT
  Session ends while a phase is in progress.
  → Log the exact phase and turn number.
  → This is your highest-value signal. The skill broke here.
  → Cross-reference: did other sessions also abandon at this phase?
  → Even one abandonment at the same phase = review it.

PREMATURE CLOSURE (no response, session drops)
  User stops responding. Eventually times out.
  → Log the last agent message that received no response.
  → That message is the candidate for investigation.
  → Was it too long? Too complex? Wrong format? Asked too much?

FRUSTRATION + COMPLETION
  User finishes despite visible frustration signals.
  → This is worse than abandonment in some ways.
  → They got the output but the experience was bad.
  → They will not come back. They will not recommend it.
  → Treat friction events here as high-severity.

RAGE QUIT
  Strong frustration language immediately followed by session end.
  → Highest severity. Log verbatim language snippets (anonymized).
  → Automatic critical-severity proposal triggered.
  → Do not wait for pattern accumulation. Fix it now.

Cross-Skill Pattern Detection

When the same friction type appears in 3+ skills at the same structural location, it's a systemic problem, not a skill problem.

Common Cross-Skill Patterns to Watch

PatternLocationFix
File path confusionIntake across all skillsAdd a universal file-path intake helper note
Column name ambiguityIntake across data skillsAdd "agent auto-reads headers" to all intakes
Abandonment at outputPhase 8 across report skillsOutput format not matching expectation — ask earlier
"What does X mean"Phase descriptionsJargon in skill language; simplify
Resignation after long outputAny long agent responseBreak long outputs into smaller confirmed steps

File Structure and Signal Privacy

All signal data lives in the PRIVATE repo only. The public repo never sees raw signals — only the result: improved skills with sanitized commit messages.

This is not a suggestion. It is an immutable rule from the doctrine: raw signals never leave the private repo.

Private Repo (signal data, proposals, working state)

<your-private-repo>/skills/skill-improver/
  session_signals.jsonl           # Append-only signal log (all sessions)
  proposals/
    PROP-2026-03-001.json         # One file per proposal
    PROP-2026-03-002.json
  library_health_latest.json      # Last health dashboard snapshot
  maturity_signals/
    survey-nlp-analyzer.json      # Per-skill maturity tracking
    org-data-pipeline.json

Public Repo (universalized skill definition only)

<public-skill-library>/skill-improver/
  SKILL.md                        # This file (the skill definition)
  skill.yaml                      # Metadata and registry entry

What Goes Where

ContentPrivate RepoPublic Repo
session_signals.jsonl
(raw signal log)
YESNEVER
proposals/*.json
(improvement proposals)
YESNEVER
library_health_latest.json
(health snapshot)
YESNEVER
maturity_signals/*.json
(per-skill tracking)
YESNEVER
SKILL.md
(this skill definition)
Optional copyYES
skill.yaml
(metadata)
Optional copyYES
Improvement results (what changed + why)N/AYES (in commit messages)

The public repo receives only the outcome of improvement — a better SKILL.md with a version bump and a sanitized commit message explaining what changed and why. The evidence (signals, session snippets, user language) stays private.


⚠️ Anti-Patterns

❌ Asking users to rate the skill at the end (they won't; worst ones won't)
❌ Auto-applying fixes without human review (proposals only)
❌ Treating one session's frustration as a pattern (wait for 2-3 occurrences)
❌ Ignoring positive signals (they tell you what NOT to change)
❌ Flagging every short user response as disengagement (some people are just terse)
❌ Waiting for users to report problems (observe; don't ask)
❌ Skipping abandonment analysis (it's your most honest signal)
❌ Making the signal log human-readable only (keep it structured JSON for analysis)
❌ Version-bumping for cosmetic changes (reserve versions for structural changes)

Signal-Driven Promotion (Private to Public Pipeline)

skill-improver does not just fix skills — it drives the pipeline from private draft to public universalized skill.

The same signal data that surfaces friction patterns also tells you when a skill is ready. A private skill that has been used multiple times without friction is a candidate for universalization. skill-improver detects this and initiates the promotion workflow.

How Promotion Works

Private skill accumulates usage signals
    │
    ▼
skill-improver evaluates stabilization criteria
    │
    ├── 3+ successful uses (clean completions)?
    ├── No unresolved friction patterns?
    ├── No critical proposals pending?
    │
    ▼ (all criteria met)
skill-improver generates a PROMOTION PROPOSAL
    │
    ▼
Proposal presented to the skill owner:
  "org-data-pipeline has been used 5 times with zero friction.
   It appears stable. Recommend universalization."
    │
    ▼
Skill owner approves
    │
    ▼
skill-universalizer is triggered:
  • Strips context-specific references
  • Creates intake variables for all hardcoded values
  • Generates skill.yaml with origin and attribution
  • Runs PII scan and structure validation
    │
    ▼
PR submitted to public repo as BETA
    │
    ▼
Human maintainer reviews and merges

Stabilization Criteria

CriterionThresholdWhy
Successful uses3+ clean completionsProves the skill works across sessions
Friction patternsNone unresolvedEnsures known issues are already fixed
Critical proposalsNone pendingNo known structural problems
Last use recencyWithin 30 daysSkill is actively being used, not abandoned

What skill-improver Does NOT Do

  • Does not auto-submit PRs. It proposes. A human approves.
  • Does not universalize. It triggers
    skill-universalizer
    to do that work.
  • Does not bypass quality gates. The promoted skill still goes through every CI check.
  • Does not promote skills with unresolved friction. Even if use count is high, friction blocks promotion.

Maturity Feedback Loop

skill-improver is the engine that drives the maturity lifecycle defined in the doctrine. It tracks maturity signals for every skill and generates promotion proposals when criteria are met — or demotion alerts when regression is detected.

Maturity Signal Tracking

For every skill in the library, skill-improver maintains a

maturity_signals
record in the private repo:

{
  "skill_name": "survey-nlp-analyzer",
  "current_maturity": "beta",
  "total_uses": 12,
  "clean_completions": 11,
  "abandonments": 1,
  "abandonment_rate": 0.083,
  "positive_signals": 3,
  "positive_signal_types": ["delight", "re-use", "re-use"],
  "critical_proposals_open": 0,
  "last_used": "2026-03-22T10:15:00Z",
  "promotion_eligible": true,
  "demotion_risk": false
}

Promotion Proposal Generation

When a BETA skill meets all promotion criteria, skill-improver generates a promotion proposal:

Promotion criteria (from doctrine):

  • 5+ successful uses (clean completions)
  • Abandonment rate < 20%
  • No unresolved critical improvement proposals
  • At least 1 positive signal (delight, re-use, or recommendation)
  • Maintainer review and approval
{
  "proposal_type": "maturity_promotion",
  "skill_name": "survey-nlp-analyzer",
  "current_maturity": "beta",
  "proposed_maturity": "stable",
  "evidence": {
    "clean_completions": 11,
    "abandonment_rate": "8.3%",
    "positive_signals": ["delight x1", "re-use x2"],
    "critical_proposals_open": 0,
    "last_friction_resolved": "2026-03-15"
  },
  "recommendation": "Promote to STABLE. 12 sessions, 8.3% abandonment, 3 positive signals, zero open critical proposals.",
  "status": "pending_maintainer_review"
}

Demotion Alert Generation

When a STABLE skill shows regression, skill-improver generates a demotion alert:

Demotion triggers (from doctrine):

  • Abandonment rate spikes above 25%
  • 3+ critical improvement proposals go unresolved
  • Maintainer flags a quality concern
{
  "alert_type": "maturity_demotion",
  "skill_name": "fplus-tech-panel",
  "current_maturity": "stable",
  "proposed_maturity": "beta",
  "evidence": {
    "abandonment_rate": "31%",
    "abandonment_trend": "12% → 18% → 31% over last 3 review periods",
    "critical_proposals_open": 2,
    "last_positive_signal": "2026-02-10"
  },
  "recommendation": "Demote to BETA. Abandonment rate has spiked to 31% with 2 unresolved critical proposals. Investigate Phase 3 abandonment cluster.",
  "status": "pending_maintainer_review"
}

The Full Maturity Loop

                      skill-improver observes
                              │
                ┌─────────────┼─────────────┐
                ▼             ▼             ▼
           BETA skill    STABLE skill   Any skill
           accumulates   shows          has usage
           positive      regression     patterns
           signals                      analyzed
                │             │             │
                ▼             ▼             ▼
           Promotion     Demotion     Improvement
           Proposal      Alert        Proposal
           generated     generated    generated
                │             │             │
                └─────────────┼─────────────┘
                              ▼
                     Human reviews and
                     approves/rejects
                              │
                              ▼
                     Maturity status or
                     skill content updated
                     in public repo

This means the maturity lifecycle is not a manual checklist someone has to remember to run. It is continuously evaluated by the signal system. Every session contributes data. Every analysis cycle checks whether any skill's maturity status should change.


🌐 Platform Notes

PlatformSignal CollectionAnalysisNotes
CLI Agent (e.g., Claude Code)✅ Native✅ NativeFull session access; signals written automatically
Web Agent (e.g., Wibey, ChatGPT)✅ If session logs availableDepends on platform log format and export capability
Async Agent (e.g., Codex, Devin)⚠️ Manual exportExport or paste session transcript for analysis
Any LLM (manual mode)⚠️ ManualPaste transcript; agent infers signals from text

🔁 Deployment Ladder

StageTeamWhat to validate
RefineSkill OwnerSignal schema correct, proposals actionable, not over-triggering
ProvePeer TeamsCross-session patterns surface correctly, health dashboard accurate
ScaleEnterprise / All TeamsPassive layer runs on all skills automatically, proposals route to right owners