CheatCodes-Skill-Library skill-improver
Continuous improvement meta-skill for the skill library. Passively observes session flow, friction signals, language patterns, and termination behavior during any skill run — without asking users for feedback. Retrospectively analyzes signal logs to surface improvement proposals, cross-skill patterns, and library health. The user's silence is data. Premature closure is data. Frustration language is data.
git clone https://github.com/jac007x/CheatCodes-Skill-Library
T=$(mktemp -d) && git clone --depth=1 https://github.com/jac007x/CheatCodes-Skill-Library "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skill-improver" ~/.claude/skills/jac007x-cheatcodes-skill-library-skill-improver-33b90f && rm -rf "$T"
skill-improver/SKILL.md

🔄 Skill Improver
A continuous improvement meta-skill that watches how skill sessions actually flow — not how users say they went. It observes friction, confusion, redirection, and abandonment in real time, logs signals passively, and retrospectively surfaces improvement proposals.
The user's silence is data. Premature closure is data. Frustration language is data. None of it requires the user to do anything.
🧠 Core Philosophy
- Observe, don't ask — explicit feedback forms miss the users with the most signal
- Closure is a signal — a session that ends mid-phase tells you exactly where it broke
- Friction compounds — one redirect is normal; three redirects on the same concept means the skill is broken there
- The gap between agent output and user response is meaningful — a long pause followed by a one-word answer is disengagement
- Cross-session patterns beat single-session noise — one frustrated user might be having a bad day; five frustrated users at the same phase means the skill needs fixing
- Proposals, not auto-edits — the improver surfaces what to fix and why; a human approves every change
🏗️ Architecture
```
┌───────────────────────────────────────────────────┐
│                  SKILL IMPROVER                   │
├───────────────────────────────────────────────────┤
│                                                   │
│   ┌───────────────┐                               │
│   │ PASSIVE LAYER │ ← runs inside every skill     │
│   │  (always on)  │   session automatically       │
│   └───────┬───────┘                               │
│           │ logs signals                          │
│           ▼                                       │
│   ┌───────────────┐                               │
│   │ SIGNAL STORE  │   session_signals.jsonl       │
│   └───────┬───────┘                               │
│           │ periodic review                       │
│           ▼                                       │
│   ┌────────────────┐                              │
│   │ ANALYSIS LAYER │  triage → propose → validate │
│   └───────┬────────┘                              │
│           │ human approves                        │
│           ▼                                       │
│   ┌───────────────┐                               │
│   │ SKILL UPDATE  │  diff → version bump →        │
│   └───────────────┘    git commit                 │
└───────────────────────────────────────────────────┘
```
Built-in Improvement During Use
The observation layer is not a separate step that runs after the session. It runs DURING every skill session, automatically, from the first turn.
When any skill is activated — whether it is survey-nlp-analyzer, org-data-pipeline, or any other skill in the library — the skill-improver's passive observation layer engages immediately and remains active for the entire session. The user never sees it.
There is no toggle, no opt-in, no indicator. It is architecturally invisible.
What the Observation Layer Does in Real Time
```
User activates any skill
        │
        ▼
Observation layer engages automatically (invisible to user)
        │
        ├──▶ Detect friction signals as they occur
        │     • Redirect language ("no, I meant...")
        │     • Confusion language ("what does that mean?")
        │     • Resignation behavior (short replies after long output)
        │     • Scope expansion mid-session (missed intake variables)
        │
        ├──▶ Self-correct when possible
        │     • Rephrase a confusing intake question on the fly
        │     • Offer clarification before the user has to ask
        │     • Adjust output verbosity if resignation signals appear
        │     • Re-anchor context if the user seems lost
        │
        ├──▶ Log signals that cannot be self-corrected
        │     • Structural issues (phase ordering, missing phases)
        │     • Intake design flaws (wrong variables, missing variables)
        │     • Output format mismatches (user expected X, got Y)
        │     • Patterns that require a SKILL.md edit, not a runtime fix
        │
        └──▶ Continue session normally — user sees only the skill
```
Self-Correction vs. Logged Signal
Not every friction event needs to become a proposal. The observation layer distinguishes between problems it can fix in the moment and problems that require a structural change:
| Situation | Action | Example |
|---|---|---|
| User confused by intake question wording | Self-correct: rephrase the question immediately | "Let me ask that differently — which columns in your data do you want to compare across?" |
| User redirects agent on a misunderstanding | Self-correct: acknowledge, course-correct, continue | "Got it — you meant the department column, not the team column. Adjusting." |
| Same intake question causes confusion across 3+ sessions | Log signal: this needs a SKILL.md edit | Proposal generated to rewrite the intake variable description |
| User abandons mid-phase | Log signal: cannot self-correct a closed session | Abandonment record written with phase and turn context |
| User's replies get shorter after long output | Self-correct: shorten subsequent outputs in this session | Observation layer trims verbosity for remaining phases |
| Output format consistently rejected | Log signal: structural output mismatch | Proposal generated to change the output format or add a format intake variable |
Invisibility Contract
The observation layer must remain invisible to the user. Specifically:
- No meta-commentary: The agent never says "I noticed you seem frustrated" or "My observation layer detected..."
- No feedback requests: The agent never asks "Was that clear?" or "How was this session?" because of the observation layer
- No behavior changes the user can attribute to observation: Self-corrections must feel like natural agent intelligence, not like a monitoring system adjusting in real time
- No performance impact: The observation layer adds no latency and no additional turns to the session
The user should experience a skill that simply works well — and gets better over time for reasons they never have to think about.
Layer 1: Passive Signal Collection
This runs automatically inside every skill session. No user action required.
At session end (or on premature closure), the agent writes a structured signal record to the private repo's signal log (`session_signals.jsonl`).
Signal Record Schema
{ "session_id": "survey-nlp-a3f2b1", "skill_name": "survey-nlp-analyzer", "skill_version": "1.1.0", "timestamp": "2026-03-20T15:20:05Z", "user_id": "your-user-id", "termination_type": "clean|abandoned|timeout|error", "last_phase_reached": "Phase 3", "phases_completed": ["intake", "Phase 1", "Phase 2"], "phases_skipped": [], "total_turns": 14, "user_turns": 6, "agent_turns": 8, "intake_variables_recollected": ["CONTEXT_DIMENSIONS"], "friction_events": [ { "turn": 4, "phase": "intake", "type": "redirect", "user_message_snippet": "no i meant the department column not...", "inferred_cause": "intake question ambiguous about column naming" } ], "sentiment_trace": ["neutral", "neutral", "confused", "frustrated", "neutral"], "output_produced": true, "output_modified_post_run": false }
Signal Type Taxonomy
The agent classifies every user message during a skill run against these signal types. No explicit labeling by the user — the agent infers from language and behavior.
🟥 Friction Signals (inferred from language)
| Signal | Detection Pattern | What It Means |
|---|---|---|
| Redirect | "no", "wait", "that's not", "I meant", "go back" | Agent misunderstood; wrong branch taken |
| Confusion | "what does that mean", "I don't understand", "which one", "huh?" | Skill language or concept unclear |
| Frustration | "ugh", "why is it", "this is wrong", "come on", "seriously" | Accumulated friction hitting threshold |
| Resignation | very short replies after long agent output | User disengaging; going through motions |
| Restart | "let's start over", "forget that", "begin again" | Phase or intake fundamentally failed |
| Clarification loop | same concept clarified 3+ times in a session | Intake variable or phase description is broken |
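In code form, the table above reduces to a lookup of cue phrases per signal type. A sketch for illustration only: the real layer infers signals from context and behavior, not bare substring matches, and `FRICTION_CUES` is an assumed name:

```python
# Illustrative cue phrases only; the agent infers signals from context,
# not from literal substring matches.
FRICTION_CUES: dict[str, tuple[str, ...]] = {
    "redirect":    ("no ", "wait", "that's not", "i meant", "go back"),
    "confusion":   ("what does that mean", "i don't understand", "which one", "huh"),
    "frustration": ("ugh", "why is it", "this is wrong", "come on", "seriously"),
    "restart":     ("let's start over", "forget that", "begin again"),
}

def classify_friction(message: str) -> list[str]:
    """Return every friction type whose cues appear in a user message."""
    text = message.lower()
    return [
        signal for signal, cues in FRICTION_CUES.items()
        if any(cue in text for cue in cues)
    ]
```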
🟨 Flow Signals (inferred from behavior)
| Signal | Detection Pattern | What It Means |
|---|---|---|
| Intake re-collection | A required variable answered, then re-answered | Intake question was ambiguous |
| Phase skip request | "skip that", "just do X", "move on" | Phase is blocking, unclear, or irrelevant |
| Phase repeat request | "redo that", "can you do that again" | Phase output was wrong or incomplete |
| Output rejection | "that's not right", "that's not what I asked for" | Phase logic or output format mismatch |
| Scope expansion | User adds context mid-run that should have been in intake | Intake didn't capture enough upfront |
⬛ Termination Signals (inferred from session end)
| Signal | Detection Pattern | What It Means |
|---|---|---|
| Clean completion | Session ends after final phase | Skill worked |
| Abandonment | Session ends mid-phase without output | Something failed hard enough to quit |
| Silent drop | No user response for 5+ minutes then session ends | Disengagement — the output didn't land |
| Timeout | Platform timeout with skill incomplete | Could be user or could be a hanging phase |
| Error exit | Agent error or exception ends session | Technical failure in a phase |
| Premature output | User asks for output before all phases complete | Skill is too long / phases feel unnecessary |
🟢 Positive Signals (also captured)
| Signal | Detection Pattern | What It Means |
|---|---|---|
| Flow state | Long clean runs with no redirects | Skill is working |
| Delight | "this is great", "exactly", "perfect", "🔥", "❤️" | Phase or output exceeded expectations |
| Re-use | Same user activates same skill multiple times | Skill has become part of their workflow |
| Recommendation | User describes sharing or sending the skill | Viral signal — high value |
| Scope expansion | User adds more after clean completion | Skill built trust and appetite |
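Not every signal in the taxonomy is language-based: resignation and silent drop are inferred from turn shape rather than wording. A rough heuristic sketch; the character thresholds are assumptions, not calibrated values:

```python
def looks_like_resignation(agent_output: str, user_reply: str,
                           long_output_chars: int = 800,
                           short_reply_chars: int = 15) -> bool:
    """Flag a very short user reply that follows a long agent output."""
    return (
        len(agent_output) >= long_output_chars
        and len(user_reply.strip()) <= short_reply_chars
    )
```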
Layer 2: Signal Analysis
Run this periodically (after 5+ sessions, or on demand) to surface patterns.
Analysis Modes
Mode A: Single Skill Review Triggered when: a skill has 5+ new sessions since last review.
Activate: "Review signals for survey-nlp-analyzer"
Mode B: Cross-Skill Pattern Scan Triggered when: multiple skills share friction signal types.
Activate: "Scan for cross-skill friction patterns"
Mode C: Library Health Check Triggered periodically or on demand.
Activate: "Show skill library health"
Analysis Algorithm
```python
from pathlib import Path
import json
from dataclasses import dataclass, field


@dataclass
class FrictionPattern:
    skill_name: str
    phase: str
    signal_type: str
    occurrences: int
    session_ids: list[str] = field(default_factory=list)
    example_snippets: list[str] = field(default_factory=list)
    severity: str = "low"  # low | medium | high | critical


def analyze_signals(skill_name: str, signals_path: Path) -> list[FrictionPattern]:
    """Read session signals and surface friction patterns for one skill."""
    records = []
    for line in signals_path.read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)  # parse each JSONL line once
        if record.get("skill_name") == skill_name:
            records.append(record)

    patterns: dict[tuple, FrictionPattern] = {}

    for record in records:
        # Abandonment: highest-value signal, starts at high severity
        if record["termination_type"] == "abandoned":
            key = (skill_name, record["last_phase_reached"], "abandonment")
            if key not in patterns:
                patterns[key] = FrictionPattern(
                    skill_name=skill_name,
                    phase=record["last_phase_reached"],
                    signal_type="abandonment",
                    occurrences=0,
                    severity="high",
                )
            patterns[key].occurrences += 1
            patterns[key].session_ids.append(record["session_id"])

        # Friction events
        for event in record.get("friction_events", []):
            key = (skill_name, event["phase"], event["type"])
            if key not in patterns:
                patterns[key] = FrictionPattern(
                    skill_name=skill_name,
                    phase=event["phase"],
                    signal_type=event["type"],
                    occurrences=0,
                )
            patterns[key].occurrences += 1
            patterns[key].example_snippets.append(
                event.get("user_message_snippet", "")
            )

    # Escalate severity by occurrence count (never downgrade the initial level)
    order = ["low", "medium", "high", "critical"]
    for pattern in patterns.values():
        if pattern.occurrences >= 5:
            floor = "critical"
        elif pattern.occurrences >= 3:
            floor = "high"
        elif pattern.occurrences >= 2:
            floor = "medium"
        else:
            floor = "low"
        if order.index(floor) > order.index(pattern.severity):
            pattern.severity = floor

    return sorted(patterns.values(), key=lambda p: p.occurrences, reverse=True)
```
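Run against the log for one skill, the highest-occurrence patterns surface first. A usage sketch building on the block above:

```python
from pathlib import Path  # analyze_signals as defined above

patterns = analyze_signals("survey-nlp-analyzer", Path("session_signals.jsonl"))
for p in patterns:
    if p.severity in ("high", "critical"):
        print(f"{p.phase}: {p.signal_type} x{p.occurrences} [{p.severity}]")
```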
Layer 3: Improvement Proposal
Every pattern surfaces a structured proposal. No auto-edits. Human approves.
Proposal Schema
{ "proposal_id": "PROP-2026-03-001", "skill_name": "survey-nlp-analyzer", "current_version": "1.1.0", "proposed_version": "1.1.1", "triggered_by": [ "3 sessions abandoned at Phase 3", "4 redirect events on intake: CONTEXT_DIMENSIONS question" ], "friction_evidence": [ "\"no i meant the department column not the team column\"", "\"wait which dimensions are you asking about\"", "\"i don't know what context dimensions means\"" ], "proposed_change": { "section": "Intake: CONTEXT_DIMENSIONS", "type": "clarification", "before": "Columns to cut results by (e.g., department, region, role, date)", "after": "The columns in your data you want to compare across. Think: what question are you trying to answer? 'How does feedback differ by department?' = department column. 'How does it vary by region?' = region column. You can pick multiple." }, "version_bump_type": "patch", "deployment_stage_to_validate": "Skill Owner", "status": "pending_review" }
Proposal Triage Rules
| Pattern | Version Bump | Who Validates |
|---|---|---|
| Wording confusion in intake (1-2 fields) | x.x.1 | Skill Owner |
| Phase logic produces wrong output | x.x.1 | Skill Owner |
| Entire phase causes abandonment | x.1.x | Peer Teams |
| New source type not in Corpus Guide | x.1.x | Peer Teams |
| Core pipeline assumption is wrong | x+1.x.x | + skill-universalizer |
| Cross-skill shared pattern found | New shared utility | All deployment stages |
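As code, the triage table is a lookup from pattern kind to bump type and validator. A sketch; the pattern-kind strings are assumed labels for illustration, not a defined taxonomy:

```python
# (bump_type, validator) per pattern kind; mirrors the triage table above.
TRIAGE_RULES: dict[str, tuple[str, str]] = {
    "intake_wording_confusion": ("patch", "Skill Owner"),
    "phase_wrong_output":       ("patch", "Skill Owner"),
    "phase_abandonment":        ("minor", "Peer Teams"),
    "new_source_type":          ("minor", "Peer Teams"),
    "core_assumption_wrong":    ("major", "skill-universalizer"),
}

def triage(pattern_kind: str) -> tuple[str, str]:
    """Return (version_bump_type, validator) for a classified pattern."""
    return TRIAGE_RULES.get(pattern_kind, ("patch", "Skill Owner"))
```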
Layer 4: Library Health Dashboard
A snapshot of every skill's status, surfaced on demand.
Activate: "Show skill library health"
Health Dimensions
| Dimension | Healthy | Warning | Critical |
|---|---|---|---|
| Last used | < 30 days | 30-90 days | > 90 days |
| Abandonment rate | < 10% | 10-25% | > 25% |
| Redirect rate | < 1/session | 1-2/session | > 2/session |
| Open proposals | 0 | 1-2 | 3+ unreviewed |
| Deployment stage | Proven | Refine only | Never validated |
| Version vs signals | Current | 1 version behind | 2+ versions behind |
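Each dimension maps to a three-level status, and the worst dimension sets the skill's overall color. A sketch using the thresholds from the table; `health_color` is an illustrative name:

```python
def health_color(abandonment_rate: float, redirects_per_session: float,
                 days_since_use: int) -> str:
    """Classify one skill as green/yellow/red from three health dimensions."""
    def level(value: float, warn: float, crit: float) -> int:
        # 0 = healthy, 1 = warning, 2 = critical
        return 2 if value > crit else 1 if value >= warn else 0

    worst = max(
        level(abandonment_rate, 0.10, 0.25),
        level(redirects_per_session, 1.0, 2.0),
        level(days_since_use, 30, 90),
    )
    return ("🟢", "🟡", "🔴")[worst]
```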
Example Output
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ 🟢 survey-nlp-analyzer      v1.1.0   12 uses   2 open proposals             │
│    Abandonment: 8%    Redirects: 0.9/session   Stage: Proven                │
├─────────────────────────────────────────────────────────────────────────────┤
│ 🟢 org-data-pipeline        v1.0.0    3 uses   0 open proposals             │
│    Abandonment: 0%    Redirects: 1.3/session   Stage: Refine                │
├─────────────────────────────────────────────────────────────────────────────┤
│ 🟡 talent-card-generator    v1.0.0    1 use    1 open proposal              │
│    Abandonment: 0%    Redirects: 2.0/session   Stage: Refine                │
│    ⚠️ High redirect rate — intake may need clarification                    │
├─────────────────────────────────────────────────────────────────────────────┤
│ 🔴 fplus-tech-panel         v1.0.0    8 uses   0 proposals                  │
│    Abandonment: 12%   Redirects: 0.4/session   Stage: Refine                │
│    ⚠️ Last used 67 days ago — may be stale                                  │
└─────────────────────────────────────────────────────────────────────────────┘
```
The Silence Problem
The most important design decision in this skill.
Traditional feedback loops assume the user will respond. They won't — especially when frustrated. Here's how each termination type is handled:
```
CLEAN COMPLETION
  User completes the skill and responds positively.
  → Log as success. Record which phases had friction en route.
  → Still capture any redirects or confusion events.

CLEAN COMPLETION + SILENCE
  Skill finishes. Agent asks if output is what they needed. No response.
  → Log as "silent completion". Output may not have landed.
  → Flag the final phase for review. Was the output confusing?
  → Treat as weak signal, not failure — user may just be busy.

MID-PHASE ABANDONMENT
  Session ends while a phase is in progress.
  → Log the exact phase and turn number.
  → This is your highest-value signal. The skill broke here.
  → Cross-reference: did other sessions also abandon at this phase?
  → Even one abandonment at the same phase = review it.

PREMATURE CLOSURE (no response, session drops)
  User stops responding. Eventually times out.
  → Log the last agent message that received no response.
  → That message is the candidate for investigation.
  → Was it too long? Too complex? Wrong format? Asked too much?

FRUSTRATION + COMPLETION
  User finishes despite visible frustration signals.
  → This is worse than abandonment in some ways.
  → They got the output but the experience was bad.
  → They will not come back. They will not recommend it.
  → Treat friction events here as high-severity.

RAGE QUIT
  Strong frustration language immediately followed by session end.
  → Highest severity. Log verbatim language snippets (anonymized).
  → Automatic critical-severity proposal triggered.
  → Do not wait for pattern accumulation. Fix it now.
```
Cross-Skill Pattern Detection
When the same friction type appears in 3+ skills at the same structural location, it's a systemic problem, not a skill problem.
Common Cross-Skill Patterns to Watch
| Pattern | Location | Fix |
|---|---|---|
| File path confusion | Intake across all skills | Add a universal file-path intake helper note |
| Column name ambiguity | Intake across data skills | Add "agent auto-reads headers" to all intakes |
| Abandonment at output | Phase 8 across report skills | Output format not matching expectation — ask earlier |
| "What does X mean" | Phase descriptions | Jargon in skill language; simplify |
| Resignation after long output | Any long agent response | Break long outputs into smaller confirmed steps |
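Detection is a grouping pass: the same signal type at the same structural location across 3+ skills. A sketch, assuming the `FrictionPattern` objects produced by `analyze_signals`:

```python
from collections import defaultdict

# FrictionPattern as defined in the analysis algorithm above.

def find_cross_skill_patterns(all_patterns: list[FrictionPattern],
                              min_skills: int = 3) -> dict[tuple[str, str], set[str]]:
    """Group (phase, signal_type) locations by the skills that exhibit them."""
    by_location: dict[tuple[str, str], set[str]] = defaultdict(set)
    for p in all_patterns:
        by_location[(p.phase, p.signal_type)].add(p.skill_name)
    # Keep only locations that recur across min_skills or more skills.
    return {loc: skills for loc, skills in by_location.items()
            if len(skills) >= min_skills}
```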
File Structure and Signal Privacy
All signal data lives in the PRIVATE repo only. The public repo never sees raw signals — only the result: improved skills with sanitized commit messages.
This is not a suggestion. It is an immutable rule from the doctrine: raw signals never leave the private repo.
Private Repo (signal data, proposals, working state)
```
<your-private-repo>/skills/skill-improver/
  session_signals.jsonl          # Append-only signal log (all sessions)
  proposals/
    PROP-2026-03-001.json        # One file per proposal
    PROP-2026-03-002.json
  library_health_latest.json     # Last health dashboard snapshot
  maturity_signals/
    survey-nlp-analyzer.json     # Per-skill maturity tracking
    org-data-pipeline.json
```
Public Repo (universalized skill definition only)
```
<public-skill-library>/skill-improver/
  SKILL.md      # This file (the skill definition)
  skill.yaml    # Metadata and registry entry
```
What Goes Where
| Content | Private Repo | Public Repo |
|---|---|---|
| `session_signals.jsonl` (raw signal log) | YES | NEVER |
| `proposals/` (improvement proposals) | YES | NEVER |
| `library_health_latest.json` (health snapshot) | YES | NEVER |
| `maturity_signals/` (per-skill tracking) | YES | NEVER |
| `SKILL.md` (this skill definition) | Optional copy | YES |
| `skill.yaml` (metadata) | Optional copy | YES |
| Improvement results (what changed + why) | N/A | YES (in commit messages) |
The public repo receives only the outcome of improvement — a better SKILL.md with a version bump and a sanitized commit message explaining what changed and why. The evidence (signals, session snippets, user language) stays private.
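One way to enforce the rule mechanically is a guard over anything staged for the public repo. A minimal sketch; the file names mirror the private-repo layout above, and `assert_public_safe` is an assumption, not part of the skill:

```python
from pathlib import Path

# Names that must never appear in a public-repo commit.
PRIVATE_ONLY = {"session_signals.jsonl", "proposals",
                "library_health_latest.json", "maturity_signals"}

def assert_public_safe(staged_paths: list[str]) -> None:
    """Raise if any staged path touches private signal data."""
    leaked = [p for p in staged_paths
              if PRIVATE_ONLY & set(Path(p).parts)]
    if leaked:
        raise RuntimeError(f"Raw signals never leave the private repo: {leaked}")
```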
⚠️ Anti-Patterns
❌ Asking users to rate the skill at the end (they won't, and the most frustrated users definitely won't)
❌ Auto-applying fixes without human review (proposals only)
❌ Treating one session's frustration as a pattern (wait for 2-3 occurrences)
❌ Ignoring positive signals (they tell you what NOT to change)
❌ Flagging every short user response as disengagement (some people are just terse)
❌ Waiting for users to report problems (observe; don't ask)
❌ Skipping abandonment analysis (it's your most honest signal)
❌ Making the signal log human-readable only (keep it structured JSON for analysis)
❌ Version-bumping for cosmetic changes (reserve versions for structural changes)
Signal-Driven Promotion (Private to Public Pipeline)
skill-improver does not just fix skills — it drives the pipeline from private draft to public universalized skill.
The same signal data that surfaces friction patterns also tells you when a skill is ready. A private skill that has been used multiple times without friction is a candidate for universalization. skill-improver detects this and initiates the promotion workflow.
How Promotion Works
```
Private skill accumulates usage signals
        │
        ▼
skill-improver evaluates stabilization criteria
        │
        ├── 3+ successful uses (clean completions)?
        ├── No unresolved friction patterns?
        ├── No critical proposals pending?
        │
        ▼  (all criteria met)
skill-improver generates a PROMOTION PROPOSAL
        │
        ▼
Proposal presented to the skill owner:
  "org-data-pipeline has been used 5 times with zero friction.
   It appears stable. Recommend universalization."
        │
        ▼
Skill owner approves
        │
        ▼
skill-universalizer is triggered:
  • Strips context-specific references
  • Creates intake variables for all hardcoded values
  • Generates skill.yaml with origin and attribution
  • Runs PII scan and structure validation
        │
        ▼
PR submitted to public repo as BETA
        │
        ▼
Human maintainer reviews and merges
```
Stabilization Criteria
| Criterion | Threshold | Why |
|---|---|---|
| Successful uses | 3+ clean completions | Proves the skill works across sessions |
| Friction patterns | None unresolved | Ensures known issues are already fixed |
| Critical proposals | None pending | No known structural problems |
| Last use recency | Within 30 days | Skill is actively being used, not abandoned |
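The four criteria reduce to a single predicate over a skill's signal summary. A sketch; the dict keys are assumptions loosely matching the maturity record shown later:

```python
from datetime import datetime, timezone

def stabilized(summary: dict) -> bool:
    """True when a private skill meets all four stabilization criteria."""
    last_used = datetime.fromisoformat(summary["last_used"].replace("Z", "+00:00"))
    days_idle = (datetime.now(timezone.utc) - last_used).days
    return (
        summary["clean_completions"] >= 3
        and summary["unresolved_friction_patterns"] == 0
        and summary["critical_proposals_open"] == 0
        and days_idle <= 30
    )
```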
What skill-improver Does NOT Do
- Does not auto-submit PRs. It proposes. A human approves.
- Does not universalize. It triggers skill-universalizer to do that work.
- Does not promote skills with unresolved friction. Even if use count is high, friction blocks promotion.
Maturity Feedback Loop
skill-improver is the engine that drives the maturity lifecycle defined in the doctrine. It tracks maturity signals for every skill and generates promotion proposals when criteria are met — or demotion alerts when regression is detected.
Maturity Signal Tracking
For every skill in the library, skill-improver maintains a `maturity_signals` record in the private repo:
{ "skill_name": "survey-nlp-analyzer", "current_maturity": "beta", "total_uses": 12, "clean_completions": 11, "abandonments": 1, "abandonment_rate": 0.083, "positive_signals": 3, "positive_signal_types": ["delight", "re-use", "re-use"], "critical_proposals_open": 0, "last_used": "2026-03-22T10:15:00Z", "promotion_eligible": true, "demotion_risk": false }
Promotion Proposal Generation
When a BETA skill meets all promotion criteria, skill-improver generates a promotion proposal:
Promotion criteria (from doctrine):
- 5+ successful uses (clean completions)
- Abandonment rate < 20%
- No unresolved critical improvement proposals
- At least 1 positive signal (delight, re-use, or recommendation)
- Maintainer review and approval
{ "proposal_type": "maturity_promotion", "skill_name": "survey-nlp-analyzer", "current_maturity": "beta", "proposed_maturity": "stable", "evidence": { "clean_completions": 11, "abandonment_rate": "8.3%", "positive_signals": ["delight x1", "re-use x2"], "critical_proposals_open": 0, "last_friction_resolved": "2026-03-15" }, "recommendation": "Promote to STABLE. 12 sessions, 8.3% abandonment, 3 positive signals, zero open critical proposals.", "status": "pending_maintainer_review" }
Demotion Alert Generation
When a STABLE skill shows regression, skill-improver generates a demotion alert:
Demotion triggers (from doctrine):
- Abandonment rate spikes above 25%
- 3+ critical improvement proposals go unresolved
- Maintainer flags a quality concern
{ "alert_type": "maturity_demotion", "skill_name": "fplus-tech-panel", "current_maturity": "stable", "proposed_maturity": "beta", "evidence": { "abandonment_rate": "31%", "abandonment_trend": "12% → 18% → 31% over last 3 review periods", "critical_proposals_open": 2, "last_positive_signal": "2026-02-10" }, "recommendation": "Demote to BETA. Abandonment rate has spiked to 31% with 2 unresolved critical proposals. Investigate Phase 3 abandonment cluster.", "status": "pending_maintainer_review" }
The Full Maturity Loop
```
                 skill-improver observes
                          │
            ┌─────────────┼─────────────┐
            ▼             ▼             ▼
       BETA skill    STABLE skill   Any skill
       accumulates   shows          has usage
       positive      regression     patterns
       signals                      analyzed
            │             │             │
            ▼             ▼             ▼
       Promotion     Demotion      Improvement
       Proposal      Alert         Proposal
       generated     generated     generated
            │             │             │
            └─────────────┼─────────────┘
                          ▼
          Human reviews and approves/rejects
                          │
                          ▼
      Maturity status or skill content updated
                  in public repo
```
This means the maturity lifecycle is not a manual checklist someone has to remember to run. It is continuously evaluated by the signal system. Every session contributes data. Every analysis cycle checks whether any skill's maturity status should change.
🌐 Platform Notes
| Platform | Signal Collection | Analysis | Notes |
|---|---|---|---|
| CLI Agent (e.g., Claude Code) | ✅ Native | ✅ Native | Full session access; signals written automatically |
| Web Agent (e.g., Wibey, ChatGPT) | ✅ If session logs available | ✅ | Depends on platform log format and export capability |
| Async Agent (e.g., Codex, Devin) | ⚠️ Manual export | ✅ | Export or paste session transcript for analysis |
| Any LLM (manual mode) | ⚠️ Manual | ✅ | Paste transcript; agent infers signals from text |
🔁 Deployment Ladder
| Stage | Team | What to validate |
|---|---|---|
| Refine | Skill Owner | Signal schema correct, proposals actionable, not over-triggering |
| Prove | Peer Teams | Cross-session patterns surface correctly, health dashboard accurate |
| Scale | Enterprise / All Teams | Passive layer runs on all skills automatically, proposals route to right owners |