claude-skill-registry — cm
CASS Memory System - procedural memory for AI coding agents. Three-layer cognitive architecture transforms scattered sessions into persistent, cross-agent learnings with confidence decay, anti-pattern learning, and scientific validation.
Clone the full registry:

```
git clone https://github.com/majiayu000/claude-skill-registry
```

Or copy just this skill into your Claude skills directory:

```
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/cm" ~/.claude/skills/majiayu000-claude-skill-registry-cm && rm -rf "$T"
```
skills/data/cm/SKILL.md

CM - CASS Memory System
Procedural memory for AI coding agents. Transforms scattered agent sessions into persistent, cross-agent memory so every agent learns from every other agent's experience.
Critical Concepts for AI Agents
The Three-Layer Cognitive Architecture
| Layer | Role | Storage | Tool |
|---|---|---|---|
| Episodic Memory | Raw session transcripts | Agent session files | cass |
| Working Memory | Session summaries | Diary entries | cm |
| Procedural Memory | Distilled action rules | Playbook | cm |
Flow: Sessions → Diary summaries → Playbook rules
Why This Matters
Without cm, each agent session starts from zero. With cm:
- Rules that helped in past sessions get reinforced
- Anti-patterns that caused failures become explicit warnings
- Cross-agent learning means Agent B benefits from Agent A's mistakes
- Confidence decay naturally retires stale guidance
Quick Reference for AI Agents
Start of Session
```
# Get relevant rules and history for your task
cm context "implementing OAuth authentication" --json

# Output includes:
# - Relevant playbook rules with scores
# - Related diary entries
# - Gap analysis (uncovered areas)
```
During Work
```
# Find rules about a topic
cm similar "error handling" --json

# Check if a pattern is validated
cm validate "Always use prepared statements for SQL"
```
End of Session
```
# Record which rules helped
cm outcome success "RULE-123,RULE-456"

# Record which rules caused problems
cm outcome failure "RULE-789"

# Apply recorded outcomes
cm outcome-apply
```
Periodic Maintenance
```
# Extract new rules from recent sessions
cm reflect

# Find stale rules needing re-validation
cm stale --days 30

# System health
cm doctor
```
The ACE Pipeline
CM uses a four-stage pipeline to extract and curate rules:
```
Sessions → Generator → Reflector → Validator → Curator → Playbook
               ↓           ↓           ↓           ↓
             Diary     Candidates   Evidence   Final Rules
            Entries      (LLM)       (LLM)     (NO LLM!)
```
Stage Details
| Stage | Uses LLM | Purpose |
|---|---|---|
| Generator | Yes | Summarize sessions into diary entries |
| Reflector | Yes | Propose candidate rules from patterns |
| Validator | Yes | Check rules against historical evidence |
| Curator | NO | Deterministic merge into playbook |
CRITICAL: The Curator is intentionally LLM-free to prevent hallucinated provenance. All rule additions must trace to actual session evidence.
Confidence & Decay System
The Scoring Algorithm
Every rule has a confidence score that decays over time:
```
score = base_confidence × decay_factor × feedback_modifier
```
Where:
- `base_confidence`: Initial confidence (0.0-1.0)
- `decay_factor`: `0.5^(days_since_feedback / 90)` (90-day half-life)
- `feedback_modifier`: Accumulated helpful/harmful signals
Decay Visualization
```
Day 0:   ████████████████████ 1.00
Day 45:  ██████████████       0.71
Day 90:  ██████████           0.50  ← Half-life
Day 180: █████                0.25
Day 270: ██                   0.125
```
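The decay curve above follows directly from the formula. A minimal sketch, assuming the documented `0.5^(days / 90)` decay; `decay_score` is an illustrative name, not part of the cm CLI:

```python
HALF_LIFE_DAYS = 90  # default half_life_days from the config reference


def decay_score(base_confidence: float, days_since_feedback: float,
                half_life: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay with a 90-day half-life."""
    return base_confidence * 0.5 ** (days_since_feedback / half_life)


# Reproduces the visualization: 1.00, 0.71, 0.50, 0.25, 0.125
for day in (0, 45, 90, 180, 270):
    print(f"Day {day:3d}: {decay_score(1.0, day):.3f}")
```

Any other half-life can be plugged in via `half_life` to match a custom `half_life_days` setting.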
Feedback Multipliers
| Feedback | Multiplier | Effect |
|---|---|---|
| Helpful | 1.0x | Standard positive reinforcement |
| Harmful | 4.0x | Aggressive penalty (asymmetric by design) |
Why asymmetric? Bad advice is more damaging than good advice is helpful. A harmful rule should decay 4x faster than a helpful one recovers.
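A hedged sketch of how the asymmetric weights could combine into a modifier: helpful marks count at 1.0x and harmful at 4.0x per the table above, but the exact accumulation rule and the 0.1 scaling are assumptions for illustration, not cm internals:

```python
HELPFUL_WEIGHT = 1.0
HARMFUL_WEIGHT = 4.0  # harmful feedback penalized 4x (asymmetric by design)


def feedback_modifier(helpful_count: int, harmful_count: int) -> float:
    """Net feedback signal: each harmful mark outweighs a helpful one 4:1.

    Clamped at zero so a heavily harmful rule bottoms out rather than
    going negative (assumed shape).
    """
    net = helpful_count * HELPFUL_WEIGHT - harmful_count * HARMFUL_WEIGHT
    return max(0.0, 1.0 + 0.1 * net)


print(feedback_modifier(12, 1))  # strong helpful record, one harmful mark
print(feedback_modifier(0, 5))   # harmful-only rule collapses to zero
```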
Anti-Pattern Learning
When a rule accumulates too much harmful feedback, it doesn't just disappear—it inverts:
```
Original: "Always use global state for configuration"
    ↓ (harmful feedback accumulates)
Inverted: "⚠️ ANTI-PATTERN: Avoid global state for configuration"
```
The inverted anti-pattern becomes a warning that prevents future agents from making the same mistake.
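The inversion can be sketched as a simple threshold check. The trigger condition (weighted harm exceeding helpful feedback) and the prefix rewrite are assumptions for clarity; cm's real heuristics may differ:

```python
def maybe_invert(content: str, helpful_count: int, harmful_count: int,
                 harmful_weight: float = 4.0) -> str:
    """Invert a rule into an anti-pattern warning when weighted harm wins."""
    if harmful_count * harmful_weight > helpful_count:
        body = content
        # Strip a leading imperative so the warning reads naturally.
        for prefix in ("Always use ", "Always ", "Prefer "):
            if body.startswith(prefix):
                body = body[len(prefix):]
                break
        return f"⚠️ ANTI-PATTERN: Avoid {body}"
    return content


print(maybe_invert("Always use global state for configuration", 2, 3))
```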
Command Reference
cm context — Get Task-Relevant Memory

The primary command for starting any task.

```
# Basic context retrieval
cm context "implementing user authentication"

# JSON output for programmatic use
cm context "database migration" --json

# Deeper historical context
cm context "API refactoring" --depth deep

# Include gap analysis
cm context "payment processing" --gaps
```
Output includes:
- Top relevant playbook rules (ranked by score × relevance)
- Related diary entries from past sessions
- Gap analysis (categories with thin coverage)
- Suggested starter rules for uncovered areas
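For programmatic use, the `--json` output can be filtered before acting on it. A sketch with a hypothetical payload shape — the `rules`/`diary`/`gaps` keys and `score` field are assumptions; check your cm version's actual JSON before relying on them:

```python
import json

# Hypothetical example of `cm context ... --json` output.
sample = json.loads("""
{
  "rules": [
    {"id": "RULE-123", "content": "Use parameterized queries", "score": 0.85},
    {"id": "RULE-777", "content": "Cache aggressively", "score": 0.2}
  ],
  "diary": [{"id": "diary-xyz789", "summary": "Implemented OAuth2 flow"}],
  "gaps": ["rate-limiting"]
}
""")

# Surface only high-confidence rules for the current task.
strong = [r for r in sample["rules"] if r["score"] >= 0.5]
for rule in strong:
    print(f"{rule['id']}: {rule['content']} (score {rule['score']})")
```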
cm top — Highest-Scoring Rules

```
# Top 10 rules by confidence score
cm top 10

# JSON output
cm top 20 --json

# Filter by category
cm top 10 --category testing
```
cm similar — Find Related Rules

```
# Semantic search over playbook
cm similar "error handling patterns"

# With scores
cm similar "authentication flow" --scores

# JSON output
cm similar "database queries" --json
```
cm playbook — Manage the Playbook

```
# List all rules
cm playbook list

# Statistics
cm playbook stats

# Export for documentation
cm playbook export --format md > PLAYBOOK.md
cm playbook export --format json > playbook.json

# Import rules
cm playbook import rules.json
```
cm why — Rule Provenance

```
# Show evidence chain for a rule
cm why RULE-123

# Output shows:
# - Original session(s) that generated the rule
# - Diary entries that led to extraction
# - Feedback history
# - Confidence trajectory
```
cm mark — Provide Feedback

```
# Mark as helpful (reinforces rule)
cm mark RULE-123 --helpful

# Mark as harmful (penalizes rule, may trigger inversion)
cm mark RULE-123 --harmful

# With context
cm mark RULE-123 --helpful --reason "Prevented auth vulnerability"

# Undo feedback
cm undo RULE-123
```
cm reflect — Extract Rules from Sessions

```
# Process recent sessions
cm reflect

# Specific time range
cm reflect --since "7d"
cm reflect --since "2024-01-01"

# Dry run (show what would be extracted)
cm reflect --dry-run

# Force re-processing of already-reflected sessions
cm reflect --force
```
cm audit — Check Sessions Against Rules

```
# Audit recent sessions for rule violations
cm audit

# Specific time range
cm audit --since "24h"

# JSON output
cm audit --json
```
cm validate — Test a Proposed Rule

```
# Check if a rule has historical support
cm validate "Always use transactions for multi-table updates"

# Output shows:
# - Supporting evidence (sessions where this helped)
# - Contradicting evidence (sessions where this hurt)
# - Recommendation (add/skip/needs-more-data)
```
cm outcome — Record Session Results

```
# Record which rules helped
cm outcome success "RULE-123,RULE-456,RULE-789"

# Record which rules hurt
cm outcome failure "RULE-999"

# Apply all pending outcomes
cm outcome-apply

# Clear pending outcomes without applying
cm outcome-clear
```
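An agent tracking which rules it applied can assemble these invocations mechanically. A small helper sketch — `build_outcome_cmds` is an illustrative name, not part of cm itself:

```python
def build_outcome_cmds(helpful: list[str], harmful: list[str]) -> list[str]:
    """Turn tracked rule IDs into documented cm outcome invocations."""
    cmds = []
    if helpful:
        cmds.append(f'cm outcome success "{",".join(helpful)}"')
    if harmful:
        cmds.append(f'cm outcome failure "{",".join(harmful)}"')
    if cmds:
        # Outcomes are pending until explicitly applied.
        cmds.append("cm outcome-apply")
    return cmds


for cmd in build_outcome_cmds(["RULE-123", "RULE-456"], ["RULE-789"]):
    print(cmd)
```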
cm stale — Find Stale Rules

```
# Find rules without recent feedback
cm stale

# Custom threshold
cm stale --days 60

# JSON output
cm stale --json

# Include decay projection
cm stale --project
```
cm forget — Deprecate Rules

```
# Soft-delete a rule
cm forget RULE-123

# With reason
cm forget RULE-123 --reason "No longer relevant after framework change"

# Force (skip confirmation)
cm forget RULE-123 --force
```
cm doctor — System Health

```
# Run diagnostics
cm doctor

# Auto-fix issues
cm doctor --fix

# JSON output
cm doctor --json
```
Checks:
- cass installation and accessibility
- Playbook integrity
- Diary consistency
- Configuration validity
- Session index freshness
cm usage — Usage Statistics

```
# Show usage stats
cm usage

# JSON output
cm usage --json
```
cm stats — Playbook Health Metrics

```
# Show playbook health
cm stats

# Output includes:
# - Total rules
# - Average confidence
# - Category distribution
# - Stale rule count
# - Anti-pattern count
```
Agent-Native Onboarding
CM includes a guided onboarding system that requires zero API calls:
```
cm onboard
```
The onboarding wizard:
- Explains the three-layer architecture
- Walks through basic commands
- Seeds initial rules from session history
- Sets up appropriate starter playbook
- Configures privacy preferences
No LLM required — onboarding works offline.
Starter Playbooks
Pre-built playbooks for common tech stacks:
```
# List available starters
cm starters

# Initialize with a starter
cm init --starter typescript
cm init --starter python
cm init --starter go
cm init --starter rust
```
Available starters:
- typescript — TS/JS patterns, npm, testing
- python — Python idioms, pip, pytest
- go — Go conventions, modules, testing
- rust — Rust patterns, cargo, clippy
- general — Language-agnostic best practices
Gap Analysis
CM tracks which categories have thin coverage:
```
cm context "some task" --gaps
```
Gap analysis shows:
- Categories with few rules
- Categories with low-confidence rules
- Suggested areas for rule extraction
This helps agents identify blind spots in the collective memory.
Batch Rule Addition
For bulk importing rules:
```
# From JSON
cm playbook import rules.json

# From markdown
cm playbook import rules.md

# With validation
cm playbook import rules.json --validate
```
JSON format:
```
[
  {
    "content": "Always validate user input at API boundaries",
    "category": "security",
    "confidence": 0.8,
    "source": "manual"
  }
]
```
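Import files in this format can be generated programmatically. A sketch using the documented field names; the minimal validation checks are illustrative assumptions, not cm's own import rules:

```python
import json


def make_import_file(rules: list[dict], path: str) -> None:
    """Sanity-check rules and write a cm playbook import file."""
    for rule in rules:
        assert rule.get("content"), "every rule needs non-empty content"
        assert 0.0 <= rule.get("confidence", 0.5) <= 1.0
    with open(path, "w") as f:
        json.dump(rules, f, indent=2)


make_import_file(
    [{"content": "Always validate user input at API boundaries",
      "category": "security", "confidence": 0.8, "source": "manual"}],
    "rules.json",
)
```

The resulting file can then be loaded with `cm playbook import rules.json --validate`.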
Data Models
PlaybookBullet
```
{
  "id": "RULE-abc123",
  "content": "Use parameterized queries for all database access",
  "category": "security",
  "confidence": 0.85,
  "created_at": "2024-01-15T10:30:00Z",
  "last_feedback": "2024-03-20T14:22:00Z",
  "helpful_count": 12,
  "harmful_count": 1,
  "source_sessions": ["session-xyz", "session-abc"],
  "is_anti_pattern": false,
  "maturity": "validated"
}
```
FeedbackEvent
```
{
  "rule_id": "RULE-abc123",
  "type": "helpful",
  "reason": "Prevented SQL injection in auth flow",
  "session_id": "session-current",
  "timestamp": "2024-03-20T14:22:00Z"
}
```
DiaryEntry
```
{
  "id": "diary-xyz789",
  "session_id": "session-abc",
  "summary": "Implemented OAuth2 flow with PKCE",
  "patterns_observed": ["token-refresh", "secure-storage"],
  "issues_encountered": ["redirect-uri-mismatch"],
  "candidate_rules": ["Always use state parameter in OAuth"],
  "created_at": "2024-03-19T16:45:00Z"
}
```
SessionOutcome
```
{
  "session_id": "session-current",
  "helpful_rules": ["RULE-123", "RULE-456"],
  "harmful_rules": ["RULE-789"],
  "recorded_at": "2024-03-20T17:00:00Z",
  "applied": false
}
```
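The models tie back to the decay formula: a bullet's current score can be recomputed from its stored fields. A sketch — the dataclass and scoring helper are illustrative, not cm internals:

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PlaybookBullet:
    id: str
    content: str
    confidence: float
    last_feedback: str  # ISO-8601 timestamp, as in the JSON model
    helpful_count: int = 0
    harmful_count: int = 0

    def current_score(self, now: datetime, half_life_days: float = 90) -> float:
        """Apply the documented 90-day half-life decay to stored confidence."""
        last = datetime.fromisoformat(self.last_feedback.replace("Z", "+00:00"))
        days = (now - last).total_seconds() / 86400
        return self.confidence * 0.5 ** (days / half_life_days)


raw = json.loads('{"id": "RULE-abc123", "content": "Use parameterized queries", '
                 '"confidence": 0.85, "last_feedback": "2024-03-20T14:22:00Z", '
                 '"helpful_count": 12, "harmful_count": 1}')
bullet = PlaybookBullet(**raw)
now = datetime(2024, 6, 18, 14, 22, tzinfo=timezone.utc)  # exactly 90 days later
print(round(bullet.current_score(now), 3))  # half-life reached: 0.85 -> 0.425
```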
Rule Maturity States
Rules progress through maturity stages:
```
proposed → validated → mature → stale → deprecated
    ↓          ↓          ↓        ↓         ↓
  Needs     Evidence   Proven   Needs     Soft-
 evidence  confirmed  helpful  refresh   deleted
```
| State | Meaning | Action |
|---|---|---|
| proposed | Newly extracted, unvalidated | Await evidence |
| validated | Has supporting evidence | Monitor feedback |
| mature | Consistently helpful over time | Trust highly |
| stale | No recent feedback (>90 days) | Seek re-validation |
| deprecated | Marked for removal | Will be purged |
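The progression can be sketched as a simple classifier. Only the >90-day stale window is documented above; the other thresholds here are assumptions for illustration:

```python
def maturity(helpful_count: int, days_since_feedback: float,
             deprecated: bool = False) -> str:
    """Classify a rule's maturity state under assumed thresholds."""
    if deprecated:
        return "deprecated"
    if days_since_feedback > 90:  # documented stale window
        return "stale"
    if helpful_count >= 10:       # assumed "consistently helpful" bar
        return "mature"
    if helpful_count >= 1:
        return "validated"
    return "proposed"


print(maturity(0, 5))     # fresh, no evidence yet
print(maturity(12, 30))   # plenty of helpful feedback
print(maturity(12, 120))  # gone quiet, needs re-validation
```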
MCP Server
Run cm as an MCP server for direct agent integration:
```
# Start server
cm serve

# Custom port
cm serve --port 9000

# With logging
cm serve --verbose
```
MCP Tools
| Tool | Description |
|---|---|
| context | Retrieve task-relevant rules and history |
| similar | Semantic search over playbook |
| mark | Mark rules as helpful/harmful |
| outcome | Record session outcome with rule attribution |
| stats | Get playbook health metrics |
| validate | Check proposed rule against evidence |
MCP Resources
| Resource | Description |
|---|---|
| playbook | Full playbook as JSON |
| top | Top N rules by score |
| stale | Rules needing re-validation |
| diary | Recent diary entries |
| stats | Playbook health metrics |
Configuration
Directory Structure
```
~/.config/cm/
├── config.toml      # Main configuration
├── playbook.json    # Rule storage
└── diary/           # Session summaries
    └── *.json

.cm/                 # Project-local config
├── config.toml      # Project overrides
└── playbook.json    # Project-specific rules
```
Config File Reference
```
# ~/.config/cm/config.toml

[general]
# LLM model for Generator/Reflector/Validator
model = "claude-sonnet-4-20250514"
# Auto-apply outcomes after session
auto_apply_outcomes = false
# Check for updates
check_updates = true

[decay]
# Half-life in days
half_life_days = 90
# Harmful feedback multiplier
harmful_multiplier = 4.0
# Minimum score before deprecation
min_score = 0.1

[reflection]
# Minimum sessions before reflecting
min_sessions = 3
# Auto-reflect on session end
auto_reflect = false

[privacy]
# Enable cross-agent learning
cross_agent_enrichment = true
# Anonymize session data in rules
anonymize_sources = false

[mcp]
# MCP server port
port = 8080
# Enable MCP server
enabled = false
```
Environment Variables
| Variable | Description | Default |
|---|---|---|
| Configuration directory | |
| Data storage directory | |
| LLM model for ACE pipeline | |
| Decay half-life in days | |
| MCP server port | |
Privacy Controls
```
# View current privacy settings
cm privacy

# Disable cross-agent learning
cm privacy --disable-enrichment

# Enable cross-agent learning
cm privacy --enable-enrichment

# Anonymize sources in exported playbooks
cm privacy --anonymize-export
```
What Cross-Agent Enrichment Means
When enabled:
- Rules extracted from Agent A's sessions can help Agent B
- Diary entries reference sessions across agents
- Collective learning improves all agents
When disabled:
- Each agent's playbook is isolated
- No session data shared between identities
- Rules only come from your own sessions
Project Integration
Export playbook for project documentation:
```
# Generate project patterns doc
cm project --output docs/PATTERNS.md

# Include confidence scores
cm project --output docs/PATTERNS.md --scores

# Only mature rules
cm project --output docs/PATTERNS.md --mature-only
```
This creates a human-readable document of learned patterns for team reference.
Graceful Degradation
CM degrades gracefully when components are unavailable:
| Scenario | Behavior |
|---|---|
| No cass index | Works from diary only |
| No LLM access | Curator still works, reflection paused |
| Stale sessions | Uses cached diary entries |
| Empty playbook | Returns starter suggestions |
Performance Characteristics
| Operation | Typical Time | Notes |
|---|---|---|
| cm context | 50-200ms | Depends on playbook size |
| cm similar | 100-300ms | Semantic search overhead |
| cm reflect | 2-10s | LLM calls for Generator/Reflector |
| cm validate | 1-3s | LLM call for Validator |
| cm mark | <50ms | Pure database operation |
Integration with CASS
CM builds on top of cass (Coding Agent Session Search):
```
# cass provides raw session search
cass search "authentication" --robot

# cm transforms that into procedural memory
cm context "authentication"
```
The typical workflow:
1. Work happens in agent sessions (stored by various tools)
2. cass indexes and searches those sessions
3. cm reflect extracts patterns into the diary and playbook
4. cm context retrieves relevant knowledge for new tasks
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Configuration error |
| 3 | Validation failed |
| 4 | LLM error (reflection/validation) |
| 5 | No data (empty results) |
Troubleshooting
Common Issues
| Problem | Solution |
|---|---|
| "No sessions found" | Re-index your sessions with cass, then retry |
| "Reflection failed" | Check LLM API key and model availability |
| "Stale playbook" | Run cm reflect to process recent sessions |
| "Low confidence everywhere" | Natural decay; use cm mark --helpful to reinforce |
Debug Mode
```
# Verbose output
cm context "task" --verbose

# Show scoring details
cm top 10 --debug

# Trace decay calculations
cm stats --trace-decay
```
Ready-to-Paste AGENTS.md Blurb
```markdown
## cm - CASS Memory System

Procedural memory for AI coding agents. Transforms scattered sessions into persistent cross-agent learnings with confidence decay and anti-pattern detection.

### Quick Start

cm context "your task" --json      # Get relevant rules
cm mark RULE-ID --helpful          # Reinforce good rules
cm outcome success "RULE-1,RULE-2" # Record session results
cm reflect                         # Extract new rules

### Three Layers

- Episodic: Raw sessions (via cass)
- Working: Diary summaries
- Procedural: Playbook rules (this tool)

### Key Features

- 90-day confidence half-life (stale rules decay)
- 4x penalty for harmful rules (asymmetric by design)
- Anti-pattern auto-inversion (bad rules become warnings)
- Cross-agent learning (everyone benefits)
- LLM-free Curator (no hallucinated provenance)

### Essential Commands

cm context "task" --json     # Start of session
cm similar "pattern"         # Find related rules
cm mark ID --helpful/harmful # Give feedback
cm outcome success "IDs"     # End of session
cm reflect                   # Periodic maintenance

Exit codes: 0=success, 1=error, 2=config, 3=validation, 4=LLM, 5=no-data
```
Workflow Example: Complete Session
```
# 1. Start task - get relevant context
cm context "implementing rate limiting for API" --json
# → Returns rules about rate limiting, caching, API design

# 2. Note which rules you're applying
# (mental note: RULE-123 about token buckets, RULE-456 about Redis)

# 3. During work, if you find a useful pattern
cm validate "Use sliding window for rate limit precision"
# → Shows if this has historical support

# 4. End of session - record what helped
cm outcome success "RULE-123,RULE-456"

# 5. If something hurt (caused bugs/issues)
cm outcome failure "RULE-789"

# 6. Apply the feedback
cm outcome-apply

# 7. Periodically, extract new rules
cm reflect --since "7d"
```
Philosophy: Why Procedural Memory?
Episodic memory (raw sessions) is too noisy for real-time use. Working memory (summaries) lacks actionability. Procedural memory distills both into executable rules that directly guide behavior.
The key insight: Rules should be testable hypotheses, not static commandments. The confidence decay system treats every rule as a hypothesis that requires ongoing validation. Rules that stop being useful naturally fade; rules that keep helping get reinforced.
This mirrors how human expertise works: you don't remember every project you've done, but you retain the patterns that proved useful across many projects.