Claude-skill-registry decision-graph-analyzer

Query and analyze the AI Counsel decision graph to find past deliberations, identify patterns, and debug memory issues

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/decision-graph-analyzer" ~/.claude/skills/majiayu000-claude-skill-registry-decision-graph-analyzer && rm -rf "$T"
manifest: skills/data/decision-graph-analyzer/SKILL.md
source content

Decision Graph Analyzer Skill

Overview

The decision graph module (

decision_graph/
) stores completed deliberations and provides semantic similarity-based retrieval for context injection. This skill teaches you how to query, analyze, and troubleshoot the decision graph effectively.

Core Components

Storage Layer (
decision_graph/storage.py
)

  • DecisionGraphStorage: SQLite3 backend with CRUD operations
  • Schema:
    decision_nodes
    ,
    participant_stances
    ,
    decision_similarities
  • Indexes: Optimized for timestamp (recency), question (duplicates), similarity (retrieval)
  • Connection: Use
    :memory:
    for testing, file path for production

Integration Layer (
decision_graph/integration.py
)

  • DecisionGraphIntegration: High-level API facade
  • Methods:
    • store_deliberation(question, result)
      : Save completed deliberation
    • get_context_for_deliberation(question)
      : Retrieve similar past decisions
    • get_graph_stats()
      : Get monitoring statistics
    • health_check()
      : Validate database integrity

Retrieval Layer (
decision_graph/retrieval.py
)

  • DecisionRetriever: Finds relevant decisions and formats context
  • Key Features:
    • Two-tier caching (L1: query results, L2: embeddings)
    • Adaptive k (2-5 results based on database size)
    • Noise floor filtering (0.40 minimum similarity)
    • Tiered formatting (strong/moderate/brief)

Maintenance Layer (
decision_graph/maintenance.py
)

  • DecisionGraphMaintenance: Monitoring and health checks
  • Methods:
    • get_database_stats()
      : Node/stance/similarity counts, DB size
    • analyze_growth(days)
      : Growth rate and projections
    • health_check()
      : Validate data integrity
    • estimate_archival_benefit()
      : Space savings simulation

Common Query Patterns

1. Find Similar Decisions

When: You want to see what past deliberations are related to a new question.

from decision_graph.integration import DecisionGraphIntegration
from decision_graph.storage import DecisionGraphStorage

# Initialize
storage = DecisionGraphStorage("decision_graph.db")
integration = DecisionGraphIntegration(storage)

# Get similar decisions with context
question = "Should we adopt TypeScript for the project?"
context = integration.get_context_for_deliberation(question)

if context:
    print("Found relevant past decisions:")
    print(context)
else:
    print("No similar past decisions found")

Direct retrieval access:

from decision_graph.retrieval import DecisionRetriever

retriever = DecisionRetriever(storage)

# Get scored results (DecisionNode, similarity_score) tuples
scored_decisions = retriever.find_relevant_decisions(
    query_question="Should we adopt TypeScript?",
    threshold=0.7,  # Deprecated but kept for compatibility
    max_results=3   # Deprecated - uses adaptive k instead
)

for decision, score in scored_decisions:
    print(f"Score: {score:.2f}")
    print(f"Question: {decision.question}")
    print(f"Consensus: {decision.consensus}")
    print(f"Participants: {', '.join(decision.participants)}")
    print("---")

2. Inspect Database Statistics

When: Monitoring growth, checking health, or debugging performance.

# Get comprehensive stats
stats = integration.get_graph_stats()
print(f"Total decisions: {stats['total_decisions']}")
print(f"Total stances: {stats['total_stances']}")
print(f"Total similarities: {stats['total_similarities']}")
print(f"Database size: {stats['db_size_mb']} MB")

# Analyze growth rate
from decision_graph.maintenance import DecisionGraphMaintenance
maintenance = DecisionGraphMaintenance(storage)

growth = maintenance.analyze_growth(days=30)
print(f"Decisions in last 30 days: {growth['decisions_in_period']}")
print(f"Average per day: {growth['avg_decisions_per_day']}")
print(f"Projected next 30 days: {growth['projected_decisions_30d']}")

3. Validate Database Health

When: Debugging issues, after schema changes, or periodic maintenance.

# Run comprehensive health check
health = integration.health_check()

if health['healthy']:
    print(f"Database is healthy ({health['checks_passed']}/{health['checks_passed']} checks passed)")
else:
    print(f"Found {health['checks_failed']} issues:")
    for issue in health['issues']:
        print(f"  - {issue}")

    # View detailed results
    print("\nDetails:")
    for check, result in health['details'].items():
        print(f"  {check}: {result}")

Common issues detected:

  • Orphaned participant stances (decision_id doesn't exist)
  • Orphaned similarities (source_id or target_id missing)
  • Future timestamps (data corruption)
  • Missing required fields (incomplete data)
  • Invalid similarity scores (not in 0.0-1.0 range)

4. Analyze Cache Performance

When: Debugging slow queries or optimizing cache configuration.

# Get cache statistics
retriever = DecisionRetriever(storage, enable_cache=True)

# Run some queries first to populate cache
for question in test_questions:
    retriever.find_relevant_decisions(question)

# Check cache stats
cache_stats = retriever.get_cache_stats()
print(f"L1 query cache: {cache_stats['query_cache_size']} entries")
print(f"L1 hit rate: {cache_stats['query_hit_rate']:.1%}")
print(f"L2 embedding cache: {cache_stats['embedding_cache_size']} entries")
print(f"L2 hit rate: {cache_stats['embedding_hit_rate']:.1%}")

# Invalidate cache after adding new decisions
retriever.invalidate_cache()

Expected performance:

  • L1 cache hit: <2μs (instant)
  • L1 cache miss: <100ms (compute similarities)
  • L2 cache hit: ~50% after warmup
  • Target: 60%+ L1 hit rate for production workloads

5. Retrieve Specific Decisions

When: Debugging, inspection, or building custom queries.

# Get a specific decision by ID
decision = storage.get_decision_node(decision_id="uuid-here")
if decision:
    print(f"Question: {decision.question}")
    print(f"Timestamp: {decision.timestamp}")
    print(f"Consensus: {decision.consensus}")
    print(f"Status: {decision.convergence_status}")

    # Get participant stances
    stances = storage.get_participant_stances(decision.id)
    for stance in stances:
        print(f"{stance.participant}: {stance.vote_option} ({stance.confidence:.0%})")
        print(f"  Rationale: {stance.rationale}")

# Get all recent decisions
recent_decisions = storage.get_all_decisions(limit=10, offset=0)
for decision in recent_decisions:
    print(f"{decision.timestamp}: {decision.question[:50]}...")

# Find similar decisions to a known decision
similar = storage.get_similar_decisions(
    decision_id="uuid-here",
    threshold=0.7,
    limit=5
)
for decision, score in similar:
    print(f"Score: {score:.2f} - {decision.question}")

6. Manual Similarity Computation

When: Testing similarity detection, calibrating thresholds, or debugging retrieval.

from decision_graph.similarity import QuestionSimilarityDetector

detector = QuestionSimilarityDetector()

# Check backend being used
print(f"Backend: {detector.backend.__class__.__name__}")
# Outputs: SentenceTransformerBackend, TFIDFBackend, or JaccardBackend

# Compute similarity between two questions
score = detector.compute_similarity(
    "Should we use TypeScript?",
    "Should we adopt TypeScript for our project?"
)
print(f"Similarity: {score:.3f}")

# Find similar questions from candidates
candidates = [
    ("id1", "Should we use React or Vue?"),
    ("id2", "What database should we choose?"),
    ("id3", "Should we migrate to TypeScript?")
]

matches = detector.find_similar(
    query="Should we adopt TypeScript?",
    candidates=candidates,
    threshold=0.7
)

for match in matches:
    print(f"{match['id']}: {match['score']:.2f}")

Similarity Score Interpretation

The decision graph uses semantic similarity scores (0.0-1.0) to determine relevance:

Score RangeTierMeaningExample
0.90-1.00DuplicateNear-identical questions"Use TypeScript?" vs "Should we use TypeScript?"
0.75-0.89StrongHighly related topics"Use TypeScript?" vs "Adopt TypeScript for backend?"
0.60-0.74ModerateRelated but distinct"Use TypeScript?" vs "What language for frontend?"
0.40-0.59BriefTangentially related"Use TypeScript?" vs "Choose a static analyzer"
0.00-0.39NoiseUnrelated or spurious"Use TypeScript?" vs "What database to use?"

Thresholds in use:

  • Noise floor (0.40): Minimum similarity to include in results
  • Default threshold (0.70): Legacy retrieval threshold (deprecated)
  • Strong tier (0.75): Full formatting with stances in context
  • Moderate tier (0.60): Summary formatting without stances

Adaptive k (result count):

  • Small DB (<100 decisions): k=5 (exploration phase)
  • Medium DB (100-999): k=3 (balanced phase)
  • Large DB (≥1000): k=2 (precision phase)

Tiered Context Formatting

The decision graph uses budget-aware tiered formatting to control token usage:

Strong Tier (≥0.75 similarity)

Format: Full details with participant stances (~500 tokens)

### Strong Match (similarity: 0.85): Should we use TypeScript?
**Date**: 2024-10-15T14:30:00
**Convergence Status**: converged
**Consensus**: Adopt TypeScript for type safety and tooling benefits
**Winning Option**: Option A: Adopt TypeScript
**Participants**: opus@claude, gpt-4@codex, gemini-pro@gemini

**Participant Positions**:
- **opus@claude**: Voted for 'Option A' (confidence: 90%) - Strong type system reduces bugs
- **gpt-4@codex**: Voted for 'Option A' (confidence: 85%) - Better IDE support
- **gemini-pro@gemini**: Voted for 'Option A' (confidence: 80%) - Easier refactoring

Moderate Tier (0.60-0.74 similarity)

Format: Summary without stances (~200 tokens)

### Moderate Match (similarity: 0.68): What language for frontend?
**Consensus**: Use TypeScript for better type safety
**Result**: TypeScript

Brief Tier (0.40-0.59 similarity)

Format: One-liner (~50 tokens)

- **Brief Match** (0.45): Choose static analysis tools → ESLint with TypeScript

Token budget (default: 2000 tokens):

  • Allows ~2-3 strong decisions, or
  • ~5-7 moderate decisions, or
  • ~20-40 brief decisions
  • Formatting stops when budget reached

Troubleshooting

Issue: No context retrieved for similar questions

Symptoms:

get_context_for_deliberation()
returns empty string

Diagnosis:

# Check if decisions exist
stats = integration.get_graph_stats()
print(f"Total decisions: {stats['total_decisions']}")

# Try direct retrieval with lower threshold
retriever = DecisionRetriever(storage)
scored = retriever.find_relevant_decisions(
    query_question="Your question here",
    threshold=0.0  # See all results
)
print(f"Found {len(scored)} candidates above noise floor (0.40)")
for decision, score in scored[:5]:
    print(f"  {score:.3f}: {decision.question[:50]}...")

Common causes:

  1. Database empty: No past deliberations stored
  2. Below noise floor: All similarities <0.40 (unrelated questions)
  3. Cache stale: Cache not invalidated after adding decisions
  4. Backend mismatch: Using Jaccard (weak) instead of SentenceTransformer (strong)

Fixes:

# 1. Check database
if stats['total_decisions'] == 0:
    print("No decisions in database - add some first")

# 2. Lower threshold temporarily for testing
context = retriever.get_enriched_context(question, threshold=0.5)

# 3. Invalidate cache
retriever.invalidate_cache()

# 4. Check backend
detector = QuestionSimilarityDetector()
print(f"Using backend: {detector.backend.__class__.__name__}")
# If Jaccard: install sentence-transformers for better results

Issue: Slow queries (>1s latency)

Symptoms:

find_relevant_decisions()
takes >1 second

Diagnosis:

import time

# Measure query latency
start = time.time()
scored = retriever.find_relevant_decisions("Test question")
latency_ms = (time.time() - start) * 1000
print(f"Query latency: {latency_ms:.1f}ms")

# Check cache stats
cache_stats = retriever.get_cache_stats()
print(f"L1 hit rate: {cache_stats['query_hit_rate']:.1%}")
print(f"L2 hit rate: {cache_stats['embedding_hit_rate']:.1%}")

# Check database size
stats = integration.get_graph_stats()
print(f"Total decisions: {stats['total_decisions']}")

Common causes:

  1. Cold cache: First query always slow (computes similarities)
  2. Large database: >1000 decisions increases compute time
  3. No cache: Caching disabled in retriever
  4. Slow backend: Jaccard or TF-IDF slower than SentenceTransformer

Performance targets:

  • Cache hit: <2μs
  • Cache miss (<100 decisions): <50ms
  • Cache miss (100-999 decisions): <100ms
  • Cache miss (≥1000 decisions): <200ms

Fixes:

# 1. Warm up cache (run same query twice)
retriever.find_relevant_decisions(question)  # Cold (slow)
retriever.find_relevant_decisions(question)  # Warm (fast)

# 2. Enable caching if disabled
retriever = DecisionRetriever(storage, enable_cache=True)

# 3. Reduce query limit for large databases
all_decisions = storage.get_all_decisions(limit=100)  # Not 10000

# 4. Upgrade to SentenceTransformer backend
# pip install sentence-transformers

Issue: Memory usage growing

Symptoms: Process memory increases over time

Diagnosis:

# Check cache sizes
cache_stats = retriever.get_cache_stats()
print(f"L1 entries: {cache_stats['query_cache_size']} (max: 200)")
print(f"L2 entries: {cache_stats['embedding_cache_size']} (max: 500)")

# Check database size
stats = integration.get_graph_stats()
print(f"Database: {stats['db_size_mb']} MB")

# Estimate memory usage
# L1: ~5KB per entry = ~1MB for 200 entries
# L2: ~1KB per entry = ~500KB for 500 entries
# Total expected: ~1.5MB for cache + DB size

Common causes:

  1. Cache unbounded: Using custom cache without size limits
  2. Database growth: Normal, ~5KB per decision
  3. Embedding cache: SentenceTransformer embeddings (768 floats each)

Fixes:

# 1. Use bounded cache (default)
retriever = DecisionRetriever(storage, enable_cache=True)
# Auto-creates cache with maxsize=200 (L1) and maxsize=500 (L2)

# 2. Monitor database growth
maintenance = DecisionGraphMaintenance(storage)
growth = maintenance.analyze_growth(days=30)
print(f"Growth rate: {growth['avg_decisions_per_day']:.1f} decisions/day")

# 3. Consider archival at 5000+ decisions (Phase 2)
if stats['total_decisions'] > 5000:
    estimate = maintenance.estimate_archival_benefit()
    print(f"Archival would save ~{estimate['estimated_space_savings_mb']} MB")

Issue: Context not helping convergence

Symptoms: Injected context doesn't improve deliberation quality

Diagnosis:

# Check what context was injected
context = integration.get_context_for_deliberation(question)
print(f"Context length: {len(context)} chars (~{len(context)//4} tokens)")
print(context)

# Check tier distribution in logs (look for MEASUREMENT lines)
# Example: tier_distribution=(strong:1, moderate:0, brief:2)

# Verify similarity scores
scored = retriever.find_relevant_decisions(question)
for decision, score in scored:
    print(f"Score {score:.2f}: {decision.question[:40]}...")
    if score < 0.70:
        print(f"  WARNING: Low similarity, may not be helpful")

Common causes:

  1. Low similarity: Scores 0.40-0.60 are tangentially related
  2. Brief tier dominance: Most context in brief format (no stances)
  3. Token budget exhausted: Only including 1-2 decisions
  4. Contradictory context: Past decisions conflict with current question

Calibration approach (Phase 1.5):

  • Log MEASUREMENT lines: question, scored_results, tier_distribution, tokens, db_size
  • Analyze which tiers correlate with improved convergence
  • Adjust tier boundaries in config (default: strong=0.75, moderate=0.60)
  • Tune token budget (default: 2000)

Configuration

Context injection can be configured in

config.yaml
:

decision_graph:
  enabled: true
  db_path: "decision_graph.db"

  # Retrieval settings
  similarity_threshold: 0.7        # DEPRECATED - uses noise floor (0.40) instead
  max_context_decisions: 3         # DEPRECATED - uses adaptive k instead

  # Tiered formatting (NEW)
  tier_boundaries:
    strong: 0.75                   # Full details with stances
    moderate: 0.60                 # Summary without stances
    # brief: implicit (≥0.40 noise floor)

  context_token_budget: 2000       # Max tokens for context injection

Tuning recommendations:

  • Start with defaults (strong=0.75, moderate=0.60, budget=2000)
  • Collect MEASUREMENT logs over 50-100 deliberations
  • Analyze tier distribution vs convergence improvement
  • Adjust boundaries if needed (e.g., raise to 0.80/0.70 for stricter relevance)
  • Increase budget if frequently hitting limit with strong matches

Testing Queries

# Minimal test: Store and retrieve
from decision_graph.integration import DecisionGraphIntegration
from decision_graph.storage import DecisionGraphStorage
from models.schema import DeliberationResult, Summary, ConvergenceInfo

storage = DecisionGraphStorage(":memory:")
integration = DecisionGraphIntegration(storage)

# Create mock result
result = DeliberationResult(
    participants=["opus@claude", "gpt-4@codex"],
    rounds_completed=2,
    summary=Summary(consensus="Test consensus"),
    convergence_info=ConvergenceInfo(status="converged"),
    full_debate=[],
    transcript_path="test.md"
)

# Store
decision_id = integration.store_deliberation("Should we use TypeScript?", result)
print(f"Stored: {decision_id}")

# Retrieve
context = integration.get_context_for_deliberation("Should we adopt TypeScript?")
print(f"Context retrieved: {len(context)} chars")
assert len(context) > 0, "Should find similar decision"

Key Files Reference

  • Storage:
    decision_graph/storage.py
    - SQLite CRUD operations
  • Schema:
    decision_graph/schema.py
    - DecisionNode, ParticipantStance, DecisionSimilarity
  • Retrieval:
    decision_graph/retrieval.py
    - DecisionRetriever with caching
  • Integration:
    decision_graph/integration.py
    - High-level API facade
  • Similarity:
    decision_graph/similarity.py
    - Semantic similarity detection
  • Cache:
    decision_graph/cache.py
    - Two-tier LRU caching
  • Maintenance:
    decision_graph/maintenance.py
    - Stats and health checks
  • Workers:
    decision_graph/workers.py
    - Async background processing

See Also

  • CLAUDE.md: Decision Graph Memory Architecture section
  • Tests:
    tests/unit/test_decision_graph*.py
    - Unit tests with examples
  • Integration tests:
    tests/integration/test_*memory*.py
    - Full workflow tests
  • Performance tests:
    tests/integration/test_performance.py
    - Latency benchmarks