Claude-skill-registry cli-interactive-testing

Test and validate DyGram machines using CLI interactive mode. Step through execution, provide intelligent responses, debug behavior, and create test recordings.

Install

Source · Clone the upstream repo:

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/cli-interactive-testing" ~/.claude/skills/majiayu000-claude-skill-registry-cli-interactive-testing && rm -rf "$T"

Manifest: skills/data/cli-interactive-testing/SKILL.md

Source content

CLI Interactive Testing Skill

Execute and validate DyGram machines using CLI interactive mode for intelligent turn-by-turn testing.

Purpose

This skill guides you through using the CLI interactive mode to:

  • Test machines by executing them step-by-step
  • Debug behavior by observing state at each turn
  • Provide intelligent responses when LLM decisions are needed
  • Create test recordings for automated CI/CD playback
  • Validate multiple scenarios (success, error, edge cases)

Quick Start

Basic Testing Workflow

# 1. Start interactive execution
dygram execute --interactive machine.dy --id test-01

# 2. Continue execution turn-by-turn
dygram execute --interactive machine.dy --id test-01

# 3. Check status at any time
dygram exec status test-01

# 4. Provide response when needed
echo '{"response": "Continue", "tools": [...]}' | \
  dygram execute --interactive machine.dy --id test-01

Core Concepts

Turn-by-Turn Execution

Each CLI call executes one turn (one LLM invocation):

  • State persists to disk (.dygram/executions/<id>/)
  • A machine snapshot prevents definition changes mid-execution
  • History logs all turns (history.jsonl)
  • Execution auto-resumes from the last saved state
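
The persisted files can be summarized with a small wrapper; a sketch, assuming the default .dygram layout and the field names used by the jq queries later in this guide (exec_summary is a hypothetical helper, not a dygram command):

```shell
# exec_summary <id>: print the current node and status of a persisted
# execution. Hypothetical helper; field paths match the jq queries
# shown in the State Management section of this guide.
exec_summary() {
  local dir=".dygram/executions/$1"
  jq -r '"node: \(.executionState.currentNode)  status: \(.status)"' \
    "$dir/state.json"
}

# Example: exec_summary test-01
```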

Response Modes

1. Auto-continue (no stdin):

dygram e -i machine.dy --id test

Used for: Task nodes without LLM, simple transitions

2. Manual response (stdin):

echo '{"response": "...", "tools": [...]}' | dygram e -i machine.dy --id test

Used for: Agent nodes, complex decisions, testing specific paths

3. Playback mode (recordings):

dygram e -i machine.dy --playback recordings/golden/ --id test

Used for: Deterministic testing, CI/CD validation

Detailed Workflow

Step 1: Understand the Machine

Before testing, read and understand the machine:

# Read machine definition
cat machines/payment-workflow.dy

# Generate visualization
dygram generate machines/payment-workflow.dy --format html

# Validate syntax
dygram parseAndValidate machines/payment-workflow.dy

Step 2: Start Interactive Execution

Choose an execution mode based on your goal:

For debugging/exploration:

dygram e -i machines/payment-workflow.dy --id debug

For creating test recordings:

dygram e -i machines/payment-workflow.dy \
  --record recordings/payment-workflow/ \
  --id recording-001

For validating with existing recordings:

dygram e -i machines/payment-workflow.dy \
  --playback recordings/payment-workflow/ \
  --id playback-001

Step 3: Execute Turn-by-Turn

Continue execution, observing and providing input as needed:

# Execute next turn
dygram e -i machines/payment-workflow.dy --id debug

# Check what happened
dygram exec status debug

# View execution history
cat .dygram/executions/debug/history.jsonl | tail -5

# Check current state
cat .dygram/executions/debug/state.json | jq '.executionState.currentNode'

Step 4: Provide Intelligent Responses

When the machine needs an LLM decision, analyze the context and provide a response:

# First, understand what's needed
cat .dygram/executions/debug/state.json | jq '.executionState.turnState'

# Provide thoughtful response
echo '{
  "response": "Validating payment credentials",
  "tools": [
    {"name": "validate_payment", "params": {"amount": 100}}
  ]
}' | dygram e -i machines/payment-workflow.dy --id debug

Step 5: Continue Until Complete

# Option 1: Manual stepping
dygram e -i machines/payment-workflow.dy --id debug
dygram e -i machines/payment-workflow.dy --id debug
# ... until complete

# Option 2: Loop (with manual responses when needed)
while dygram e -i machines/payment-workflow.dy --id debug 2>&1 | \
  grep -q "Turn completed"; do
  echo "Turn completed, continuing..."
done

Step 6: Validate Results

# Check final status
dygram exec status debug

# Review full history
cat .dygram/executions/debug/history.jsonl

# Check final state
cat .dygram/executions/debug/state.json | jq '.status'

# If recording mode, verify recordings
ls -la recordings/payment-workflow/

Providing Intelligent Responses

Response Format

{
  "response": "Your reasoning and explanation",
  "tools": [
    {
      "name": "tool_name",
      "params": {
        "param1": "value1",
        "param2": "value2"
      }
    }
  ]
}
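
Malformed payloads are easier to catch before they reach dygram; a minimal sketch that asserts the shape above with jq (validate_response is a hypothetical helper, not part of the dygram CLI):

```shell
# validate_response: read a payload on stdin and verify it has the
# fields described above (a "response" string and a "tools" array).
# Hypothetical helper; adjust if your machines use other fields.
validate_response() {
  jq -e '(.response | type == "string") and (.tools | type == "array")' \
    >/dev/null
}

payload='{"response": "Validating input", "tools": []}'
if echo "$payload" | validate_response; then
  echo "payload ok"
fi
```

Pipe the payload through the check first, then on to dygram e -i ... as usual.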

Decision-Making Process

  1. Analyze Context

    • What node are we at?
    • What tools are available?
    • What is the task prompt asking for?
  2. Understand Intent

    • What is the machine trying to accomplish?
    • What would a real agent do here?
    • Are there multiple valid paths?
  3. Choose Semantically

    • Don't just pattern-match keywords
    • Consider the machine's goal
    • Test different scenarios (success/error/edge)
  4. Document Reasoning

    • Include clear explanation in response
    • This helps understand recordings later
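
Steps 1 and 2 amount to reading the persisted state before answering; a sketch that bundles the relevant queries (show_context is a hypothetical helper built from the jq queries used elsewhere in this guide):

```shell
# show_context <id>: dump what you need before composing a response --
# the current node, any pending turn state, and recent history entries.
# Hypothetical helper; paths match this guide's .dygram layout.
show_context() {
  local dir=".dygram/executions/$1"
  echo "current node:"
  jq -r '.executionState.currentNode' "$dir/state.json"
  echo "turn state:"
  jq '.executionState.turnState' "$dir/state.json"
  echo "recent history:"
  tail -3 "$dir/history.jsonl"
}
```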

Example Responses

Simple continuation:

echo '{"action": "continue"}' | dygram e -i machine.dy --id test

File operation:

echo '{
  "response": "Reading configuration file to determine environment",
  "tools": [
    {"name": "read_file", "params": {"path": "config.json"}}
  ]
}' | dygram e -i machine.dy --id test

Transition decision:

echo '{
  "response": "Payment validation succeeded, transitioning to confirmation state",
  "tools": [
    {"name": "transition_to_confirmation", "params": {}}
  ]
}' | dygram e -i machine.dy --id test

Multiple tools:

cat <<'EOF' | dygram e -i machine.dy --id test
{
  "response": "Analyzing data and generating report",
  "tools": [
    {"name": "read_file", "params": {"path": "data.json"}},
    {"name": "analyze_data", "params": {"format": "summary"}},
    {"name": "write_file", "params": {
      "path": "report.txt",
      "content": "Analysis complete"
    }}
  ]
}
EOF

Testing Patterns

Pattern 1: Debug Single Execution

Step through to understand behavior:

# Start
dygram e -i machine.dy --id debug --verbose

# Step through with observation
for i in {1..10}; do
  echo "=== Turn $i ==="
  dygram e -i machine.dy --id debug

  # Check state
  dygram exec status debug

  # Review last history entry
  tail -1 .dygram/executions/debug/history.jsonl | jq '.'

  # Pause for review
  read -p "Continue? (y/n) " -n 1 -r
  echo
  [[ ! $REPLY =~ ^[Yy]$ ]] && break
done

Pattern 2: Create Golden Recording

# Start with recording
dygram e -i machine.dy \
  --record recordings/golden-test/ \
  --id golden

# Execute with intelligent responses
# (provide responses as machine requires them)

# Continue until complete
while dygram e -i machine.dy --id golden; do
  echo "Turn completed"
done

# Verify recording
ls -la recordings/golden-test/
dygram e -i machine.dy \
  --playback recordings/golden-test/ \
  --id verify

# Commit to git
git add recordings/golden-test/
git commit -m "Add golden recording for machine"

Pattern 3: Test Multiple Scenarios

# Success path
dygram e -i machine.dy --record recordings/success/ --id success
# ... provide success responses ...

# Error path
dygram e -i machine.dy --record recordings/error/ --id error
# ... provide error responses ...

# Edge case
dygram e -i machine.dy --record recordings/edge/ --id edge
# ... provide edge case responses ...

# Validate all scenarios
for scenario in success error edge; do
  echo "Testing $scenario..."
  dygram e -i machine.dy \
    --playback "recordings/$scenario/" \
    --id "test-$scenario"
done
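
A per-scenario pass/fail line makes the validation loop's output easier to scan; a sketch (report_result is a hypothetical helper; the "complete" grep mirrors the status checks used elsewhere in this guide):

```shell
# report_result <name>: read a status report on stdin and print a
# PASS/FAIL line. Hypothetical helper; assumes completed executions
# report a status containing "complete", as this guide's checks do.
report_result() {
  if grep -q "complete"; then
    echo "PASS: $1"
  else
    echo "FAIL: $1"
  fi
}

# Usage with dygram (not run here):
#   dygram exec status "test-$scenario" | report_result "$scenario"
```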

Pattern 4: Batch Test Multiple Machines

#!/bin/bash
for machine in machines/*.dy; do
  name=$(basename "$machine" .dy)
  echo "Testing: $name"

  # Start with recording
  dygram e -i "$machine" \
    --record "recordings/$name/" \
    --id "$name" \
    --verbose 2>&1 | tee "logs/$name.log"

  # Continue until complete or error
  attempts=0
  max_attempts=20
  while [ "$attempts" -lt "$max_attempts" ]; do
    if dygram e -i "$machine" --id "$name"; then
      attempts=$((attempts + 1))
    else
      echo "Completed or errored after $attempts turns"
      break
    fi
  done

  # Check result
  if dygram exec status "$name" | grep -q "complete"; then
    echo "✓ $name: SUCCESS"
  else
    echo "✗ $name: FAILED or INCOMPLETE"
  fi

  # Clean up
  dygram exec rm "$name"
done

Pattern 5: Compare Before/After

Test behavior changes:

# Record baseline
git checkout main
dygram e -i machine.dy --record recordings/baseline/ --id baseline
# ... execute ...

# Record with changes
git checkout feature-branch
dygram e -i machine.dy --record recordings/feature/ --id feature
# ... execute ...

# Compare recordings (recursive, unified diff)
diff -ru recordings/baseline/ recordings/feature/

# Validate both still work
dygram e -i machine.dy --playback recordings/baseline/ --id test-baseline
dygram e -i machine.dy --playback recordings/feature/ --id test-feature

Recording Management

Creating Recordings

Recordings capture LLM responses for deterministic replay:

dygram e -i machine.dy --record recordings/test-case/ --id test

Recording structure:

recordings/test-case/
  ├── turn-1.json    # First LLM invocation
  ├── turn-2.json    # Second LLM invocation
  └── turn-3.json    # Third LLM invocation

Recording content:

{
  "request": {
    "systemPrompt": "...",
    "tools": [...]
  },
  "response": {
    "content": [...],
    "stop_reason": "tool_use"
  }
}
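
Before committing a recording, it is worth checking that every turn file parses and carries the two top-level keys shown above; a sketch (check_recording is a hypothetical helper):

```shell
# check_recording <dir>: verify every turn-*.json in a recording
# directory is valid JSON with "request" and "response" keys.
# Hypothetical helper; the shape matches the example above.
check_recording() {
  local dir="$1" status=0
  for f in "$dir"/turn-*.json; do
    [ -e "$f" ] || { echo "no turn files in $dir"; return 1; }
    if ! jq -e 'has("request") and has("response")' "$f" >/dev/null; then
      echo "malformed: $f"
      status=1
    fi
  done
  return $status
}
```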

Using Recordings

# Playback deterministically
dygram e -i machine.dy --playback recordings/test-case/ --id playback

# Continue playback
while dygram e -i machine.dy --id playback; do :; done

Organizing Recordings

Recommended structure:

recordings/
  ├── golden/                    # Golden path tests
  │   ├── basic-workflow/
  │   ├── payment-flow/
  │   └── approval-process/
  ├── edge-cases/               # Edge case scenarios
  │   ├── empty-input/
  │   ├── max-length/
  │   └── special-chars/
  ├── error-handling/           # Error scenarios
  │   ├── missing-file/
  │   ├── invalid-data/
  │   └── timeout/
  └── regression/               # Regression tests
      ├── bug-123-fix/
      ├── bug-456-fix/
      └── feature-789/
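
With this layout, scenarios can be enumerated mechanically for batch playback; a sketch (list_scenarios is a hypothetical helper):

```shell
# list_scenarios [root]: print every category/scenario directory two
# levels under the recordings root shown above. Hypothetical helper;
# each printed path is suitable as a --playback argument.
list_scenarios() {
  find "${1:-recordings}" -mindepth 2 -maxdepth 2 -type d | sort
}
```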

Maintaining Recordings

# Update recording when behavior intentionally changes
dygram e -i machine.dy \
  --record recordings/golden/workflow/ \
  --id update \
  --force  # Force new recording

# Validate all recordings still work
for dir in recordings/golden/*/; do
  name=$(basename "$dir")
  echo "Testing: $name"
  dygram e -i "machines/$name.dy" \
    --playback "$dir" \
    --id "validate-$name"
done

State Management

Execution State Files

State is stored in .dygram/executions/<id>/:

.dygram/executions/test-01/
  ├── state.json       # Current execution state
  ├── metadata.json    # Execution metadata
  ├── machine.json     # Machine snapshot (prevents mid-execution changes)
  └── history.jsonl    # Turn-by-turn history log

Inspecting State

# View current node
cat .dygram/executions/test-01/state.json | jq '.executionState.currentNode'

# View turn state (if in turn)
cat .dygram/executions/test-01/state.json | jq '.executionState.turnState'

# View visited nodes
cat .dygram/executions/test-01/state.json | jq '.executionState.visitedNodes'

# View attributes
cat .dygram/executions/test-01/state.json | jq '.executionState.attributes'

# View metadata
cat .dygram/executions/test-01/metadata.json | jq '.'

Managing Executions

# List all executions
dygram exec list

# Show specific execution status
dygram exec status test-01

# Remove execution
dygram exec rm test-01

# Clean completed executions
dygram exec clean

Troubleshooting

Execution Not Progressing

Check if waiting for input:

dygram exec status <id>
cat .dygram/executions/<id>/state.json | jq '.executionState.turnState'

Provide required response:

echo '{"response": "...", "tools": [...]}' | dygram e -i machine.dy --id <id>

Wrong Path Taken

Restart from beginning:

dygram exec rm <id>
dygram e -i machine.dy --id <id> --force

Or start new execution:

dygram e -i machine.dy --id <id>-retry

Recording Playback Mismatch

Check recording content:

ls -la recordings/test-case/
cat recordings/test-case/turn-1.json | jq '.'

Verify machine hasn't changed:

# Compare machine hashes
cat .dygram/executions/<id>/metadata.json | jq '.dyash'

Re-record if machine changed:

dygram e -i machine.dy --record recordings/test-case/ --id new --force

State Corruption

View error details:

cat .dygram/executions/<id>/state.json | jq '.status'

Force fresh start:

dygram exec rm <id>
dygram e -i machine.dy --id <id> --force

Best Practices

1. Always Use Explicit IDs

# Good: Explicit ID for tracking
dygram e -i machine.dy --id test-payment-success

# Avoid: Auto-generated IDs are hard to track
dygram e -i machine.dy

2. Create Recordings for Important Tests

# Record golden path
dygram e -i machine.dy --record recordings/golden/ --id golden

# Commit to git
git add recordings/golden/
git commit -m "Add golden recording for regression testing"

3. Use Verbose Mode for Debugging

dygram e -i machine.dy --id debug --verbose

4. Check State Frequently

# After each significant turn
dygram e -i machine.dy --id test
dygram exec status test

5. Clean Up Test Executions

# After testing
dygram exec rm test-01
dygram exec clean

6. Document Test Scenarios

# Create a test plan
cat > TEST_PLAN.md <<'EOF'
# Payment Workflow Tests

## Scenarios
1. Success path: recordings/payment-success/
2. Invalid card: recordings/payment-invalid/
3. Timeout: recordings/payment-timeout/
4. Retry success: recordings/payment-retry/

## Run Tests
for scenario in success invalid timeout retry; do
  dygram e -i payment.dy \
    --playback recordings/payment-$scenario/ \
    --id test-$scenario
done
EOF

Integration with CI/CD

Local Development

# 1. Develop machine
vim machines/workflow.dy

# 2. Test interactively
dygram e -i machines/workflow.dy \
  --record recordings/workflow/ \
  --id workflow-test

# 3. Commit machine and recordings
git add machines/workflow.dy recordings/workflow/
git commit -m "Add workflow machine with tests"

CI Configuration

# .github/workflows/test.yml
name: Test DyGram Machines

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install DyGram
        run: npm install -g dygram

      - name: Test All Machines
        run: |
          for recording in recordings/golden/*/; do
            machine=$(basename "$recording")
            echo "Testing: $machine"

            dygram execute --interactive \
              "machines/$machine.dy" \
              --playback "$recording" \
              --id "ci-$machine"

            # Check result
            if ! dygram exec status "ci-$machine" | grep -q "complete"; then
              echo "FAILED: $machine"
              exit 1
            fi

            echo "PASSED: $machine"
          done

Summary Checklist

When testing a machine, ensure you:

  • Read and understand the machine definition
  • Start with explicit execution ID
  • Use --record if creating test recordings
  • Step through execution, observing state
  • Provide intelligent responses when needed
  • Check status frequently with dygram exec status
  • Validate final state and results
  • Verify recordings if created
  • Clean up test executions when done
  • Commit recordings for CI/CD if appropriate

See Also

  • CLI Interactive Mode Guide: docs/cli/interactive-mode.md
  • CLI Reference: docs/cli/README.md
  • Agent: dygram-test-responder (auto-loaded)
  • Examples: examples/ directory

Remember: You have intelligent reasoning - use it! Understand context, make semantic decisions, and test edge cases. Don't just pattern-match; think about what the machine is trying to accomplish.