Claude-skill-registry cli-interactive-testing
Test and validate DyGram machines using CLI interactive mode. Step through execution, provide intelligent responses, debug behavior, and create test recordings.
```bash
# Clone the full registry
git clone https://github.com/majiayu000/claude-skill-registry

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/cli-interactive-testing" ~/.claude/skills/majiayu000-claude-skill-registry-cli-interactive-testing && rm -rf "$T"
```

skills/data/cli-interactive-testing/SKILL.md

CLI Interactive Testing Skill
Execute and validate DyGram machines using CLI interactive mode for intelligent turn-by-turn testing.
Purpose
This skill guides you through using the CLI interactive mode to:
- Test machines by executing them step-by-step
- Debug behavior by observing state at each turn
- Provide intelligent responses when LLM decisions are needed
- Create test recordings for automated CI/CD playback
- Validate multiple scenarios (success, error, edge cases)
Quick Start
Basic Testing Workflow
```bash
# 1. Start interactive execution
dygram execute --interactive machine.dy --id test-01

# 2. Continue execution turn-by-turn
dygram execute --interactive machine.dy --id test-01

# 3. Check status at any time
dygram exec status test-01

# 4. Provide response when needed
echo '{"response": "Continue", "tools": [...]}' | \
  dygram execute --interactive machine.dy --id test-01
```
Core Concepts
Turn-by-Turn Execution
Each CLI call executes one turn (one LLM invocation):
- State persists to disk (`.dygram/executions/<id>/`)
- Machine snapshot prevents definition changes mid-execution
- History logs all turns (`history.jsonl`)
- Auto-resumes from last state
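Because state is plain JSON/JSONL on disk, it can be inspected with standard tools. A minimal sketch, assuming the directory layout described above; the sample data here is fabricated for illustration:

```bash
# Build a throwaway execution directory matching the documented layout,
# then summarize it the way you would a real one (requires jq).
dir=$(mktemp -d)/executions/demo
mkdir -p "$dir"

cat > "$dir/state.json" <<'EOF'
{"executionState": {"currentNode": "validate", "visitedNodes": ["start", "validate"]}, "status": "running"}
EOF

cat > "$dir/history.jsonl" <<'EOF'
{"turn": 1, "node": "start"}
{"turn": 2, "node": "validate"}
EOF

# One-line summary: current node plus number of completed turns
node=$(jq -r '.executionState.currentNode' "$dir/state.json")
turns=$(wc -l < "$dir/history.jsonl" | tr -d ' ')
echo "node=$node turns=$turns"
```

For a real execution, point `dir` at `.dygram/executions/<your-id>`.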
Response Modes
1. Auto-continue (no stdin):

```bash
dygram e -i machine.dy --id test   # `e -i` abbreviates `execute --interactive`
```

Used for: Task nodes without LLM, simple transitions
2. Manual response (stdin):

```bash
echo '{"response": "...", "tools": [...]}' | dygram e -i machine.dy --id test
```

Used for: Agent nodes, complex decisions, testing specific paths
3. Playback mode (recordings):

```bash
dygram e -i machine.dy --playback recordings/golden/ --id test
```

Used for: Deterministic testing, CI/CD validation
Detailed Workflow
Step 1: Understand the Machine
Before testing, read and understand the machine:
```bash
# Read machine definition
cat machines/payment-workflow.dy

# Generate visualization
dygram generate machines/payment-workflow.dy --format html

# Validate syntax
dygram parseAndValidate machines/payment-workflow.dy
```
Step 2: Start Interactive Execution
Choose execution mode based on goal:
For debugging/exploration:
```bash
dygram e -i machines/payment-workflow.dy --id debug
```
For creating test recordings:
```bash
dygram e -i machines/payment-workflow.dy \
  --record recordings/payment-workflow/ \
  --id recording-001
```
For validating with existing recordings:
```bash
dygram e -i machines/payment-workflow.dy \
  --playback recordings/payment-workflow/ \
  --id playback-001
```
Step 3: Execute Turn-by-Turn
Continue execution, observing and providing input as needed:
```bash
# Execute next turn
dygram e -i machines/payment-workflow.dy --id debug

# Check what happened
dygram exec status debug

# View execution history
cat .dygram/executions/debug/history.jsonl | tail -5

# Check current state
cat .dygram/executions/debug/state.json | jq '.executionState.currentNode'
```
Step 4: Provide Intelligent Responses
When machine needs LLM decision, analyze and provide response:
```bash
# First, understand what's needed
cat .dygram/executions/debug/state.json | jq '.executionState.turnState'

# Provide thoughtful response
echo '{
  "response": "Validating payment credentials",
  "tools": [
    {"name": "validate_payment", "params": {"amount": 100}}
  ]
}' | dygram e -i machines/payment-workflow.dy --id debug
```
Step 5: Continue Until Complete
```bash
# Option 1: Manual stepping
dygram e -i machines/payment-workflow.dy --id debug
dygram e -i machines/payment-workflow.dy --id debug
# ... until complete

# Option 2: Loop (with manual responses when needed)
while dygram e -i machines/payment-workflow.dy --id debug 2>&1 | \
    grep -q "Turn completed"; do
  echo "Turn completed, continuing..."
done
```
Step 6: Validate Results
```bash
# Check final status
dygram exec status debug

# Review full history
cat .dygram/executions/debug/history.jsonl

# Check final state
cat .dygram/executions/debug/state.json | jq '.status'

# If recording mode, verify recordings
ls -la recordings/payment-workflow/
```
Providing Intelligent Responses
Response Format
```json
{
  "response": "Your reasoning and explanation",
  "tools": [
    {
      "name": "tool_name",
      "params": {
        "param1": "value1",
        "param2": "value2"
      }
    }
  ]
}
```
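When crafting responses by hand, building the JSON with `jq -n` avoids quoting mistakes. A minimal sketch; the tool name and parameters here are hypothetical examples, not part of any real machine:

```bash
# Build a response payload programmatically; jq handles all quoting.
# "validate_payment" and its params are hypothetical.
response=$(jq -n \
  --arg msg "Validating payment credentials" \
  --argjson amount 100 \
  '{response: $msg, tools: [{name: "validate_payment", params: {amount: $amount}}]}')

# Sanity-check the shape before piping it into the CLI
echo "$response" | jq -e 'has("response") and (.tools | type == "array")' > /dev/null
echo "$response" | jq -r '.tools[0].name'
```

The result can then be piped in as usual: `echo "$response" | dygram e -i machine.dy --id test`.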
Decision-Making Process
1. Analyze Context
   - What node are we at?
   - What tools are available?
   - What is the task prompt asking for?
2. Understand Intent
   - What is the machine trying to accomplish?
   - What would a real agent do here?
   - Are there multiple valid paths?
3. Choose Semantically
   - Don't just pattern-match keywords
   - Consider the machine's goal
   - Test different scenarios (success/error/edge)
4. Document Reasoning
   - Include a clear explanation in the response
   - This helps when reviewing recordings later
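The "Analyze Context" step can be scripted. The exact `turnState` schema is not specified in this document, so the fields below (`prompt`, `availableTools`) are assumed purely for illustration:

```bash
# Fabricated state snapshot -- the real turnState schema may differ.
state='{"executionState": {"currentNode": "review",
  "turnState": {"prompt": "Approve or reject the request",
                "availableTools": ["approve", "reject"]}}}'

# Extract the context needed for a semantic decision (requires jq)
node=$(echo "$state" | jq -r '.executionState.currentNode')
prompt=$(echo "$state" | jq -r '.executionState.turnState.prompt')
tools=$(echo "$state" | jq -r '.executionState.turnState.availableTools | join(", ")')

echo "At '$node': $prompt (tools: $tools)"
```

Against a real execution, read the snapshot from `.dygram/executions/<id>/state.json` instead of an inline string.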
Example Responses
Simple continuation:
```bash
echo '{"action": "continue"}' | dygram e -i machine.dy --id test
```
File operation:
```bash
echo '{
  "response": "Reading configuration file to determine environment",
  "tools": [
    {"name": "read_file", "params": {"path": "config.json"}}
  ]
}' | dygram e -i machine.dy --id test
```
Transition decision:
```bash
echo '{
  "response": "Payment validation succeeded, transitioning to confirmation state",
  "tools": [
    {"name": "transition_to_confirmation", "params": {}}
  ]
}' | dygram e -i machine.dy --id test
```
Multiple tools:
```bash
cat <<'EOF' | dygram e -i machine.dy --id test
{
  "response": "Analyzing data and generating report",
  "tools": [
    {"name": "read_file", "params": {"path": "data.json"}},
    {"name": "analyze_data", "params": {"format": "summary"}},
    {"name": "write_file", "params": {
      "path": "report.txt",
      "content": "Analysis complete"
    }}
  ]
}
EOF
```
Testing Patterns
Pattern 1: Debug Single Execution
Step through to understand behavior:
```bash
# Start
dygram e -i machine.dy --id debug --verbose

# Step through with observation
for i in {1..10}; do
  echo "=== Turn $i ==="
  dygram e -i machine.dy --id debug

  # Check state
  dygram exec status debug

  # Review last history entry
  tail -1 .dygram/executions/debug/history.jsonl | jq '.'

  # Pause for review
  read -p "Continue? (y/n) " -n 1 -r
  echo
  [[ ! $REPLY =~ ^[Yy]$ ]] && break
done
```
Pattern 2: Create Golden Recording
```bash
# Start with recording
dygram e -i machine.dy \
  --record recordings/golden-test/ \
  --id golden

# Execute with intelligent responses
# (provide responses as machine requires them)

# Continue until complete
while dygram e -i machine.dy --id golden; do
  echo "Turn completed"
done

# Verify recording
ls -la recordings/golden-test/
dygram e -i machine.dy \
  --playback recordings/golden-test/ \
  --id verify

# Commit to git
git add recordings/golden-test/
git commit -m "Add golden recording for machine"
```
Pattern 3: Test Multiple Scenarios
```bash
# Success path
dygram e -i machine.dy --record recordings/success/ --id success
# ... provide success responses ...

# Error path
dygram e -i machine.dy --record recordings/error/ --id error
# ... provide error responses ...

# Edge case
dygram e -i machine.dy --record recordings/edge/ --id edge
# ... provide edge case responses ...

# Validate all scenarios
for scenario in success error edge; do
  echo "Testing $scenario..."
  dygram e -i machine.dy \
    --playback "recordings/$scenario/" \
    --id "test-$scenario"
done
```
Pattern 4: Batch Test Multiple Machines
```bash
#!/bin/bash
for machine in machines/*.dy; do
  name=$(basename "$machine" .dy)
  echo "Testing: $name"

  # Start with recording
  dygram e -i "$machine" \
    --record "recordings/$name/" \
    --id "$name" \
    --verbose 2>&1 | tee "logs/$name.log"

  # Continue until complete or error
  attempts=0
  max_attempts=20
  while [ $attempts -lt $max_attempts ]; do
    if dygram e -i "$machine" --id "$name"; then
      ((attempts++))
    else
      echo "Completed or errored after $attempts turns"
      break
    fi
  done

  # Check result
  if dygram exec status "$name" | grep -q "complete"; then
    echo "✓ $name: SUCCESS"
  else
    echo "✗ $name: FAILED or INCOMPLETE"
  fi

  # Clean up
  dygram exec rm "$name"
done
```
Pattern 5: Compare Before/After
Test behavior changes:
```bash
# Record baseline
git checkout main
dygram e -i machine.dy --record recordings/baseline/ --id baseline
# ... execute ...

# Record with changes
git checkout feature-branch
dygram e -i machine.dy --record recordings/feature/ --id feature
# ... execute ...

# Compare recordings
diff -u recordings/baseline/ recordings/feature/

# Validate both still work
dygram e -i machine.dy --playback recordings/baseline/ --id test-baseline
dygram e -i machine.dy --playback recordings/feature/ --id test-feature
```
Recording Management
Creating Recordings
Recordings capture LLM responses for deterministic replay:
```bash
dygram e -i machine.dy --record recordings/test-case/ --id test
```
Recording structure:
```
recordings/test-case/
├── turn-1.json    # First LLM invocation
├── turn-2.json    # Second LLM invocation
└── turn-3.json    # Third LLM invocation
```
Recording content:
```
{
  "request": {
    "systemPrompt": "...",
    "tools": [...]
  },
  "response": {
    "content": [...],
    "stop_reason": "tool_use"
  }
}
```
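Recordings can be audited with `jq` before committing them. A sketch against a fabricated turn file in the shape shown above:

```bash
# Create a sample recording directory with one fabricated turn
dir=$(mktemp -d)
cat > "$dir/turn-1.json" <<'EOF'
{"request": {"systemPrompt": "You are a payment agent",
             "tools": [{"name": "validate_payment"}]},
 "response": {"content": [], "stop_reason": "tool_use"}}
EOF

# Summarize each turn: how many tools were offered, why the turn ended
for f in "$dir"/turn-*.json; do
  stop=$(jq -r '.response.stop_reason' "$f")
  ntools=$(jq '.request.tools | length' "$f")
  echo "$(basename "$f"): stop_reason=$stop tools_offered=$ntools"
done
```

Point `dir` at a real recording directory (e.g. `recordings/test-case/`) to audit actual turns.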
Using Recordings
```bash
# Playback deterministically
dygram e -i machine.dy --playback recordings/test-case/ --id playback

# Continue playback
while dygram e -i machine.dy --id playback; do :; done
```
Organizing Recordings
Recommended structure:
```
recordings/
├── golden/              # Golden path tests
│   ├── basic-workflow/
│   ├── payment-flow/
│   └── approval-process/
├── edge-cases/          # Edge case scenarios
│   ├── empty-input/
│   ├── max-length/
│   └── special-chars/
├── error-handling/      # Error scenarios
│   ├── missing-file/
│   ├── invalid-data/
│   └── timeout/
└── regression/          # Regression tests
    ├── bug-123-fix/
    ├── bug-456-fix/
    └── feature-789/
```
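If you adopt this layout, the top-level categories can be scaffolded in one command (shown here under a temp dir; in practice use your repository root):

```bash
# Scaffold the recommended recordings layout
base=$(mktemp -d)/recordings   # replace with your repo root in practice
mkdir -p \
  "$base"/golden \
  "$base"/edge-cases \
  "$base"/error-handling \
  "$base"/regression

ls "$base"
```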
Maintaining Recordings
```bash
# Update recording when behavior intentionally changes
dygram e -i machine.dy \
  --record recordings/golden/workflow/ \
  --id update \
  --force  # Force new recording

# Validate all recordings still work
for dir in recordings/golden/*/; do
  name=$(basename "$dir")
  echo "Testing: $name"
  dygram e -i "machines/$name.dy" \
    --playback "$dir" \
    --id "validate-$name"
done
```
State Management
Execution State Files
State is stored in `.dygram/executions/<id>/`:
```
.dygram/executions/test-01/
├── state.json      # Current execution state
├── metadata.json   # Execution metadata
├── machine.json    # Machine snapshot (prevents mid-execution changes)
└── history.jsonl   # Turn-by-turn history log
```
Inspecting State
```bash
# View current node
cat .dygram/executions/test-01/state.json | jq '.executionState.currentNode'

# View turn state (if in turn)
cat .dygram/executions/test-01/state.json | jq '.executionState.turnState'

# View visited nodes
cat .dygram/executions/test-01/state.json | jq '.executionState.visitedNodes'

# View attributes
cat .dygram/executions/test-01/state.json | jq '.executionState.attributes'

# View metadata
cat .dygram/executions/test-01/metadata.json | jq '.'
```
Managing Executions
```bash
# List all executions
dygram exec list

# Show specific execution status
dygram exec status test-01

# Remove execution
dygram exec rm test-01

# Clean completed executions
dygram exec clean
```
Troubleshooting
Execution Not Progressing
Check if waiting for input:
```bash
dygram exec status <id>
cat .dygram/executions/<id>/state.json | jq '.executionState.turnState'
```
Provide required response:
```bash
echo '{"response": "...", "tools": [...]}' | dygram e -i machine.dy --id <id>
```
Wrong Path Taken
Restart from beginning:
```bash
dygram exec rm <id>
dygram e -i machine.dy --id <id> --force
```
Or start new execution:
```bash
dygram e -i machine.dy --id <id>-retry
```
Recording Playback Mismatch
Check recording content:
```bash
ls -la recordings/test-case/
cat recordings/test-case/turn-1.json | jq '.'
```
Verify machine hasn't changed:
```bash
# Compare machine hashes
cat .dygram/executions/<id>/metadata.json | jq '.dyash'
```
Re-record if machine changed:
```bash
dygram e -i machine.dy --record recordings/test-case/ --id new --force
```
State Corruption
View error details:
```bash
cat .dygram/executions/<id>/state.json | jq '.status'
```
Force fresh start:
```bash
dygram exec rm <id>
dygram e -i machine.dy --id <id> --force
```
Best Practices
1. Always Use Explicit IDs
```bash
# Good: Explicit ID for tracking
dygram e -i machine.dy --id test-payment-success

# Avoid: Auto-generated IDs are hard to track
dygram e -i machine.dy
```
2. Create Recordings for Important Tests
```bash
# Record golden path
dygram e -i machine.dy --record recordings/golden/ --id golden

# Commit to git
git add recordings/golden/
git commit -m "Add golden recording for regression testing"
```
3. Use Verbose Mode for Debugging
```bash
dygram e -i machine.dy --id debug --verbose
```
4. Check State Frequently
```bash
# After each significant turn
dygram e -i machine.dy --id test
dygram exec status test
```
5. Clean Up Test Executions
```bash
# After testing
dygram exec rm test-01
dygram exec clean
```
6. Document Test Scenarios
```bash
# Create a test plan
cat > TEST_PLAN.md <<'EOF'
# Payment Workflow Tests

## Scenarios
1. Success path: recordings/payment-success/
2. Invalid card: recordings/payment-invalid/
3. Timeout: recordings/payment-timeout/
4. Retry success: recordings/payment-retry/

## Run Tests
for scenario in success invalid timeout retry; do
  dygram e -i payment.dy \
    --playback recordings/payment-$scenario/ \
    --id test-$scenario
done
EOF
```
Integration with CI/CD
Local Development
```bash
# 1. Develop machine
vim machines/workflow.dy

# 2. Test interactively
dygram e -i machines/workflow.dy \
  --record recordings/workflow/ \
  --id workflow-test

# 3. Commit machine and recordings
git add machines/workflow.dy recordings/workflow/
git commit -m "Add workflow machine with tests"
```
CI Configuration
```yaml
# .github/workflows/test.yml
name: Test DyGram Machines

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install DyGram
        run: npm install -g dygram

      - name: Test All Machines
        run: |
          for recording in recordings/golden/*/; do
            machine=$(basename "$recording")
            echo "Testing: $machine"

            dygram execute --interactive \
              "machines/$machine.dy" \
              --playback "$recording" \
              --id "ci-$machine"

            # Check result
            if ! dygram exec status "ci-$machine" | grep -q "complete"; then
              echo "FAILED: $machine"
              exit 1
            fi
            echo "PASSED: $machine"
          done
```
Summary Checklist
When testing a machine, ensure you:
- Read and understand the machine definition
- Start with explicit execution ID
- Use `--record` if creating test recordings
- Step through execution observing state
- Provide intelligent responses when needed
- Check status frequently with `dygram exec status`
- Validate final state and results
- Verify recordings if created
- Clean up test executions when done
- Commit recordings for CI/CD if appropriate
See Also
- CLI Interactive Mode Guide: `docs/cli/interactive-mode.md`
- CLI Reference: `docs/cli/README.md`
- Agent: `dygram-test-responder` (auto-loaded)
- Examples: `examples/` directory
Remember: You have intelligent reasoning - use it! Understand context, make semantic decisions, and test edge cases. Don't just pattern-match; think about what the machine is trying to accomplish.