# skillshare-cli-e2e-test

Install from the Skillshare repo:

```bash
git clone https://github.com/runkids/skillshare
```

Or as a one-liner:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/runkids/skillshare "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.skillshare/skills/skillshare-cli-e2e-test" ~/.claude/skills/runkids-skillshare-skillshare-cli-e2e-test && rm -rf "$T"
```

Source: `.skillshare/skills/skillshare-cli-e2e-test/SKILL.md`

Run isolated E2E tests in devcontainer. `$ARGUMENTS` specifies a runbook name or `"new"`.
## Flow

### Phase 0: Environment Check

- Confirm the devcontainer is running and get the container ID:

  ```bash
  CONTAINER=$(docker compose -f .devcontainer/docker-compose.yml ps -q skillshare-devcontainer)
  ```

  - If empty → prompt user:

    ```bash
    docker compose -f .devcontainer/docker-compose.yml up -d
    ```

  - Ensure `CONTAINER` is set for all subsequent `docker exec` calls.

- Confirm the Linux binary is available:

  ```bash
  docker exec $CONTAINER bash -c \
    '/workspace/.devcontainer/ensure-skillshare-linux-binary.sh && ss version'
  ```

- Confirm mdproof is installed:

  ```bash
  docker exec $CONTAINER /workspace/.devcontainer/ensure-mdproof.sh
  ```

  This auto-installs from the GitHub release, or falls back to `/workspace/bin/mdproof` (local dev binary).

- Check for lessons learned from previous runs:

  ```bash
  test -f /workspace/.mdproof/lessons-learned.md && cat /workspace/.mdproof/lessons-learned.md
  ```

  If the file exists, read it before writing or debugging runbooks — it contains known gotchas and assertion patterns.
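The checks above can be collected into a single guard function. A minimal sketch, using the compose file, service name, and helper-script paths from this document; the function skips gracefully when docker or the compose file is unavailable (e.g. when run outside the repo):

```shell
# Phase 0 as one guard function; returns non-zero when the environment
# is present but broken, and 0 when checks pass or cannot apply here.
phase0_check() {
  command -v docker >/dev/null 2>&1 || { echo "SKIP: docker not found"; return 0; }
  [ -f .devcontainer/docker-compose.yml ] || { echo "SKIP: compose file not found"; return 0; }

  CONTAINER=$(docker compose -f .devcontainer/docker-compose.yml ps -q skillshare-devcontainer)
  if [ -z "$CONTAINER" ]; then
    echo "devcontainer not running; start it with: docker compose -f .devcontainer/docker-compose.yml up -d"
    return 1
  fi

  # Binary and mdproof availability
  docker exec "$CONTAINER" bash -c \
    '/workspace/.devcontainer/ensure-skillshare-linux-binary.sh && ss version' || return 1
  docker exec "$CONTAINER" /workspace/.devcontainer/ensure-mdproof.sh || return 1

  # Lessons learned are optional; surface them if present
  docker exec "$CONTAINER" bash -c \
    'test -f /workspace/.mdproof/lessons-learned.md && cat /workspace/.mdproof/lessons-learned.md || true'
}

phase0_check
```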
### Phase 1: Detect Scope

- Preview all available runbooks via the container:

  ```bash
  docker exec $CONTAINER mdproof --dry-run --report json /workspace/ai_docs/tests/
  ```

  This returns JSON with every runbook's steps, commands, and expected assertions — no manual markdown parsing needed. Use it to understand what each runbook covers.

- Identify recent changes (unstaged + recent commits):

  ```bash
  git diff --name-only HEAD~3
  ```

- Match changes to relevant runbooks (compare changed file paths against step commands in the JSON output).
### Phase 2: Select Tests

Prompt user (via AskUserQuestion):

- Option A: Run an existing runbook (list all available + mark those related to recent changes)
- Option B: Auto-generate a new test script based on recent changes
- Option C: If `$ARGUMENTS` specifies a runbook, skip to Phase 3
### Phase 3: Prepare & Execute

**Running an existing runbook:**

1. Create an isolated environment with auto-initialization:

   ```bash
   ENV_NAME="e2e-$(date +%Y%m%d-%H%M%S)"
   # Use --init to automatically run 'ss init -g' with all targets
   docker exec $CONTAINER ssenv create "$ENV_NAME" --init
   ```

2. Execute the entire runbook via mdproof inside the container:

   ```bash
   docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
     ssenv enter "$ENV_NAME" -- \
     mdproof --report json \
     /workspace/ai_docs/tests/<runbook_file>.md
   ```

   mdproof executes each step (`bash -c <command>`) in the ssenv-isolated HOME, then returns structured JSON:

   ```
   {
     "version": "1",
     "runbook": "<runbook_file>.md",
     "duration_ms": 12345,
     "summary": { "total": 7, "passed": 5, "failed": 1, "skipped": 1 },
     "steps": [
       {
         "step": { "number": 1, "title": "...", "command": "...", "expected": ["..."] },
         "status": "passed",   // "passed" | "failed" | "skipped"
         "exit_code": 0,
         "stdout": "...",
         "stderr": "..."
       }
     ]
   }
   ```

3. Analyze the JSON output:

   - All passed → proceed to Phase 4
   - Any failed → filter for failures only (the full JSON can be too large for terminal output):

     ```bash
     mdproof --report json runbook.md 2>&1 | jq '{
       summary: .summary,
       failed: [.steps[] | select(.status == "failed") | {
         step: .step.number,
         title: .step.title,
         exit_code: .exit_code,
         failed_assertions: [.assertions[]? | select(.matched == false) | .pattern],
         stderr: (.stderr // "" | .[0:200])
       }]
     }'
     ```

   - Skipped steps (executor=`manual`) → these need manual verification; run them individually:

     ```bash
     docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
       ssenv enter "$ENV_NAME" -- <command from step.command>
     ```

4. For failed steps, debug individually using manual docker exec (same as before):

   ```bash
   docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
     ssenv enter "$ENV_NAME" -- bash -c '<failed step command>'
   ```

   Prefer `--json` + `jq` for assertions — see the `--json` Quick Reference below.
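The failure-only filter above can be exercised offline against a saved report. A minimal sketch: the sample report is fabricated to mirror the JSON structure shown above (with a real run you would pipe mdproof's output instead), and it requires `jq`:

```shell
# Offline triage: apply the failure-only view to a saved/sample report.
report='{
  "summary": {"total": 3, "passed": 2, "failed": 1, "skipped": 0},
  "steps": [
    {"step": {"number": 1, "title": "init"},    "status": "passed", "exit_code": 0, "stderr": ""},
    {"step": {"number": 2, "title": "install"}, "status": "failed", "exit_code": 1,
     "stderr": "flag provided but not defined: --yes"},
    {"step": {"number": 3, "title": "status"},  "status": "passed", "exit_code": 0, "stderr": ""}
  ]
}'

# One line per failed step: number, title, exit code, truncated stderr.
echo "$report" | jq -r '.steps[]
  | select(.status == "failed")
  | "step \(.step.number) (\(.step.title)): exit \(.exit_code), stderr: \(.stderr // "" | .[0:80])"'
# prints: step 2 (install): exit 1, stderr: flag provided but not defined: --yes
```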
**Generating a new runbook:**

- Read `git diff HEAD~3` to find changed files in `cmd/skillshare/` or `internal/`
- Read changed files to understand new/modified functionality
- Validate all CLI flags before writing — for every `ss <command> <flag>` in the runbook:
  - Grep `cmd/skillshare/<command>.go` for the exact flag string (e.g. `"--force"`)
  - Run `ss <command> --help` inside the container if needed
  - Common mistakes to avoid:
    - `uninstall --yes` → wrong, use `--force`/`-f`
    - `init --target <name>` → wrong, `init` has no `--target` flag
    - `init -p` has a completely separate flag set from global `init` — it only supports `--targets`, `--discover`, `--select`, `--mode`, `--dry-run`. Global-only flags like `--no-copy`, `--no-skill`, `--no-git`, `--all-targets`, `--force` do NOT exist in project mode
  - Audit custom rules: disable by rule ID (e.g. `prompt-injection-0`, `prompt-injection-1`), NOT pattern name (e.g. `prompt-injection`). Rule IDs are in `internal/audit/rules.yaml`
- Generate the new runbook to `ai_docs/tests/<slug>_runbook.md`, following existing conventions:
  - YAML-free, pure Markdown
  - Has Scope, Environment, Steps (each with bash + Expected), Pass Criteria
  - Use `jq:` assertions in Expected blocks for JSON commands — e.g. `- jq: .extras | length == 1`. This is a native mdproof assertion type, NOT a bash `jq` pipe
  - Use `--json` + `jq -e` in bash for inline verification within multi-command steps
  - Config idempotency — never bare `cat >> config.yaml`; always prepend `sed -i '/^section:/,$d'` to remove the existing section first, or use CLI commands (`ss extras init`, `ss extras remove --force`) that handle duplicates
- Check `ai_docs/tests/runbook.json` for project-level config (build, setup, teardown, step_setup, timeout) that affects all runbooks
- Check `.mdproof/lessons-learned.md` for known assertion patterns and gotchas
- Run the runbook quality checklist (see below) before executing
- Then execute the new runbook (same flow as above)
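A minimal skeleton of the conventions above. Illustrative only: the extras commands, step title, and assertion are placeholders patterned on examples elsewhere in this document; follow existing runbooks in `ai_docs/tests/` for the authoritative shape.

````markdown
# Extras E2E Runbook

## Scope
Extras lifecycle (`ss extras`) in an isolated environment.

## Environment
- Fresh ssenv environment created with `ssenv create --init`

## Steps

### Step 1: Extras list contains exactly one extra

```bash
ss extras remove rules --force -g 2>/dev/null || true   # --init pre-creates 'rules'
ss extras init
```

Expected:
- jq: .extras | length == 1

## Pass Criteria
All steps pass; no step relies on state from a previous run.
````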
### Phase 4: Cleanup & Report

- Ask user before cleanup (via AskUserQuestion):
  - Option A: Delete the ssenv environment now
  - Option B: Keep it for manual debugging (print the env name for a later `ssenv delete`)
- If the user chose Option A:

  ```bash
  docker exec $CONTAINER ssenv delete "$ENV_NAME" --force
  ```

- Output summary (derived from the runbook JSON output):

  ```
  ── E2E Test Report ──
  Runbook: {runbook name}
  Env: {ENV_NAME}
  Duration: {duration_ms}ms

  Step 1: {title}  PASS
  Step 2: {title}  PASS
  Step 3: {title}  FAIL ← exit_code={N}, stderr: {error detail}
  ...

  Result: {passed}/{total} passed ({skipped} skipped)
  ```

  All values come directly from mdproof's JSON output — `summary.passed`, `summary.total`, `steps[].step.title`, `steps[].status`.

- If any FAIL → distinguish a runbook bug from a real bug:
  - Runbook bug: wrong flag, wrong file path, stale assertion → fix the runbook, re-run the step
  - Real bug: CLI misbehavior → analyze the cause, provide fix suggestions
- Retrospective — ask user (via AskUserQuestion): did you encounter any friction during this test run that the skill or runbook could handle better?
  - Option A: Yes, improve the e2e skill — review test friction (wrong flags, stale assertions, missing checklist items, unclear instructions), then update SKILL.md and/or runbooks
  - Option B: Yes, but only fix the runbook — fix the specific runbook without changing the skill itself
  - Option C: No, skip

Improvement targets:

- SKILL.md: add new checklist items, common-mistake examples, or rule clarifications learned from this run
- Runbooks: fix stale assertions (e.g. config.yaml → registry.yaml), wrong flags, outdated paths
- Both: when a systemic issue (e.g. a refactor changed file locations) affects both the skill's guidance and existing runbooks
## Runbook Quality Checklist

Before executing a newly generated runbook, verify:

- All CLI flags exist — every `ss <cmd> --flag` was grep-verified against source
- `--init` interaction — if the runbook has `ss init`, account for `ssenv create --init` already initializing (add `--force` to re-init, or skip the init step)
- `--init` creates default extras — `ssenv create --init` creates a `rules` extra by default. Runbooks that assume an empty extras list must add cleanup first: `ss extras remove rules --force -g 2>/dev/null || true` + `rm -rf ~/.claude/rules`
- Correct confirmation flags — `uninstall` uses `--force` (not `--yes`); an `init` re-run needs no flag (it just fails gracefully)
- Skill data lives in registry.yaml — assertions about installed skills check `registry.yaml`, NOT `config.yaml`; config.yaml should never contain `skills:`
- File existence timing — `registry.yaml` is only created after the first install/reconcile, not on `ss init`
- Project mode paths — project commands use `.skillshare/`, not `~/.config/skillshare/`
- Project init flags — `init -p` only supports `--targets`, `--discover`, `--select`, `--mode`, `--dry-run`; global-only flags (`--no-copy`, `--no-skill`, `--no-git`, `--all-targets`, `--force`) are not available
- Audit rule IDs — custom rules in `audit-rules.yaml` use rule IDs (e.g. `prompt-injection-0`), not pattern names (e.g. `prompt-injection`). Verify IDs against `internal/audit/rules.yaml`
- Use `--json` for assertions — if the command supports `--json`, use it with `jq` instead of grepping human-readable output. Text output changes between versions; JSON structure is stable
- Expected = actual substrings, NOT descriptions — the runbook assertion engine does case-insensitive substring matching. Write `- Installed` or `- cangjie-docs-navigator`, NOT `- Install completes without error` or `- Output contains at least one skill`. Negation: use the `Not <substring>` prefix (e.g. `- Not cangjie-docs-navigator`)
- Skill name ≠ repo name — after `ss install <repo>`, the actual skill name may differ from the repo name (e.g. repo `cangjie-docs-mcp` → skill `cangjie-docs-navigator`). Always verify the installed skill name via `ss list` before writing uninstall/check steps
- `/tmp/` cleanup — ssenv only isolates `$HOME`; `/tmp/` is shared across runs. Any step using `/tmp/<path>` must start with `rm -rf /tmp/<path>` to avoid stale state from previous runs
- `echo > symlink` writes through — `echo "content" > path` where `path` is a symlink writes to the symlink's target; it does NOT replace the symlink with a real file. To create a local (non-managed) file at a symlinked path, either use a different filename, or `rm` the symlink first and then `echo`
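The write-through behavior is easy to demonstrate with plain shell in a throwaway directory (no skillshare involvement; `source.md` stands in for a managed file):

```shell
# Demonstrate the symlink write-through gotcha in a temp directory.
d=$(mktemp -d)
echo "managed content" > "$d/source.md"
ln -s "$d/source.md" "$d/link.md"

echo "local override" > "$d/link.md"   # writes THROUGH the symlink
cat "$d/source.md"                     # prints: local override
test -L "$d/link.md" && echo "link.md is still a symlink"

# To get a real local file instead, remove the symlink first:
rm "$d/link.md"
echo "local override" > "$d/link.md"
test -L "$d/link.md" || echo "link.md is now a regular file"

rm -rf "$d"
```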
- `cat >>` is not idempotent — appending to config files (`cat >> config.yaml`) will duplicate sections on re-run. Prefer `ss extras init` (which validates duplicates) or full file replacement over `cat >>` when possible
- Extras source path layout — extras use `~/.config/skillshare/extras/<name>/` (not the legacy flat path `~/.config/skillshare/<name>/`). Symlink assertions must include `extras/` in the path regex (e.g. `regex: skillshare/extras/rules/tdd\.md`)
- Prefer `jq:` over `python3 -c` — for JSON output validation, use mdproof's native `jq:` assertion type (e.g. `- jq: .extras | length == 1`) instead of piping to `python3 -c`. It's one line instead of ten, and mdproof handles failure reporting automatically
- Config append idempotency — when appending YAML sections with `cat >>`, always prepend `sed -i '/^section_key:/,$d'` to remove the existing section, or prefer CLI commands (`ss extras init`, `ss extras remove --force`) over manual config editing
- Check lessons-learned — read `.mdproof/lessons-learned.md` before writing new runbooks for known gotchas and proven assertion patterns
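The `sed` + `cat >>` idempotency pattern from the items above can be sketched with a throwaway file. Assumptions: `extras:` is a stand-in section key, the section sits at the end of the file (which the pattern itself guarantees), and GNU sed is available, as in the Linux devcontainer:

```shell
# Idempotent YAML section append: delete the section before re-appending,
# so re-running the step never duplicates it.
cfg=$(mktemp)
printf 'targets:\n  - claude\n' > "$cfg"

append_extras_section() {
  sed -i '/^extras:/,$d' "$cfg"             # drop existing section (no-op on first run)
  printf 'extras:\n  - rules\n' >> "$cfg"   # append a fresh copy
}

append_extras_section
append_extras_section                        # second run must not duplicate

grep -c '^extras:' "$cfg"                    # prints: 1
rm -f "$cfg"
```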
## Runbook Assertion Types

mdproof supports 6 assertion types under `Expected:` blocks. Use the most specific type for each check:

| Type | Syntax | When to use | Example |
|---|---|---|---|
| Substring | plain text | Simple output check | `- Installed` |
| Negated | `Not` prefix | Verify absence | `- Not cangjie-docs-navigator` |
| Exit code | | Every step should have this | |
| Regex | `regex:` prefix | Pattern matching | `- regex: skillshare/extras/rules/tdd\.md` |
| jq | `jq:` prefix | JSON output (preferred) | `- jq: .extras \| length == 1` |
| Snapshot | prefix | Stable output comparison | |

`jq:` best practices:

```
# Simple field check
- jq: .name == "rules"
# Array length
- jq: .extras | length == 3
# Sorted array comparison
- jq: [.extras[].name] | sort | . == ["a","b","c"]
# Null/missing field (omitempty)
- jq: .extras == null
# Nested access
- jq: .[0].targets[0].status == "synced"
# Boolean
- jq: .source_exists == true
```
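For intuition, the substring and `Not` semantics described in the checklist can be mimicked in a few lines of bash. This is a hypothetical re-implementation for illustration only, not mdproof's actual matcher:

```shell
# Hypothetical matcher: case-insensitive substring check, with a "Not "
# prefix negating it (mirrors the semantics described in the checklist).
shopt -s nocasematch

assert_line() {   # usage: assert_line "<expected>" "<actual output>"
  local expected=$1 output=$2
  if [[ $expected == "Not "* ]]; then
    [[ $output != *"${expected#Not }"* ]]
  else
    [[ $output == *"$expected"* ]]
  fi
}

out="Installed cangjie-docs-navigator (1 skill)"
assert_line "installed" "$out"        && echo "pass: substring (case-insensitive)"
assert_line "Not removed" "$out"      && echo "pass: negation"
assert_line "does not appear" "$out"  || echo "fail: substring missing"
```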
## Rules

- Always execute inside the devcontainer — use `docker exec`, never run the CLI on the host
- Always use `ssenv` for HOME isolation — don't pollute the container's default HOME
- Always create fresh ssenv environments — never reuse an environment from a previous run; stale config/state causes confusing cascade failures (e.g. duplicate YAML keys, "already exists" errors)
- ssenv only isolates `$HOME` — `/tmp/`, `/var/`, and other system paths are shared across all environments. Runbook steps using `/tmp/` must include `rm -rf` cleanup at the start
- Verify every step — never skip Expected checks
- Don't abort on failure — record the FAIL, continue to the next step, summarize at the end
- Ask before cleanup — Phase 4 must prompt the user before deleting the ssenv environment
- `ss` = `skillshare` — same binary in runbooks
- `~` = ssenv-isolated HOME — `ssenv enter` auto-sets `HOME`
- Use `--init` — simplify setup with `ssenv create <name> --init`
- `--init` already runs init — the env is pre-initialized; runbook steps calling `ss init` again will fail unless the step explicitly resets state first
## ssenv Quick Reference

| Command | Purpose |
|---|---|
| | Show shortcuts and usage |
| | List isolated environments |
| `ssnew <env>` | Create + enter isolated shell (interactive) |
| `ssenv enter <env>` | Enter existing isolated shell (interactive) |
| `exit` | Leave isolated context |
| `ssenv enter <env> -- <command>` | Run single command in isolation (automation) |

- For interactive debugging: `ssnew <env>`, then `exit` when done
- For deterministic automation: prefer `ssenv enter <env> -- <command>` one-liners
## Test Command Policy

When running Go tests inside the devcontainer (not via a runbook):

```bash
# ssenv changes HOME, so always cd to /workspace first for Go commands
cd /workspace
go build -o bin/skillshare ./cmd/skillshare
SKILLSHARE_TEST_BINARY="$PWD/bin/skillshare" go test ./tests/integration -count=1
go test ./...
```

Always run in the devcontainer unless there is a documented exception. Note: `ssenv enter` changes HOME, which may affect Go module resolution — always `cd /workspace` before running `go test` or `go build`.
## `--json` Quick Reference

Most commands support `--json` for structured output, making assertions more reliable than text matching.

| Command | Notes |
|---|---|
| `ss status --json` | Skills, targets, sync status |
| `ss list --json` | All skills with metadata |
| `ss target list --json` | Configured targets |
| | Implies (skip prompts) |
| | Implies (skip prompts) |
| | Implies (skip prompts) |
| `ss check --json` | Update availability per repo |
| | Update results per skill |
| | Per-file diff details |
| `ss sync --json` | Sync stats per target |
| `ss audit --format json` | Also accepts `--json` (deprecated alias) |
| `ss log --json` | Raw JSONL (one object per line) |

Key behaviors:

- `--json` that implies `--force`/`--all` skips interactive prompts — safe for automation
- Output goes to stdout only (progress/spinners suppressed)
- `audit` prefers `--format json`; `--json` still works but is the deprecated form
- `log --json` outputs JSONL (newline-delimited), not a JSON array
## Assertion Patterns with jq

```bash
# Count installed skills
ss list --json | jq 'length'
# Check a specific skill exists
ss list --json | jq -e '.[] | select(.name == "my-skill")'
# Verify target is configured
ss target list --json | jq -e '.[] | select(.name == "claude")'
# Assert no critical audit findings
ss audit --format json | jq -e '.summary.critical == 0'
# Check update availability
ss check --json | jq -e '.tracked_repos | length > 0'
# Verify sync succeeded (zero errors)
ss sync --json | jq -e '.errors == 0'
# Install and verify result
ss install https://github.com/user/repo --json | jq -e '.skills | length > 0'
```

When a `jq -e` expression fails (exit code 1 = false, 4 = no output), the step FAILs — no ambiguous text matching needed.
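Per the jq manual, `-e` exits 0 when the last output is neither false nor null, 1 when it is false or null, and 4 when no output was produced. The codes can be observed directly:

```shell
# jq -e exit statuses: 0 = truthy result, 1 = false/null, 4 = no output at all
echo '{"errors": 0}' | jq -e '.errors == 0'; echo "exit=$?"   # exit=0
echo '{"errors": 2}' | jq -e '.errors == 0'; echo "exit=$?"   # exit=1
echo '[]'            | jq -e '.[]';          echo "exit=$?"   # exit=4
```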
## Container Command Templates

```bash
# Single command
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- ss status

# JSON assertion (preferred for verification)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  ss list --json | jq -e ".[] | select(.name == \"my-skill\")"
'

# Multi-line compound command (use bash -c) — global mode flags
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  ss init --no-copy --all-targets --no-git --no-skill
  ss status
'

# Project mode init (different flag set!)
docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- bash -c '
  cd /tmp/test-project && ss init -p --targets claude
'

# Check files (HOME is set to isolated path by ssenv)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  cat ~/.config/skillshare/config.yaml
'

# With environment variables
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  TARGET=~/.claude/skills
  ls -la "$TARGET"
'

# Go tests (must cd /workspace because ssenv changes HOME)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  cd /workspace
  go test ./internal/install -run TestParseSource -count=1
'
```
## Relationship with the /mdproof Skill

This skill (`/cli-e2e-test`) and the `/mdproof` skill are complementary, not competing:

| Concern | `/cli-e2e-test` | `/mdproof` |
|---|---|---|
| Scope | Skillshare project-specific E2E | General-purpose runbook authoring |
| Infrastructure | Devcontainer, ssenv, binary build | None — format and assertions only |
| Config | `ai_docs/tests/runbook.json` (build, setup, teardown) | Assertion types, snapshot, coverage |
| Lessons | Checklist items, CLI flag gotchas | `.mdproof/lessons-learned.md` |
| When | Running or debugging a test | Writing or improving a runbook |
### How they work together

- Writing a new runbook → invoke `/mdproof` first for format guidance (assertion types, `jq:` patterns, snapshot usage), then `/cli-e2e-test` to execute it in isolation
- Improving existing runbooks → invoke `/mdproof` for assertion quality review (python3 → `jq:`, idempotency), then `/cli-e2e-test` to verify the changes pass
- Debugging failures → `/cli-e2e-test` Phase 3 step 4 handles manual docker exec; `/mdproof` lessons-learned captures recurring patterns
- After a test run → the `/mdproof` Self-Learning section guides recording discoveries to `.mdproof/lessons-learned.md`
### Rule of thumb

- Need to run tests or debug in the devcontainer? → `/cli-e2e-test`
- Need to write assertions or improve runbook quality? → `/mdproof`
- User says "run extras E2E" → `/cli-e2e-test`
- User says "improve runbook assertions" → `/mdproof`, then `/cli-e2e-test` to verify