# skill-optimizer

Install the skill (the one-liner copies `SKILL/` into `~/.claude/skills`):

```bash
git clone https://github.com/fastxyz/skill-optimizer

# Or as a one-liner:
T=$(mktemp -d) && git clone --depth=1 https://github.com/fastxyz/skill-optimizer "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SKILL" ~/.claude/skills/fastxyz-skill-optimizer-skill-optimizer && rm -rf "$T"
```

The skill itself lives in `SKILL/SKILL.md`.
Benchmark your SDK / CLI / MCP / prompt docs against multiple LLMs, measure whether they call the right actions with the right arguments, and iteratively rewrite your guidance until a quality floor is met across every model.
## Context Detection
Before doing anything, figure out where you are (a minimal shell sketch of the detection order follows the list):

1. Look for `skill-optimizer.json` (in CWD or parent directories). If found, you are in a configured target project. Use that file path as `<config-path>` in all commands below.
2. Look for `src/cli.ts` and a `package.json` with `"name": "skill-optimizer"`. If found, you are in the optimizer repo itself. You can use dev commands directly (`npm run build`, `npm test`, `npx tsx src/cli.ts`). To benchmark a target, either use the mock repos in `mock-repos/` or point `--config` at an external project's config.
3. Neither found — you are in an unconfigured target project. Read `references/setup.md` to scaffold a config before proceeding.
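A minimal sketch of that detection order, assuming a POSIX shell (the skill's actual logic may differ):

```bash
# Case 1: walk up from CWD looking for skill-optimizer.json
d="$PWD"
while [ "$d" != "/" ]; do
  if [ -f "$d/skill-optimizer.json" ]; then
    echo "configured target project: use $d/skill-optimizer.json as <config-path>"
    exit 0
  fi
  d=$(dirname "$d")
done

# Case 2: the optimizer repo itself
if [ -f src/cli.ts ] && grep -q '"name": "skill-optimizer"' package.json 2>/dev/null; then
  echo "optimizer repo: use dev commands (npm run build, npm test, npx tsx src/cli.ts)"
else
  # Case 3: unconfigured target project
  echo "unconfigured project: read references/setup.md first"
fi
```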
## Quick Reference

| Task | Command |
|---|---|
| Init config (interactive) | `npx skill-optimizer init` |
| Init (non-interactive, explicit surface) | `npx skill-optimizer init <surface> --yes` |
| Init (auto-detect surface, non-interactive) | `npx skill-optimizer init --auto --yes` |
| Import CLI commands | `npx skill-optimizer import-commands --from ./src/cli.ts` |
| Import with output file | `npx skill-optimizer import-commands --from ./src/cli.ts --out ./commands.json` |
| Import (overwrite existing) | `npx skill-optimizer import-commands --from ./src/cli.ts --out ./commands.json --force` |
| Import (binary scrape) | `npx skill-optimizer import-commands --from <binary> --scrape` |
| Diagnose config | `npx skill-optimizer doctor --config <config-path>` |
| Diagnose (skip code discovery) | `npx skill-optimizer doctor --config <config-path> --static` |
| Diagnose (verify model access) | `npx skill-optimizer doctor --config <config-path> --check-models` |
| Auto-fix config | |
| Dry run (no LLM calls) | `npx skill-optimizer run --config <config-path> --dry-run` |
| Run benchmark | `npx skill-optimizer run --config <config-path>` |
| Run (filter by model tier) | `npx skill-optimizer run --config <config-path> --tier <name>` |
| Generate tasks only | |
| Run optimizer | |
| Compare two runs | |
`<config-path>` is the path to your `skill-optimizer.json` — typically `./.skill-optimizer/skill-optimizer.json` after running init, or wherever you placed it.
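For example, a session after init might pin that path in a variable:

```bash
CONFIG=./.skill-optimizer/skill-optimizer.json
npx skill-optimizer doctor --config "$CONFIG"
npx skill-optimizer run --config "$CONFIG"
```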
## What Do You Need?

Read the reference file that matches your current goal:

| Goal | Reference |
|---|---|
| Set up skill-optimizer for a project (first time) | Read `references/setup.md` |
| Run a benchmark or understand results | Read |
| Automatically optimize a SKILL.md | Read |
| Understand config options | Read `references/config.md` |

If you are in an unconfigured project (context detection case 3), start with `references/setup.md`.
## Command Details
### `init` — scaffold a skill-optimizer config

The `init` command has three modes:
1. Interactive wizard (default): `npx skill-optimizer init [surface]` — prompts you through setup. Optionally pass `cli`, `sdk`, `mcp`, or `prompt` as a positional argument to pre-select the surface type.
2. Non-interactive with explicit surface: `npx skill-optimizer init <surface> --yes` — accepts all defaults for the named surface without prompting.
3. Auto-detect + non-interactive (fully automated, zero prompts): `npx skill-optimizer init --auto --yes` — inspects the current directory to detect the surface type, then applies defaults without prompting. This is the right choice when the task says "initialize without prompts", "fully automated setup", or "detect and scaffold" — especially when the surface type isn't stated.
Key parameters:
| Parameter | Meaning | Notes |
|---|---|---|
| Positional: `cli`, `sdk`, `mcp`, or `prompt` | Pre-select the surface type | Optional; omit when using `--auto` or running the interactive wizard |
| `--auto` | Auto-detect surface type from CWD | Detects surface; still prompts unless combined with `--yes` |
| `--yes` | Accept all defaults without prompting | Alone: needs explicit surface. With `--auto`: fully non-interactive. |
| | Load answers from a JSON file | For CI pipelines with a pre-built answers file |
**Critical:** `--auto` and `--yes` have independent effects. `--yes` alone still requires a surface name. `--auto` alone still opens the interactive wizard (pre-filled). Only `--auto --yes` together produces a completely non-interactive run.
```bash
# Fully automated: detect surface + accept defaults (no prompts at all)
npx skill-optimizer init --auto --yes

# Explicit surface, no prompts
npx skill-optimizer init cli --yes

# Interactive wizard for MCP surface
npx skill-optimizer init mcp
```
### `doctor` — diagnose your configuration

The base command validates your `skill-optimizer.json` and checks that discovered surfaces are intact. Two optional flags activate additional checks that are off by default:
- `--static` — skip live code discovery (tree-sitter analysis). Use this when you want to validate config and manifests without requiring the project source to be present, or to speed up CI checks. Do not confuse with `--no-discovery` — the correct flag is `--static`.
- `--check-models` — ping each configured model to verify API credentials and routing are working. Use this when you suspect auth issues or want to confirm model availability before a benchmark run. The flag is `--check-models`, not `--ping` or `--verify-models`.
These flags are independent and can be combined:
```bash
npx skill-optimizer doctor --config ./skill-optimizer.json --static
npx skill-optimizer doctor --config ./skill-optimizer.json --check-models
npx skill-optimizer doctor --config ./skill-optimizer.json --static --check-models
```
### `import-commands` — extract CLI surface from source or binary

Discovery mode is determined by whether `--scrape` is present:
- Source mode (default): `--from` points to a TypeScript/JavaScript file (e.g. `./src/cli.ts`). Tree-sitter parses commands statically.
- Scrape mode: Add `--scrape` to invoke the binary named in `--from` and walk its `--help` output.
Key parameters:
| Parameter | Meaning | Notes |
|---|---|---|
| `--from` | File path or binary name to import from | Required |
| `--out` | Write discovered commands to this JSON file | Optional; without it, output goes to stdout |
| `--force` | Overwrite the output file if it already exists | Required when the output file exists; without it the command refuses to overwrite |
| `--scrape` | Invoke `--from` as a binary and parse its `--help` output | Enables scrape mode |
| `--depth` | Max subcommand depth to explore during scrape | Only meaningful with `--scrape` |
Output goes to the `--out` file — do not use shell redirection (`>`) to capture output, because the tool writes structured JSON with metadata that is not suitable for piping.
```bash
# Source import, write to file (safe to re-run with --force)
npx skill-optimizer import-commands --from ./src/cli.ts --out ./commands.json --force

# Scrape a binary, limit depth to 3 levels
npx skill-optimizer import-commands --from my-app --scrape --depth 3
```
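To inspect the generated file, something like the following works. Note that the `commands` key is a hypothetical guess at the output schema, which is not documented here:

```bash
# Hypothetical schema: adjust the jq path once you've seen the real file
jq '.commands | length' ./commands.json
```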
### `run` — execute the benchmark

Filterable via:
- `--tier <name>` — only run models whose tier matches. Valid values: `flagship`, `mid`, `budget`. The flag is `--tier`, not `--model-tier`.
- `--model <id>` — run a single specific model.
- `--dry-run` — generate prompts and tasks without making LLM calls.
```bash
npx skill-optimizer run --config ./skill-optimizer.json --tier flagship
npx skill-optimizer run --config ./skill-optimizer.json --tier mid
```
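The other two filters from the list above take the same shape:

```bash
# Single model (the id format depends on the configured provider/format)
npx skill-optimizer run --config ./skill-optimizer.json --model <model-id>

# Generate prompts and tasks only, with no LLM calls
npx skill-optimizer run --config ./skill-optimizer.json --dry-run
```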
## Key Concepts
**Surfaces** — The callable interface of your project: SDK methods, CLI commands, MCP tools, or prompt templates. Skill-optimizer discovers these via tree-sitter code analysis, manifest files, or markdown parsing.

**Static evaluation** — Benchmark evaluation never executes generated code. Actions are extracted from model responses via pattern matching and compared structurally against expected calls. This makes benchmarks safe and repeatable.
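As a loose conceptual sketch (not the tool's actual scoring logic), structural comparison means lining up the extracted call against the expected one without running either:

```bash
# Purely illustrative: a naive field-by-field comparison of two calls
echo "my-app deploy --env prod"    > expected.txt
echo "my-app deploy --env staging" > extracted.txt
diff expected.txt extracted.txt && echo "match" || echo "argument mismatch"
```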
**Verdict gates** — Two thresholds must both pass for a benchmark to receive a PASS verdict: `perModelFloor` (each model individually meets a minimum score) and `targetWeightedAverage` (the weighted mean across all models meets a target). A single model below the floor fails the entire run.
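A worked example with made-up numbers (the real floor and target come from your config):

```bash
# Three models scored 0.82, 0.74, 0.69, equally weighted
echo "0.82 0.74 0.69" | awk '{ printf "weighted average: %.2f\n", ($1+$2+$3)/3 }'
# -> 0.75, which meets a 0.75 targetWeightedAverage; but with a 0.70
#    perModelFloor, the 0.69 model fails its gate, so the verdict is FAIL
```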
**Safety boundary** — The optimizer never modifies your original SKILL.md. It creates versioned copies in `.skill-optimizer/skill-v{N}.md` and only accepts mutations that improve scores without dropping any model below the floor. It does not modify tracked source files, but the generated artifacts appear under `.skill-optimizer/` — add that directory to your `.gitignore`.
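For example:

```bash
# Keep generated artifacts out of version control
echo ".skill-optimizer/" >> .gitignore
```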
**LLM routing** — By default (`format: "pi"`), all benchmark calls route through OpenRouter and need `OPENROUTER_API_KEY`. You can also call providers directly: `format: "anthropic"` uses the Anthropic API (`ANTHROPIC_API_KEY`), and `format: "openai"` uses the OpenAI API (`OPENAI_API_KEY`), with optional Codex browser-login auth via `authMode: "codex"`. The model ID prefix must match the format — see `references/config.md` for the full mapping.