Harness-evolver harness:setup

Use when the user wants to set up the evolver in their project, optimize an LLM agent, improve agent performance, or mentions evolver for the first time in a project without .evolver.json.

Install

Source · Clone the upstream repo:
git clone https://github.com/raphaelchristi/harness-evolver

Claude Code · Install into ~/.claude/skills/:
T=$(mktemp -d) && git clone --depth=1 https://github.com/raphaelchristi/harness-evolver "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/setup" ~/.claude/skills/raphaelchristi-harness-evolver-harness-setup && rm -rf "$T"

Manifest: skills/setup/SKILL.md

Source content

/harness:setup

Set up the Harness Evolver v3 in a project. Explores the codebase, configures LangSmith, runs baseline evaluation.

Prerequisites

Check for LangSmith API key — it can be in the environment, the credentials file, or .env:

python3 -c "
import os, platform
key = os.environ.get('LANGSMITH_API_KEY', '')
if not key:
    creds = os.path.expanduser('~/Library/Application Support/langsmith-cli/credentials') if platform.system() == 'Darwin' else os.path.expanduser('~/.config/langsmith-cli/credentials')
    if os.path.exists(creds):
        for line in open(creds):
            if line.strip().startswith('LANGSMITH_API_KEY='):
                key = line.strip().split('=',1)[1].strip()
    if not key and os.path.exists('.env'):
        for line in open('.env'):
            if line.strip().startswith('LANGSMITH_API_KEY=') and not line.strip().startswith('#'):
                key = line.strip().split('=',1)[1].strip().strip('\"').strip(\"'\")
print('OK' if key else 'MISSING')
"

If the check prints MISSING, tell the user: "Set your LangSmith API key: export LANGSMITH_API_KEY=lsv2_pt_... or run npx harness-evolver@latest to configure."

The tools auto-load the key from the credentials file, but the env var takes precedence.

Resolve Tool Path and Python

# Prefer env vars set by plugin hook; fallback to legacy npx paths
TOOLS="${EVOLVER_TOOLS:-$([ -d ".evolver/tools" ] && echo ".evolver/tools" || echo "$HOME/.evolver/tools")}"
EVOLVER_PY="${EVOLVER_PY:-$([ -f "$HOME/.evolver/venv/bin/python" ] && echo "$HOME/.evolver/venv/bin/python" || echo "python3")}"

Use $EVOLVER_PY instead of python3 for ALL tool invocations. This ensures the venv with langsmith installed is used.

IMPORTANT: Never pass LANGSMITH_API_KEY inline in Bash commands. The key is loaded automatically by the SessionStart hook (from the credentials file or environment) and by each Python tool's ensure_langsmith_api_key(). Passing it inline exposes it in the output. If the key is missing, tell the user to run export LANGSMITH_API_KEY=lsv2_pt_... instead.

Phase 1: Explore Project (automatic)

find . -maxdepth 3 -type f -name "*.py" -not -path "*/.venv/*" -not -path "*/node_modules/*" -not -path "*/__pycache__/*" | head -30

Monorepo detection: if the project root has multiple subdirectories with their own main.py or pyproject.toml, it's a monorepo. Use AskUserQuestion to ask WHICH app to optimize before proceeding — do NOT scan everything.
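A minimal sketch of that check, assuming "has its own entry point" means a direct subdirectory containing main.py or pyproject.toml:

from pathlib import Path

def monorepo_candidates(root: str = ".") -> list[str]:
    """Return subdirectories that each carry their own app (main.py or pyproject.toml)."""
    skip = {".venv", "venv", "node_modules", "__pycache__", ".git"}
    apps = []
    for child in Path(root).iterdir():
        if child.is_dir() and child.name not in skip:
            if (child / "main.py").exists() or (child / "pyproject.toml").exists():
                apps.append(child.name)
    return apps

apps = monorepo_candidates()
if len(apps) > 1:
    # More than one candidate app: ask the user which one to optimize.
    print("Monorepo candidates:", ", ".join(sorted(apps)))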

Look for:

  • Entry points: files with if __name__, or named main.py, app.py, agent.py, graph.py, pipeline.py
  • Existing LangSmith config: LANGCHAIN_PROJECT / LANGSMITH_PROJECT in env or .env
  • Existing test data: JSON files with inputs, CSV files, etc.
  • Dependencies: requirements.txt, pyproject.toml

To identify the framework, read the entry point file and its immediate imports. The proposer agents will use Context7 MCP for detailed documentation lookup — you don't need to detect every library, just identify the main framework (LangGraph, CrewAI, OpenAI Agents SDK, etc.) from the imports you see.
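A rough sketch of that import scan; the module-to-framework mapping below is an illustrative assumption, not an exhaustive list:

import re

# Hypothetical mapping from top-level import names to framework labels.
FRAMEWORK_HINTS = {
    "langgraph": "LangGraph",
    "crewai": "CrewAI",
    "agents": "OpenAI Agents SDK",
    "langchain": "LangChain",
}

def guess_framework(entry_point_path: str) -> str:
    source = open(entry_point_path, encoding="utf-8").read()
    # Collect top-level module names from `import x` / `from x import y` lines.
    modules = re.findall(r"^\s*(?:from|import)\s+([A-Za-z_]\w*)", source, re.MULTILINE)
    for module in modules:
        if module.lower() in FRAMEWORK_HINTS:
            return FRAMEWORK_HINTS[module.lower()]
    return "unknown"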

Detect virtual environments — check for venvs in the project or parent directories:

# Check common venv locations
for venv_dir in .venv venv ../.venv ../venv; do
    if [ -f "$venv_dir/bin/python" ]; then
        echo "VENV_FOUND: $venv_dir/bin/python"
        break
    fi
done

If a venv is found, use it for the entry point instead of bare python. The agent's dependencies are likely installed there, not in the system Python. For example: ../.venv/bin/python agent.py {input} instead of python agent.py {input}.

Identify the run command — how to execute the agent. Use {input} as a placeholder for the JSON file path:

  • .venv/bin/python main.py {input} — if venv detected (preferred)
  • python main.py {input} — agent reads the JSON file from a positional arg
  • python main.py --input {input} — agent reads the JSON file from the --input flag
  • python main.py --query {input_json} — agent receives an inline JSON string

The runner writes {"input": "user question..."} to a temp .json file and replaces {input} with the file path. If the entry point already contains --input (without a placeholder), the runner appends the file path as the next argument.

If no placeholder and no --input flag is detected, the runner appends --input <path> --output <path>.
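A simplified sketch of that substitution logic (the real runner's details may differ):

import json, shlex, tempfile

def build_command(entry_point: str, user_input: str) -> list[str]:
    """Write the input to a temp .json file and splice its path into the run command."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump({"input": user_input}, f)
        input_path = f.name
    if "{input}" in entry_point:
        cmd = entry_point.replace("{input}", input_path)
    elif "--input" in entry_point:
        cmd = f"{entry_point} {input_path}"  # flag already present: append the file path
    else:
        cmd = f"{entry_point} --input {input_path} --output {input_path}.out"
    return shlex.split(cmd)

print(build_command("python main.py {input}", "What is Python?"))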

Phase 2: Confirm Configuration (interactive)

Present all detected configuration in one view with smart defaults and ask for confirmation.

Use AskUserQuestion:

{
  "questions": [{
    "question": "Here's the configuration for your project:\n\n**Entry point**: {command}\n**Framework**: {framework}\n**Python**: {venv_path or 'system python3'}\n**Optimization goals**: accuracy (correctness evaluator)\n**Test data**: generate 30 examples with AI\n\nDoes this look good?",
    "header": "Setup Configuration",
    "multiSelect": false,
    "options": [
      {"label": "Looks good, proceed", "description": "Use these settings and start setup"},
      {"label": "Customize goals", "description": "Choose different optimization goals"},
      {"label": "I have test data", "description": "Use existing JSON file or LangSmith project"},
      {"label": "Let me adjust everything", "description": "Change entry point, framework, goals, and data source"}
    ]
  }]
}

If "Looks good, proceed": Use defaults — goals=accuracy, data=generate 30 with testgen. Skip straight to Phase 3.

If "Customize goals": Ask the goals question, then proceed to Phase 3 with testgen as default data source.

Use AskUserQuestion:

{
  "questions": [{
    "question": "What do you want to optimize?",
    "header": "Goals",
    "multiSelect": true,
    "options": [
      {"label": "Accuracy", "description": "Correctness of outputs — LLM-as-judge evaluator"},
      {"label": "Latency", "description": "Response time — track and minimize"},
      {"label": "Token efficiency", "description": "Fewer tokens for same quality"},
      {"label": "Error handling", "description": "Reduce failures, timeouts, crashes"}
    ]
  }]
}

Map selections to evaluator configuration for setup.py.
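One way to express that mapping; the evaluator identifiers below are illustrative, not the exact names setup.py expects:

# Illustrative mapping from selected goal labels to evaluator settings for setup.py.
GOAL_TO_EVALUATOR = {
    "Accuracy": "correctness",        # LLM-as-judge on output correctness
    "Latency": "latency",             # track and minimize response time
    "Token efficiency": "token_usage",
    "Error handling": "error_rate",
}

def goals_to_csv(selected: list[str]) -> str:
    return ",".join(GOAL_TO_EVALUATOR[g] for g in selected if g in GOAL_TO_EVALUATOR)

print(goals_to_csv(["Accuracy", "Latency"]))  # passed to setup.py via --goals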

Phase 2.5: Mode Selection

Use AskUserQuestion:

{
  "questions": [{
    "question": "Evolution mode?",
    "header": "Mode",
    "multiSelect": false,
    "options": [
      {"label": "light", "description": "20 examples, 2 proposers, ~2 min/iter. Good for testing."},
      {"label": "balanced (Recommended)", "description": "30 examples, 3 proposers, ~8 min/iter. Best trade-off."},
      {"label": "heavy", "description": "50 examples, 5 proposers, ~25 min/iter. Maximum quality."}
    ]
  }]
}

Pass the selection to setup.py as --mode light|balanced|heavy.

The mode determines the testgen count:

  • light: generate 20 examples
  • balanced: generate 30 examples (default, current behavior)
  • heavy: generate 50 examples
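For reference, the mode presets described above, collected in one place:

# Mode presets as described above; the mode name is passed to setup.py via --mode.
MODES = {
    "light":    {"examples": 20, "proposers": 2},   # ~2 min/iteration
    "balanced": {"examples": 30, "proposers": 3},   # ~8 min/iteration (default)
    "heavy":    {"examples": 50, "proposers": 5},   # ~25 min/iteration
}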

If "I have test data": Ask the data source question, then proceed to Phase 3 with accuracy as default goal.

Use AskUserQuestion with preview:

{
  "questions": [{
    "question": "Where should test inputs come from?",
    "header": "Test data",
    "multiSelect": false,
    "options": [
      {
        "label": "Import from LangSmith",
        "description": "Use real production traces as test inputs",
        "preview": "## Import from LangSmith\n\nFetches up to 100 recent traces from your production project.\nPrioritizes traces with negative feedback.\nCreates a LangSmith Dataset with real user inputs.\n\nRequires: an existing LangSmith project with traces."
      },
      {
        "label": "I have a file",
        "description": "Point to an existing file with test inputs",
        "preview": "## Provide Test Data\n\nSupported formats:\n- JSON array of inputs\n- JSON with {\"inputs\": {...}} objects\n- CSV with input columns\n\nExample:\n```json\n[\n  {\"input\": \"What is Python?\"},\n  {\"input\": \"Explain quantum computing\"}\n]\n```"
      }
    ]
  }]
}

If "Import from LangSmith": discover projects and ask which one (same as v2 Phase 1.9). If "I have a file": ask for file path.

If "Let me adjust everything": Ask all three original questions in sequence — confirm detection (entry point, framework, run command), then goals, then data source — using the question formats above.

Phase 3: Run Setup

Build the setup.py command based on all gathered information:

$EVOLVER_PY $TOOLS/setup.py \
    --project-name "{project_name}" \
    --entry-point "{run_command}" \
    --framework "{framework}" \
    --goals "{goals_csv}" \
    ${DATASET_FROM_FILE:+--dataset-from-file "$DATASET_FROM_FILE"} \
    ${DATASET_FROM_LANGSMITH:+--dataset-from-langsmith "$DATASET_FROM_LANGSMITH"} \
    ${PRODUCTION_PROJECT:+--production-project "$PRODUCTION_PROJECT"}

If "Generate from code" was selected AND no test data file exists, first spawn the testgen agent to generate inputs, then pass the generated file to setup.py.

Phase 4: Generate Test Data (if needed)

If testgen is needed, spawn it:

Agent(
  subagent_type: "harness-testgen",
  description: "TestGen: generate test inputs",
  prompt: |
    <objective>
    Generate {N} diverse test inputs for this project, where {N} matches the selected mode (20/30/50).
    Write them as a JSON array to test_inputs.json.
    </objective>

    <files_to_read>
    {all .py files discovered in Phase 1}
    </files_to_read>

    <output>
    Create test_inputs.json with format:
    [{"input": "..."}, {"input": "..."}, ...]
    </output>
)

Then pass --dataset-from-file test_inputs.json to setup.py.
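Before handing the generated file to setup.py, a quick shape check can catch a malformed file early (a minimal sketch, assuming the [{"input": ...}] format above):

import json

def check_test_inputs(path: str = "test_inputs.json") -> int:
    data = json.load(open(path, encoding="utf-8"))
    assert isinstance(data, list) and data, "expected a non-empty JSON array"
    assert all(isinstance(x, dict) and "input" in x for x in data), "each item needs an 'input' key"
    return len(data)

print(f"{check_test_inputs()} test inputs ready")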

Phase 5: Report

Setup complete!
  Project: evolver-{name}
  Dataset: {name}-eval-v1 ({N} examples)
  Evaluators: {list}
  Baseline score: {score}
  Config: .evolver.json

Next: run /harness:evolve to start optimizing.

Gotchas

  • If .evolver.json already exists, ask before overwriting.
  • If the agent needs a venv, the run command should activate it: cd {dir} && .venv/bin/python main.py
  • No project venv detected: before building the entry_point, check whether .venv/bin/python or venv/bin/python exists in the project directory. If NOT, warn the user:
    WARNING: No Python venv found in this project (.venv/ or venv/).
    The entry_point should use the project's own Python, not ~/.evolver/venv/bin/python.
    Create one first: python3 -m venv .venv && .venv/bin/pip install -r requirements.txt
    Do NOT use ~/.evolver/venv/bin/python as the entry_point — that's for evolver tools only. Using it pollutes the evolver venv with agent dependencies.
  • If the LangSmith connection fails, check the API key and network. The ensure_langsmith_api_key() function validates the key format and rejects dummy/test keys with a warning.
  • Eval concurrency defaults to 3 (runs 3 examples in parallel). If the agent can't handle parallel execution (writes to shared files, uses a fixed port, holds a DB lock), set eval_concurrency: 1 in .evolver.json after setup; see the sketch after this list.
  • Companion plugin: for full proposer observability, recommend installing langsmith-tracing from langchain-ai/langsmith-claude-code-plugins. Each proposer's file reads, edits, and commits become visible in LangSmith.
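A minimal sketch of that eval_concurrency tweak, assuming .evolver.json is a flat JSON object (adjust if the config is nested):

import json

# Assumes .evolver.json is a flat JSON object with an eval_concurrency field.
with open(".evolver.json", encoding="utf-8") as f:
    config = json.load(f)
config["eval_concurrency"] = 1   # run evaluation examples one at a time
with open(".evolver.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)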