Medox add-eval-case
Add a new E2E test case to tests/e2e/prompts.yaml for LangSmith evaluation. Use when adding interaction pairs to test, covering new ANSM drug classes, or rebalancing eval coverage.
install
source · Clone the upstream repo
git clone https://github.com/spideystreet/medox
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/spideystreet/medox "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/add-eval-case" ~/.claude/skills/spideystreet-medox-add-eval-case && rm -rf "$T"
manifest: .claude/skills/add-eval-case/SKILL.md
Skill: Add an E2E Eval Test Case
Steps
- Identify the interaction pair: substance A + substance B, with ANSM level:
  - Contre-indication (CI) or Association déconseillée (AD) → expect_warn: true
  - Précaution d'emploi (PE) or À prendre en compte (APEC) → expect_warn: false
- Write a realistic clinical scenario (in French, pharmacist/physician perspective):
  - Use DCI names (international nonproprietary names, not brand names) in the prompt
  - Ground it in a plausible indication (e.g. patient greffé sous ciclosporine, a transplant patient on ciclosporin)
  - Keep it concise: 2-3 sentences
- Add the case to tests/e2e/prompts.yaml:

      # <ANSM level>: <mechanism> — DB pair: <SUBSTANCE_A> + <SUBSTANCE_B> (<CI|AD|PE|APEC>)
      - id: <substance_a>_<substance_b>
        prompt: >
          <Clinical scenario in French.>
        expect_warn: true  # or false
        expect_in:
          - "⚠️"  # only if expect_warn: true
          - "<substance name as it appears in French DCI>"
        expect_not:
          - "<unrelated substance that should not appear>"

- Choose expect_not carefully: it must be a meaningfully different substance, not a trivial variation. It is used as a false-positive guard.
- Run the eval to validate the new case passes:

      uv run dotenv -f .env run -- python scripts/run_eval.py
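The level-to-flag mapping and the entry layout from the steps above can be sketched in Python. This is only an illustration: `WARN_LEVELS` and `build_case` are hypothetical names, not part of the medox repo, and the placeholder values stand in for a real, verified pair.

```python
# Hypothetical helper, not part of the medox repo: mirrors the entry layout
# described for tests/e2e/prompts.yaml.

# ANSM level -> expect_warn, as listed in the first step.
WARN_LEVELS = {"CI": True, "AD": True, "PE": False, "APEC": False}


def build_case(case_id, level, prompt, substance, unrelated):
    """Return a dict shaped like one entry of prompts.yaml."""
    if level not in WARN_LEVELS:
        raise ValueError(f"unknown ANSM level: {level!r}")
    expect_warn = WARN_LEVELS[level]
    expect_in = [substance]
    if expect_warn:
        # The warning marker is only expected when expect_warn is true.
        expect_in.insert(0, "⚠️")
    return {
        "id": case_id,
        "prompt": prompt,
        "expect_warn": expect_warn,
        "expect_in": expect_in,
        "expect_not": [unrelated],
    }


# Placeholder values only; a real case needs a pair verified in the thésaurus.
example = build_case(
    "substance_a_substance_b",
    "PE",
    "<Clinical scenario in French.>",
    "<substance name>",
    "<unrelated substance>",
)
print(example["expect_warn"], example["expect_in"])
```

Deriving expect_warn from the level, rather than typing it by hand, keeps the flag and the comment above the case from drifting apart.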
Coverage targets
Maintain balance across:
- expect_warn: true cases (CI + AD), currently: ibuprofene_methotrexate, ciclosporine_simvastatine
- expect_warn: false cases (PE + true negatives), currently: amiodarone_simvastatine, paracetamol_amoxicilline, generiques_doliprane
- Tool path diversity: interaction lookup, generic lookup, simple drug search
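A quick way to check the warn/no-warn balance before adding a case is to count the existing entries. The sketch below hard-codes the five cases listed above as plain dicts so it runs without a YAML parser; `warn_balance` is a hypothetical helper, and in practice the list would come from loading tests/e2e/prompts.yaml.

```python
from collections import Counter


def warn_balance(cases):
    """Count cases by their expect_warn flag."""
    counts = Counter(bool(case["expect_warn"]) for case in cases)
    return {"warn": counts[True], "no_warn": counts[False]}


# The cases currently listed under the coverage targets.
current_cases = [
    {"id": "ibuprofene_methotrexate", "expect_warn": True},
    {"id": "ciclosporine_simvastatine", "expect_warn": True},
    {"id": "amiodarone_simvastatine", "expect_warn": False},
    {"id": "paracetamol_amoxicilline", "expect_warn": False},
    {"id": "generiques_doliprane", "expect_warn": False},
]

print(warn_balance(current_cases))  # {'warn': 2, 'no_warn': 3}
```

With two warn cases against three no-warn cases, the next addition would most naturally be a CI or AD pair.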
Rules
- id format: <substance_a>_<substance_b> in lowercase, underscores, no accents
- Comment above each case must include the ANSM level and DB pair (for traceability)
- expect_in terms must match what the agent actually outputs (test with a manual run first)
- Never add a case you haven't verified exists in the ANSM thésaurus
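The id rule is mechanical enough to check in code. The following is a minimal sketch using only the Python standard library; `slug`, `make_id`, `is_valid_id`, and the accented input names are hypothetical and do not come from the repo.

```python
import re
import unicodedata

# Lowercase ASCII letters/digits in at least two underscore-separated parts.
ID_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)+")


def slug(name):
    """Lowercase a substance name, strip accents, join words with underscores."""
    # NFKD splits an accented letter into its base letter plus a combining mark;
    # encoding to ASCII with errors ignored then drops the mark.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "_", ascii_only.lower()).strip("_")


def make_id(substance_a, substance_b):
    """Build an id in the <substance_a>_<substance_b> format."""
    return f"{slug(substance_a)}_{slug(substance_b)}"


def is_valid_id(case_id):
    """Check an existing id against the lowercase/underscore/no-accent rule."""
    return ID_PATTERN.fullmatch(case_id) is not None


print(make_id("IBUPROFÈNE", "MÉTHOTREXATE"))  # ibuprofene_methotrexate
print(is_valid_id("Ibuprofène_Méthotrexate"))  # False
```

This only covers the id format; the thésaurus lookup and the manual run for expect_in terms still have to be done by hand.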