# claude-skill-registry · libeval
## Install

Source · clone the upstream repo:

```sh
git clone https://github.com/majiayu000/claude-skill-registry
```

Claude Code · install the skill into `~/.claude/skills/`:

```sh
T=$(mktemp -d) &&
  git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" &&
  mkdir -p ~/.claude/skills &&
  cp -r "$T/skills/data/libeval" ~/.claude/skills/majiayu000-claude-skill-registry-libeval &&
  rm -rf "$T"
```
## Manifest

`skills/data/libeval/SKILL.md`:
### libeval Skill

#### When to Use
- Evaluating RAG agent response quality
- Measuring retrieval recall and precision
- Running automated quality assessments
- Benchmarking agent performance over time
#### Key Concepts

- **Evaluator**: Main orchestrator that runs test cases through the agent and collects metrics.
- **CriteriaEvaluator**: Uses LLM-as-judge to score responses against defined criteria and rubrics.
- **RecallEvaluator**: Measures how well the retrieval system returns relevant documents (see the sketch after this list).
- **TraceEvaluator**: Analyzes execution traces for performance and correctness.
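To make the retrieval metrics concrete, here is a minimal sketch of the recall and precision computation a RecallEvaluator-style component performs. This is illustrative only; it assumes retrieved and relevant results are plain arrays of document IDs and is not the actual `@copilot-ld/libeval` API.

```js
// Illustrative only: recall/precision over document IDs.
// NOT the @copilot-ld/libeval API; it just shows the quantities
// a RecallEvaluator-style component measures.
function retrievalMetrics(retrievedIds, relevantIds) {
  const relevant = new Set(relevantIds);
  const hits = retrievedIds.filter((id) => relevant.has(id)).length;
  return {
    // recall: fraction of relevant documents that were retrieved
    recall: relevant.size === 0 ? 0 : hits / relevant.size,
    // precision: fraction of retrieved documents that are relevant
    precision: retrievedIds.length === 0 ? 0 : hits / retrievedIds.length,
  };
}

// Example: 2 of 3 relevant docs retrieved among 4 results
console.log(retrievalMetrics(["d1", "d2", "d5", "d9"], ["d1", "d2", "d3"]));
// -> { recall: 0.666..., precision: 0.5 }
```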
#### Usage Patterns

**Pattern 1: Run an evaluation suite**

```js
import { Evaluator } from "@copilot-ld/libeval";

// Runs each test case through the agent and collects metrics
const evaluator = new Evaluator(config);
const results = await evaluator.run(testCases);
console.log(results.summary);
```
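The shapes of `config` and `testCases` are not documented here. As a purely hypothetical illustration, a test case might pair a prompt with expected properties:

```js
// Hypothetical test-case shape, for illustration only; consult the
// libeval source for the structure the Evaluator actually expects.
const testCases = [
  {
    id: "refund-policy",
    prompt: "What is the refund window for annual plans?",
    expected: { mentions: ["30 days", "annual"] },
  },
];
```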
**Pattern 2: Criteria-based evaluation**

```js
import { CriteriaEvaluator } from "@copilot-ld/libeval";

// Uses an LLM as judge to score the response against the rubric
const criteria = new CriteriaEvaluator(llmClient);
const score = await criteria.evaluate(response, rubric);
```
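The rubric format is likewise undocumented here. A plausible sketch, assuming a rubric is a list of named criteria with descriptions for the judge to apply:

```js
// Hypothetical rubric, for illustration only; the real format is
// defined by @copilot-ld/libeval, not by this example.
const rubric = [
  { name: "groundedness", description: "Claims are supported by retrieved documents." },
  { name: "completeness", description: "All parts of the question are answered." },
];
```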
#### Integration

Configured via `config/eval.yml`. Run via `make eval`. Uses `libllm` for LLM-as-judge scoring.