Claude-skill-registry libeval

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/libeval" ~/.claude/skills/majiayu000-claude-skill-registry-libeval && rm -rf "$T"
manifest: skills/data/libeval/SKILL.md
source content

libeval Skill

When to Use

  • Evaluating RAG agent response quality
  • Measuring retrieval recall and precision
  • Running automated quality assessments
  • Benchmarking agent performance over time

Key Concepts

Evaluator: Main orchestrator that runs test cases through the agent and collects metrics.

CriteriaEvaluator: Uses LLM-as-judge to score responses against defined criteria and rubrics.

RecallEvaluator: Measures how well the retrieval system returns relevant documents.

TraceEvaluator: Analyzes execution traces for performance and correctness.

Usage Patterns

Pattern 1: Run evaluation suite

import { Evaluator } from "@copilot-ld/libeval";

// config and testCases are defined elsewhere (evaluation is configured per config/eval.yml)
const evaluator = new Evaluator(config);

// Run every test case through the agent and collect metrics
const results = await evaluator.run(testCases);
console.log(results.summary);
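
The shape of a test case is not shown in the source. Below is a minimal sketch, assuming each case pairs a query with the documents or rubric used to judge the answer; every field name here is hypothetical, not the @copilot-ld/libeval schema.

// Hypothetical test-case shape; field names are illustrative assumptions
interface TestCase {
  id: string;               // stable identifier for tracking results over time
  query: string;            // input sent to the agent
  expectedDocs?: string[];  // documents retrieval is expected to return
  rubric?: string;          // criteria for the LLM-as-judge
}

const testCases: TestCase[] = [
  {
    id: "faq-001",
    query: "How do I rotate an API key?",
    expectedDocs: ["docs/security/api-keys.md"],
    rubric: "Mentions the rotation endpoint and the grace period.",
  },
];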

Pattern 2: Criteria-based evaluation

import { CriteriaEvaluator } from "@copilot-ld/libeval";

// llmClient is the libllm client used as the LLM-as-judge
const criteria = new CriteriaEvaluator(llmClient);

// Score the response against the rubric's criteria
const score = await criteria.evaluate(response, rubric);
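
No pattern is shown for the RecallEvaluator listed under Key Concepts. A minimal sketch follows, assuming an evaluate-style call analogous to CriteriaEvaluator; the constructor and method signature are assumptions, not documented API.

import { RecallEvaluator } from "@copilot-ld/libeval";

// Hypothetical usage: compare retrieved documents against the expected set
const recall = new RecallEvaluator();
const recallScore = await recall.evaluate(retrievedDocs, expectedDocs);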

Integration

Configured via config/eval.yml. Run via make eval. Uses libllm for LLM-as-judge.
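
The contents of config/eval.yml are not documented here; the sketch below is hypothetical, and every key in it is an assumption rather than the actual schema.

# Illustrative only; the real config/eval.yml schema may differ
judge:
  model: gpt-4o              # model libllm uses for LLM-as-judge scoring
cases: eval/cases.json       # test cases fed to the Evaluator
output: eval/results/        # where result summaries are written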