
Name: llm-evaluator
Author: openclaw

LLM-as-a-Judge evaluation system using Langfuse. Score AI outputs on relevance, accuracy, hallucination, and helpfulness. Backfill scoring on historical traces. Uses GPT-5-nano for cost-efficient judging. Use when evaluating AI quality, building evals, or monitoring output accuracy.

install

source · Clone the upstream repo

git clone https://github.com/openclaw/skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/aiwithabidi/llm-evaluator" ~/.claude/skills/openclaw-skills-llm-evaluator-28ce92 && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/aiwithabidi/llm-evaluator" ~/.openclaw/skills/openclaw-skills-llm-evaluator-28ce92 && rm -rf "$T"

manifest: skills/aiwithabidi/llm-evaluator/SKILL.md

LLM Evaluator ⚖️

LLM-as-a-Judge evaluation system powered by Langfuse. Uses GPT-5-nano to score AI outputs.

When to Use

Evaluating the quality of search results or AI responses
Scoring traces for relevance, accuracy, and hallucination
Batch scoring recent unscored traces
Quality assurance on agent outputs

Usage

# Test with sample cases
python3 {baseDir}/scripts/evaluator.py test

# Score a specific Langfuse trace
python3 {baseDir}/scripts/evaluator.py score <trace_id>

# Score with a single, specific evaluator only
python3 {baseDir}/scripts/evaluator.py score <trace_id> --evaluators relevance

# Backfill scores on recent unscored traces
python3 {baseDir}/scripts/evaluator.py backfill --limit 20
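When several traces need scoring from another script, the commands above can be assembled programmatically. The following is a minimal sketch, not part of the skill itself: the helper names, the placeholder base directory, and the trace IDs are hypothetical, and only the `score` and `backfill` subcommands and the `--evaluators` and `--limit` flags come from the usage lines above.

```python
import shlex
from pathlib import PurePosixPath


def _script_path(base_dir):
    """Locate scripts/evaluator.py under the skill's base directory."""
    return str(PurePosixPath(base_dir) / "scripts" / "evaluator.py")


def build_score_command(base_dir, trace_id, evaluator=None):
    """Hypothetical helper: argv for scoring one trace.

    Passing `evaluator` adds the --evaluators flag with a single name,
    mirroring the documented usage line.
    """
    argv = ["python3", _script_path(base_dir), "score", trace_id]
    if evaluator is not None:
        argv += ["--evaluators", evaluator]
    return argv


def build_backfill_command(base_dir, limit=20):
    """Hypothetical helper: argv for backfilling recent unscored traces."""
    return ["python3", _script_path(base_dir), "backfill", "--limit", str(limit)]


if __name__ == "__main__":
    base_dir = "/path/to/llm-evaluator"  # placeholder for {baseDir}
    for trace_id in ["trace-001", "trace-002"]:  # placeholder trace IDs
        # Print the command instead of running it; shlex.join quotes safely.
        print(shlex.join(build_score_command(base_dir, trace_id, "relevance")))
    print(shlex.join(build_backfill_command(base_dir, limit=20)))
```

Each returned list can be handed to `subprocess.run(argv, check=True)` once the base directory and trace IDs are real. Building an argv list rather than a shell string avoids quoting problems with unusual trace IDs.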

Evaluators

Evaluator	Measures	Scale
relevance	Response relevance to query	0–1
accuracy	Factual correctness	0–1
hallucination	Made-up information detection	0–1
helpfulness	Overall usefulness	0–1
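To illustrate how an LLM-as-a-Judge evaluator of this kind might turn the table above into scores, here is a minimal sketch. It is not the skill's actual implementation: the prompt wording, the JSON reply format, and all function names are assumptions, and the judge is a stub callable rather than a real GPT-5-nano or Langfuse call. The table does not say whether a high hallucination score is good or bad, so the sketch leaves that direction undefined.

```python
import json

# Evaluator names and descriptions are copied from the table above.
EVALUATORS = {
    "relevance": "Response relevance to query",
    "accuracy": "Factual correctness",
    "hallucination": "Made-up information detection",
    "helpfulness": "Overall usefulness",
}


def build_judge_prompt(evaluator, query, response):
    """Hypothetical prompt builder; the wording is an assumption."""
    if evaluator not in EVALUATORS:
        raise ValueError(f"unknown evaluator: {evaluator}")
    return (
        f"You are grading an AI response on '{evaluator}' "
        f"({EVALUATORS[evaluator]}).\n"
        f"Query: {query}\nResponse: {response}\n"
        'Reply with JSON only: {"score": <number from 0 to 1>, "reason": "<text>"}'
    )


def parse_judge_reply(reply):
    """Parse the judge's JSON reply and clamp the score into the 0-1 scale."""
    data = json.loads(reply)
    score = float(data["score"])
    return max(0.0, min(1.0, score)), str(data.get("reason", ""))


def evaluate(query, response, judge, evaluators=None):
    """Run each requested evaluator through `judge`, a callable prompt -> reply."""
    names = evaluators or list(EVALUATORS)
    return {
        name: parse_judge_reply(judge(build_judge_prompt(name, query, response)))
        for name in names
    }


def stub_judge(prompt):
    """Stand-in for a real model call; deliberately returns an out-of-range score."""
    return json.dumps({"score": 1.3, "reason": "stub"})


if __name__ == "__main__":
    print(evaluate("What is Langfuse?", "An observability tool.", stub_judge))
```

Clamping in `parse_judge_reply` keeps a misbehaving judge from writing scores outside the 0–1 scale. In a real setup the stub would be replaced by a model call and the results attached to the corresponding trace.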

Credits

Built by M. Abidi | agxntsix.ai | YouTube | GitHub
Part of the AgxntSix Skill Suite for OpenClaw agents.
