Claude-skill-registry batch-quality

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/batch-quality" ~/.claude/skills/majiayu000-claude-skill-registry-batch-quality && rm -rf "$T"

manifest: skills/data/batch-quality/SKILL.md

Batch Quality Skill

Prevent wasted LLM calls by validating quality BEFORE running full batch operations.

What This Skill Actually Does

Unlike simple file-existence checks, this skill:

Actually runs the LLM on N samples using scillm
Validates JSON response structure (excerpts, source_quality, etc.)
Uses SPARTA contracts for DuckDB validation queries
Integrates with task-monitor for enforced quality gates

Quick Start

cd .pi/skills/batch-quality

# Preflight: Test 3 samples through actual LLM
uv run python cli.py preflight \
    --stage 05 \
    --run-id run-recovery-verify \
    --samples 3

# If preflight passes, run your batch
# ...batch operation...

# Validate: Check DuckDB against contract
uv run python cli.py validate \
    --stage 05 \
    --run-id run-recovery-verify \
    --task-name "sparta-stage-05"

Commands

preflight

Test N samples through actual LLM before running full batch.

uv run python cli.py preflight \
    --stage <stage-name> \
    --run-id <sparta-run-id> \
    --samples 3 \
    --prompt <optional-prompt-file>

What it actually does:

Loads the SPARTA contract for the stage (if one exists)
Checks environment variables (CHUTES_API_KEY, CHUTES_TEXT_MODEL)
Connects to DuckDB for the run
Samples N items from the input queue
Runs each sample through scillm (actual LLM call)
Validates JSON response structure
Requires at least 50% of the samples to pass
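
The last two steps can be sketched as follows. This is a minimal illustration, not the skill's actual code: the required keys are assumed from the fields named earlier (excerpts, source_quality), and the function names `response_ok` and `preflight_passed` are hypothetical.

```python
from __future__ import annotations

import json

# Assumed required keys, taken from the fields named earlier in this document;
# the real skill may check more (or different) fields.
REQUIRED_KEYS = ("excerpts", "source_quality")
PASS_RATIO = 0.5  # at least 50% of samples must pass


def response_ok(raw: str) -> bool:
    """Return True if raw parses as a JSON object containing every required key."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(key in data for key in REQUIRED_KEYS)


def preflight_passed(raw_responses: list[str]) -> bool:
    """Apply the pass-ratio gate to a list of raw LLM responses."""
    if not raw_responses:
        return False  # nothing was sampled, so treat it as a failure
    passed = sum(response_ok(r) for r in raw_responses)
    return passed / len(raw_responses) >= PASS_RATIO


# Made-up responses: one valid, one not JSON, one missing a key.
samples = [
    '{"excerpts": ["..."], "source_quality": "high"}',
    "not json",
    '{"excerpts": []}',
]
print(preflight_passed(samples))
```

Only one of the three made-up responses has both keys, so this sample set would fail the gate and the batch should not start.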

Exit codes:

0: PASSED - safe to proceed
1: FAILED - fix issues first

validate

Validate batch output using SPARTA contracts.

uv run python cli.py validate \
    --stage <stage-name> \
    --run-id <sparta-run-id> \
    --task-name <task-monitor-name>

What it actually does:

Loads SPARTA contract (e.g., `05_extract_knowledge.json`)
Runs all `validation_queries` from contract against DuckDB
Checks each query result against `expected_min`
Notifies task-monitor of pass/fail

Contract example (`05_extract_knowledge.json`):

{
  "validation_queries": [
    {"name": "url_knowledge_count", "query": "SELECT COUNT(*) FROM url_knowledge", "expected_min": 10},
    {"name": "urls_processed", "query": "SELECT COUNT(*) FROM url_extraction_log WHERE ok = true", "expected_min": 5}
  ]
}
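
A rough sketch of how such a contract could be evaluated is shown below. It is an assumption about the approach, not the skill's implementation: `run_contract` is a hypothetical helper, and an in-memory `sqlite3` database with made-up rows stands in for the run's DuckDB file so the example is self-contained. A DuckDB connection from `duckdb.connect(...)` exposes the same `execute(...).fetchone()` pattern, so it could be passed in unchanged.

```python
import json
import sqlite3


def run_contract(con, contract: dict) -> dict:
    """Run each validation query and compare its first column to expected_min."""
    results = {}
    for check in contract["validation_queries"]:
        row = con.execute(check["query"]).fetchone()
        value = row[0] if row else 0
        results[check["name"]] = (value, value >= check["expected_min"])
    return results


contract = json.loads("""
{
  "validation_queries": [
    {"name": "url_knowledge_count", "query": "SELECT COUNT(*) FROM url_knowledge", "expected_min": 10},
    {"name": "urls_processed", "query": "SELECT COUNT(*) FROM url_extraction_log WHERE ok = true", "expected_min": 5}
  ]
}
""")

# Stand-in database with made-up rows, only for demonstration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE url_knowledge (id INTEGER)")
con.execute("CREATE TABLE url_extraction_log (url TEXT, ok BOOLEAN)")
con.executemany("INSERT INTO url_knowledge VALUES (?)", [(i,) for i in range(12)])
con.executemany(
    "INSERT INTO url_extraction_log VALUES (?, ?)",
    [(f"u{i}", i < 4) for i in range(6)],  # 4 of 6 rows have ok = true
)

results = run_contract(con, contract)
for name, (value, ok) in results.items():
    print(f"{name}: {value} ({'pass' if ok else 'FAIL'})")
all_passed = all(ok for _, ok in results.values())
```

With this demo data the first check passes (12 rows against a minimum of 10) and the second fails (4 successful rows against a minimum of 5), so the overall result is a failure.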

status

Check current preflight status (JSON output).

uv run python cli.py status

clear

Clear preflight state (requires new preflight).

uv run python cli.py clear

SPARTA Pipeline Integration

# 1. Register task with validation requirement
uv run python .pi/skills/task-monitor/monitor.py register \
    --name "sparta-stage-05" \
    --require-validation

# 2. Run preflight (ACTUALLY tests LLM)
uv run python .pi/skills/batch-quality/cli.py preflight \
    --stage 05 \
    --run-id run-recovery-verify \
    --samples 3

# 3. Run batch (only if preflight passed)
uv run python -m sparta.pipeline_duckdb.05_extract_knowledge \
    --run-id run-recovery-verify

# 4. Validate using contract queries
uv run python .pi/skills/batch-quality/cli.py validate \
    --stage 05 \
    --run-id run-recovery-verify \
    --task-name "sparta-stage-05"
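
The same sequence can be driven from a small Python wrapper that stops at the first failing step. This is a sketch under assumptions: `run_gated` is a hypothetical helper, and only preflight's exit codes are documented here, so treating a non-zero exit from the other commands as failure is an assumption.

```python
import subprocess
import sys


def run_gated(steps):
    """Run commands in order; stop at the first non-zero exit code and return it."""
    for cmd in steps:
        code = subprocess.run(cmd).returncode
        if code != 0:
            return code  # later steps are skipped
    return 0


RUN_ID = "run-recovery-verify"
TASK = "sparta-stage-05"

# The four commands from the listing above, as argument lists.
PIPELINE = [
    ["uv", "run", "python", ".pi/skills/task-monitor/monitor.py", "register",
     "--name", TASK, "--require-validation"],
    ["uv", "run", "python", ".pi/skills/batch-quality/cli.py", "preflight",
     "--stage", "05", "--run-id", RUN_ID, "--samples", "3"],
    ["uv", "run", "python", "-m", "sparta.pipeline_duckdb.05_extract_knowledge",
     "--run-id", RUN_ID],
    ["uv", "run", "python", ".pi/skills/batch-quality/cli.py", "validate",
     "--stage", "05", "--run-id", RUN_ID, "--task-name", TASK],
]

# To run the real pipeline, uncomment the next line:
# sys.exit(run_gated(PIPELINE))
```

Because preflight exits with 1 on failure, the batch step never runs when the sampled responses are bad.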

Configuration

Environment variables:

`SPARTA_ROOT`: Path to SPARTA project (default: `/home/graham/workspace/experiments/sparta`)
`CHUTES_API_KEY`: API key for LLM calls
`CHUTES_API_BASE`: API base URL (default: `https://llm.chutes.ai/v1`)
`CHUTES_TEXT_MODEL`: Model ID for text extraction

Contract location:

$SPARTA_ROOT/tools/pipeline_gates/fixtures/D3-FEV/contracts/
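
One way these settings might be resolved in Python is sketched below. The helper names `load_config` and `contract_dir` are hypothetical. Treating `CHUTES_API_KEY` and `CHUTES_TEXT_MODEL` as required follows the preflight environment check described earlier, and the defaults are the ones listed above.

```python
import os
from pathlib import Path

DEFAULTS = {
    "SPARTA_ROOT": "/home/graham/workspace/experiments/sparta",
    "CHUTES_API_BASE": "https://llm.chutes.ai/v1",
}
# Preflight checks these two variables; they have no documented default.
REQUIRED = ("CHUTES_API_KEY", "CHUTES_TEXT_MODEL")
CONTRACT_SUBDIR = "tools/pipeline_gates/fixtures/D3-FEV/contracts"


def load_config(env=os.environ):
    """Return (config, missing), where missing lists unset required variables."""
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    for name in REQUIRED:
        config[name] = env.get(name, "")
    missing = [name for name in REQUIRED if not config[name]]
    return config, missing


def contract_dir(config):
    """Directory where stage contracts are expected to live."""
    return Path(config["SPARTA_ROOT"]) / CONTRACT_SUBDIR


# Example with a fake environment: the key is set, the model is not.
config, missing = load_config({"CHUTES_API_KEY": "sk-example"})
print(contract_dir(config))
print(missing)
```

Passing a plain dict instead of `os.environ` keeps the lookup easy to test without touching the real environment.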

Dependencies

`typer` - CLI framework
`duckdb` - Database queries
`scillm` - LLM batch processing (for actual sample testing)

Key Principle

Preflight is cheap. Failed batches are expensive.

Testing 3 samples costs ~$0.01 and takes 30 seconds. Running 1000 items with a broken prompt costs ~$3 and takes hours.
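
The arithmetic behind those figures, as a quick check. The dollar amounts are the approximate ones quoted above. The per-item cost is only implied by them, not measured.

```python
preflight_cost, preflight_samples = 0.01, 3  # approximate figures quoted above
batch_cost, batch_items = 3.00, 1000

per_item = batch_cost / batch_items   # implied cost per item
ratio = batch_cost / preflight_cost   # failed batch vs. one preflight

print(f"implied cost per item: ${per_item:.3f}")
print(f"{preflight_samples} samples at that rate: ${preflight_samples * per_item:.3f}")
print(f"a failed batch costs about {ratio:.0f}x one preflight")
```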