# Bernstein: bernstein-quality
## Install

Source: clone the upstream repo.

    git clone https://github.com/chernistry/bernstein

Claude Code: install the skill into ~/.claude/skills/.

    T=$(mktemp -d) && git clone --depth=1 https://github.com/chernistry/bernstein "$T" && mkdir -p ~/.claude/skills && cp -r "$T/packages/cursor-plugin/skills/bernstein-quality" ~/.claude/skills/chernistry-bernstein-bernstein-quality && rm -rf "$T"

Manifest: packages/cursor-plugin/skills/bernstein-quality/SKILL.md
## Bernstein Quality Metrics
Analyze quality and reliability of agent-generated code.
### When to Use
- User asks "how reliable are the agents?" or "which model is best?"
- User wants success rates, pass rates, or completion time stats
- User asks about test failures or lint issues across models
- User says "show me quality metrics"
### Instructions

- Run `scripts/quality.sh metrics` for overall quality metrics.
- Run `scripts/quality.sh pass-rates` for lint/typecheck/test pass rates by model.
- Run `scripts/quality.sh times` for completion time distributions.
- Present a quality dashboard:
## Quality Dashboard

### Success Rate by Model

| Model | Tasks | Success | Fail | Rate |
|-------|-------|---------|------|------|
| claude-sonnet-4 | 24 | 22 | 2 | 91.7% |
| gpt-4.1 | 12 | 10 | 2 | 83.3% |

### Pass Rates

| Check | Overall | claude-sonnet-4 | gpt-4.1 |
|-------|---------|-----------------|---------|
| Lint | 96% | 98% | 92% |
| Type-check | 88% | 91% | 83% |
| Tests | 85% | 89% | 75% |

### Completion Times

| Percentile | Time |
|------------|------|
| p50 | 3m 20s |
| p90 | 8m 45s |
| p99 | 15m 12s |
- Highlight any models with significantly lower pass rates.
- Recommend model routing adjustments if one model consistently underperforms.
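The figures in the dashboard template are plain arithmetic over per-task records. A minimal shell sketch of the two calculations (success-rate percentage and nearest-rank percentiles), using hypothetical counts and timings rather than actual `scripts/quality.sh` output:

```shell
# Success rate: successes / tasks, as a percentage (hypothetical counts).
tasks=24
success=22
rate=$(awk -v s="$success" -v t="$tasks" 'BEGIN { printf "%.1f", 100 * s / t }')
echo "success rate: ${rate}%"   # → success rate: 91.7%

# Nearest-rank percentiles over completion times in seconds (hypothetical data).
times="200 95 310 180 525 60 912 145 230 405"
p() {
  # Sort numerically, then pick the value at rank ceil(N * q / 100).
  printf '%s\n' $times | sort -n |
    awk -v q="$1" '{ a[NR] = $1 } END { print a[int((NR * q + 99) / 100)] }'
}
echo "p50=$(p 50)s p90=$(p 90)s p99=$(p 99)s"   # → p50=200s p90=525s p99=912s
```

The real script presumably aggregates these per model from stored task results; this only shows the math behind the Rate and Percentile columns.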