Bernstein · bernstein-quality

install
source · Clone the upstream repo

    git clone https://github.com/chernistry/bernstein

Claude Code · Install into ~/.claude/skills/

    T=$(mktemp -d) && git clone --depth=1 https://github.com/chernistry/bernstein "$T" && mkdir -p ~/.claude/skills && cp -r "$T/packages/cursor-plugin/skills/bernstein-quality" ~/.claude/skills/chernistry-bernstein-bernstein-quality && rm -rf "$T"

manifest: packages/cursor-plugin/skills/bernstein-quality/SKILL.md
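A quick post-install sanity check can confirm the manifest landed where the one-liner copies it. The directory name below mirrors the `cp` target above; adjust it if you installed elsewhere.

```shell
# Sanity check after install: does the skill manifest exist where the
# one-liner copied it? The path mirrors the cp target in the install command.
SKILL_DIR="$HOME/.claude/skills/chernistry-bernstein-bernstein-quality"
if [ -f "$SKILL_DIR/SKILL.md" ]; then
  echo "skill installed"
else
  echo "skill missing"
fi
```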
source content

Bernstein Quality Metrics

Analyze the quality and reliability of agent-generated code.

When to Use

  • User asks "how reliable are the agents?" or "which model is best?"
  • User wants success rates, pass rates, or completion time stats
  • User asks about test failures or lint issues across models
  • User says "show me quality metrics"

Instructions

  1. Run `scripts/quality.sh metrics` for overall quality metrics.

  2. Run `scripts/quality.sh pass-rates` for lint/typecheck/test pass rates by model.

  3. Run `scripts/quality.sh times` for completion time distributions.

  4. Present a quality dashboard:

## Quality Dashboard

### Success Rate by Model
| Model | Tasks | Success | Fail | Rate |
|-------|-------|---------|------|------|
| claude-sonnet-4 | 24 | 22 | 2 | 91.7% |
| gpt-4.1 | 12 | 10 | 2 | 83.3% |

### Pass Rates
| Check | Overall | claude-sonnet-4 | gpt-4.1 |
|-------|---------|-----------------|---------|
| Lint | 96% | 98% | 92% |
| Type-check | 88% | 91% | 83% |
| Tests | 85% | 89% | 75% |

### Completion Times
| Percentile | Time |
|------------|------|
| p50 | 3m 20s |
| p90 | 8m 45s |
| p99 | 15m 12s |
  5. Highlight any models with significantly lower pass rates.
  6. Recommend model routing adjustments if one model consistently underperforms.
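The dashboard arithmetic (a success-rate column, a p50 time) can be sanity-checked with standard tools. The records below are made up for illustration; `quality.sh` derives the real numbers from actual task logs.

```shell
# Success rate per model from hypothetical (model,outcome) records.
printf 'claude-sonnet-4,success\nclaude-sonnet-4,fail\nclaude-sonnet-4,success\n' |
  awk -F, '{ n[$1]++; if ($2 == "success") ok[$1]++ }
           END { for (m in n) printf "%s: %d/%d = %.1f%%\n", m, ok[m], n[m], 100 * ok[m] / n[m] }'
# → claude-sonnet-4: 2/3 = 66.7%

# p50 (median) completion time from hypothetical durations in seconds.
printf '%s\n' 200 180 95 525 310 | sort -n |
  awk '{ a[NR] = $1 } END { print "p50: " a[int((NR + 1) / 2)] "s" }'
# → p50: 200s
```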