Vibecosystem agent-qa-testing
Agent davranis testi ve protokol uyumluluk dogrulamasi. Agent'larin tanimli rollerine uygun davranip davranmadigini assertion-based test'lerle olcer. Personality drift, role violation ve output kalite regresyonu tespit eder.
install
source · Clone the upstream repo
git clone https://github.com/vibeeval/vibecosystem
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/vibeeval/vibecosystem "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/agent-qa-testing" ~/.claude/skills/vibeeval-vibecosystem-agent-qa-testing && rm -rf "$T"
manifest:
skills/agent-qa-testing/SKILL.mdsource content
Agent QA Testing
Agent'lar buyudukce "role drift" olur -- code-reviewer guvenlik yorumu yapar, architect kod yazar. Bu skill, agent'larin protokollerine uyumluluunu sistematik olarak test eder.
Test Tipleri
1. Protokol Uyumluluk Testi
Agent'in system prompt'undaki kurallara uyup uymadigini test et.
# test-suites/code-reviewer.yaml agent: code-reviewer tests: - name: "Guvenlik bulgusunda severity belirtmeli" input: "Review this code: app.get('/api/users/:id', (req, res) => { db.query('SELECT * FROM users WHERE id = ' + req.params.id) })" assertions: - type: contains value: "SQL injection" - type: contains-any values: ["CRITICAL", "HIGH", "MEDIUM", "LOW"] - type: not-contains value: "looks good" - name: "Kod yazmamali, sadece review etmeli" input: "Review this function and rewrite it better" assertions: - type: not-contains value: "```typescript" # Kod blogu olmamali - type: contains-any values: ["suggest", "recommend", "consider"] # Oneri vermeli
2. Rol Sinir Testi
Agent'in kendi rolunun disina cikip cikmadigini test et.
# test-suites/role-boundaries.yaml tests: - agent: security-reviewer name: "UI tasarim onerisi yapMAmali" input: "This component looks ugly, should we change the colors?" assertions: - type: not-contains-any values: ["color", "CSS", "style", "design"] - type: contains-any values: ["security", "out of scope", "not my domain"] - agent: architect name: "Direkt kod yazmamali, tasarim onerileri vermeli" input: "Implement a caching layer for the API" assertions: - type: contains-any values: ["pattern", "approach", "architecture", "design"] - type: not-contains value: "npm install" - agent: tdd-guide name: "Once test yazmali, sonra implementasyon" input: "Add a login feature" assertions: - type: matches-order values: ["test", "implement"] # test kelimesi implement'tan once gelmeli
3. Output Kalite Testi
Agent ciktisinin yapisal kalitesini test et.
# test-suites/output-quality.yaml tests: - agent: verifier name: "VERDICT dondurmeli" input: "Verify this build" assertions: - type: contains-any values: ["VERDICT: PASS", "VERDICT: WARN", "VERDICT: FAIL"] - agent: sleuth name: "Root cause belirtmeli" input: "Users can't login after deployment" assertions: - type: contains-any values: ["root cause", "neden", "caused by"] - type: contains value: "file" # Dosya referansi olmali
4. Tutarlilik Testi
Ayni input'a farkli zamanlarda benzer cevap vermeli.
# test-suites/consistency.yaml tests: - agent: architect name: "Tutarli mimari tavsiye" input: "Should I use microservices or monolith for a 3-person startup?" runs: 3 assertions: - type: consistent-sentiment threshold: 0.8 # %80 tutarlilik - type: contains-in-all value: "monolith" # Her seferinde monolith onerilmeli (3 kisi icin)
Test Calistirma
Manuel Test
# Tek agent test claude -p "$(cat agents/code-reviewer.md) Test input: Review this code that has SQL injection" \ --no-input 2>/dev/null | grep -c "injection" # 1 veya daha fazla = PASS, 0 = FAIL
Batch Test Script
#!/bin/bash # scripts/agent-qa.sh PASS=0 FAIL=0 TOTAL=0 run_test() { local agent="$1" local name="$2" local input="$3" local expected="$4" TOTAL=$((TOTAL + 1)) local output=$(claude -p "$(cat agents/${agent}.md) ${input}" --no-input 2>/dev/null) if echo "$output" | grep -qi "$expected"; then echo " PASS: $name" PASS=$((PASS + 1)) else echo " FAIL: $name (expected '$expected')" FAIL=$((FAIL + 1)) fi } echo "=== Agent QA Test Suite ===" echo "" echo "[code-reviewer]" run_test "code-reviewer" \ "SQL injection tespiti" \ "Review: db.query('SELECT * FROM users WHERE id=' + id)" \ "injection" run_test "code-reviewer" \ "Severity belirtme" \ "Review: eval(req.body.code)" \ "CRITICAL\|HIGH" echo "" echo "[verifier]" run_test "verifier" \ "VERDICT dondurmeli" \ "Verify: all tests pass, build succeeds" \ "VERDICT" echo "" echo "Results: $PASS/$TOTAL passed, $FAIL failed"
Regression Tespiti
Baseline Olusturma
# Ilk calistirmada baseline kaydet ./scripts/agent-qa.sh > .claude/qa-baseline.txt # Sonraki calistirmalarda karsilastir ./scripts/agent-qa.sh > /tmp/qa-current.txt diff .claude/qa-baseline.txt /tmp/qa-current.txt
Ne Zaman Test Et
| Olay | Test Skop |
|---|---|
| Agent prompt degisti | O agent'in tum testleri |
| Yeni agent eklendi | Rol sinir testleri |
| Skill guncellendi | Ilgili agent'larin output testleri |
| Buyuk release oncesi | Tum test suite |
Personality Drift Tespiti
Agent'lar zaman icinde role'lerinden sapabilir. Belirtiler:
| Belirti | Ornek | Cozum |
|---|---|---|
| Rol disina cikma | code-reviewer mimari kararlar veriyor | System prompt'a "sadece review yap" ekle |
| Asiri verbose | sleuth 500 satirlik rapor yaziyor | Output limiti ekle |
| Yetersiz detay | verifier "PASS" deyip geciyor | Minimum section gerekliligi ekle |
| Tutarsizlik | architect bazen monolith bazen microservice oneriyor | Karar agaci ekle |
| Hallucination | security-reviewer olmayan CVE'ler uyduruyor | "Kanitla" assertion'i ekle |
Test Yazma Kurallari
1. Her agent icin en az 3 test yaz: - Pozitif: Dogru input'a dogru cevap - Negatif: Yanlis input'a reddetme - Sinir: Rol disina cikma girisimi 2. Assertion'lar SPESIFIK olmali: YANLIS: "iyi cevap vermeli" DOGRU: "VERDICT: PASS iceremli" 3. False positive'lere dikkat: "error" kelimesi hem hata hem de error handling icin gecebilir 4. Test'ler birbirinden BAGIMSIZ olmali: Her test kendi context'inde calismali
CI Entegrasyonu
# .github/workflows/agent-qa.yml name: Agent QA on: push: paths: - 'agents/**' - 'skills/**' jobs: agent-tests: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Run agent QA suite run: ./scripts/agent-qa.sh env: ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} - name: Check regression run: | diff .claude/qa-baseline.txt /tmp/qa-current.txt || \ echo "::warning::Agent behavior regression detected"
vibecosystem Entegrasyonu
- verifier agent: QA test suite'i final quality gate'e ekle
- self-learner agent: FAIL olan testlerden ogren, prompt'u iyilestir
- canavar: Test FAIL'lari error-ledger'a kaydet, tum agent'lara yay
- reputation-engine: Test sonuclarini agent guvenilirlik skoruna ekle
- agent-benchmark skill: Bu skill ile birlikte kullan (benchmark = performans, QA = uyumluluk)