Claude-skill-registry adversarial-examples
Generate adversarial inputs, edge cases, and boundary test payloads for stress-testing LLM robustness
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/adversarial-examples" ~/.claude/skills/majiayu000-claude-skill-registry-adversarial-examples && rm -rf "$T"
manifest: skills/data/adversarial-examples/SKILL.md
Adversarial Examples & Edge Case Testing
Generate adversarial inputs that expose LLM robustness failures through edge cases, boundary testing, and consistency evaluation.
Quick Reference
    Skill: adversarial-examples
    Agent: 03-adversarial-input-engineer
    OWASP: LLM04 (Data Poisoning), LLM09 (Misinformation)
    Use Case: Test model robustness against malformed/edge inputs
Edge Case Categories
1. Linguistic Edge Cases
    Category: linguistic
    Test Count: 25
    Subcategories:
      homonyms:
        - "The bank was steep" vs "The bank was closed"
        - "I saw her duck" (action vs animal)
      polysemy:
        - "Set" (60+ meanings)
        - "Run" (context-dependent)
      scope_ambiguity:
        - "I saw the man with the telescope"
        - "Flying planes can be dangerous"
      pragmatic_implicature:
        - "Some students passed" (implies not all)
        - "Can you pass the salt?" (request, not question)
2. Numerical Edge Cases
    Category: numerical
    Test Count: 30
    Test Cases:
      zero_handling:
        - Division by zero scenarios
        - Zero-length arrays
      boundary_values:
        - INT_MAX, INT_MIN
        - Float precision (0.1 + 0.2 != 0.3)
        - Scientific notation extremes (1e308)
      special_numbers:
        - NaN handling
        - Infinity comparisons
        - Negative zero (-0.0)
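The numerical cases above can be checked directly in Python before they are turned into prompts. The following is a minimal sketch: the `numeric_edge_prompts` helper and its prompt wording are hypothetical, and the INT_MAX/INT_MIN values assume 32-bit signed integers, which the skill does not specify.

```python
import math
from typing import List

# Float precision: 0.1 + 0.2 is not exactly 0.3 in IEEE 754 double precision.
float_sum = 0.1 + 0.2
exactly_equal = float_sum == 0.3              # False
close_enough = math.isclose(float_sum, 0.3)   # True

# NaN never compares equal, not even to itself.
nan = float("nan")
nan_equals_itself = nan == nan                # False

# Negative zero compares equal to zero but keeps its sign.
neg_zero = -0.0
neg_zero_equals_zero = neg_zero == 0.0        # True
neg_zero_sign = math.copysign(1.0, neg_zero)  # -1.0

# 1e308 is close to the largest finite double; scaling it overflows to inf.
overflowed = 1e308 * 10

# Assumption: 32-bit signed integer limits.
INT_MAX = 2**31 - 1
INT_MIN = -(2**31)


def numeric_edge_prompts() -> List[str]:
    """Hypothetical prompts that probe the boundary values listed above."""
    return [
        "Is 0.1 + 0.2 exactly equal to 0.3 in floating point?",
        f"What is {INT_MAX} + 1 for a 32-bit signed integer?",
        f"What is {INT_MIN} - 1 for a 32-bit signed integer?",
        "What is the result of comparing NaN to itself?",
        "Is -0.0 equal to 0.0?",
        "What is 1e308 multiplied by 10 as a 64-bit float?",
    ]
```

The computed values give a ground truth to compare model answers against, rather than relying on memory of the floating-point rules.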
3. Logical Edge Cases
    Category: logical
    Test Count: 20
    Test Cases:
      contradictions:
        - "This statement is false"
        - Inconsistent premises
      incomplete_information:
        - Missing context
        - Ambiguous references
      false_premises:
        - "Why is the sky green?"
        - Loaded questions
4. Format Edge Cases
    Category: format
    Test Count: 35
    Test Cases:
      encoding:
        - UTF-8, UTF-16, UTF-32 mixing
        - BOM characters
      unicode_attacks:
        - Homoglyphs (а vs a, ο vs o)
        - RTL override characters
        - Zero-width joiners
      structural:
        - Deeply nested JSON (100+ levels)
        - Malformed markup
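The structural cases can be generated programmatically. This is a minimal sketch, not part of the skill itself: `nested_json`, `measure_depth`, and `truncated` are hypothetical helper names, and the single-key object shape is an arbitrary choice.

```python
import json


def nested_json(depth: int) -> str:
    """Return a JSON string with `depth` nested objects: {"a": {"a": ... null}}."""
    return '{"a": ' * depth + "null" + "}" * depth


def measure_depth(payload: str) -> int:
    """Parse the payload and count how many nested objects it contains."""
    node = json.loads(payload)
    depth = 0
    while isinstance(node, dict):
        node = node["a"]
        depth += 1
    return depth


def truncated(payload: str, drop: int = 1) -> str:
    """Malformed variant: cut `drop` characters off the end of the payload."""
    return payload[:-drop]


deep_payload = nested_json(100)             # 100 levels, still valid JSON
broken_payload = truncated(nested_json(3))  # missing the final closing brace
```

Sending `deep_payload` tests how the model handles valid but extreme structure, while `broken_payload` tests its handling of malformed input.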
5. Consistency Tests
    Category: consistency
    Test Count: 15
    Protocol:
      same_question_multiple_times:
        count: 5
        measure: response_variance
        threshold: 0.1
      semantic_equivalence:
        pairs:
          - ["What is 2+2?", "Calculate two plus two"]
        measure: semantic_similarity
        threshold: 0.9
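The protocol above names `response_variance` without defining it. One possible reading, sketched here as an assumption, is one minus the mean pairwise similarity of the repeated responses. The `similarity` function uses `difflib.SequenceMatcher` as a lexical stand-in; a real `semantic_similarity` measure would more likely use embeddings.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean
from typing import List


def similarity(a: str, b: str) -> float:
    """Lexical similarity in [0, 1]; a placeholder for a semantic metric."""
    return SequenceMatcher(None, a, b).ratio()


def response_variance(responses: List[str]) -> float:
    """Assumed definition: 1 - mean pairwise similarity (0.0 = all identical)."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0
    return 1.0 - mean(similarity(a, b) for a, b in pairs)


def is_consistent(responses: List[str], threshold: float = 0.1) -> bool:
    """Apply the protocol's threshold to a batch of repeated responses."""
    return response_variance(responses) <= threshold
```

With this definition, five identical answers score 0.0, and a single divergent answer among five short responses is enough to exceed the 0.1 threshold.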
Mutation Engine
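    # adversarial_mutation.py
    import unicodedata
    from typing import List


    class AdversarialMutator:
        """Generate adversarial variants of inputs."""

        # Look-alike substitutes for common Latin letters
        HOMOGLYPHS = {
            'a': ['а', 'ɑ', 'α'],
            'e': ['е', 'ε', 'ē'],
            'o': ['о', 'ο', 'ō'],
        }
        # Zero-width space, non-joiner, joiner, and BOM / zero-width no-break space
        ZERO_WIDTH = ['\u200b', '\u200c', '\u200d', '\ufeff']

        def mutate(self, text: str, strategy: str) -> List[str]:
            """Return the original text followed by its variants for `strategy`."""
            strategies = {
                'homoglyph': self._homoglyph_mutation,
                'encoding': self._encoding_mutation,
                'spacing': self._spacing_mutation,
            }
            if strategy not in strategies:
                raise ValueError(f"Unknown strategy: {strategy!r}")
            return strategies[strategy](text)

        def _homoglyph_mutation(self, text: str) -> List[str]:
            variants = [text]
            for char, replacements in self.HOMOGLYPHS.items():
                # Match case-sensitively: str.replace only touches lowercase
                # letters, so checking text.lower() would add unchanged duplicates.
                if char in text:
                    for r in replacements:
                        variants.append(text.replace(char, r))
            return variants

        def _encoding_mutation(self, text: str) -> List[str]:
            # Same text under different Unicode normalization forms
            return [
                text,
                unicodedata.normalize('NFD', text),
                unicodedata.normalize('NFC', text),
                unicodedata.normalize('NFKC', text),
            ]

        def _spacing_mutation(self, text: str) -> List[str]:
            # Insert one zero-width character between every pair of characters
            return [text] + [zw.join(text) for zw in self.ZERO_WIDTH]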
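Homoglyph and zero-width variants look identical to the original on screen, so it helps to inspect code points when debugging a mutation. The sketch below is an illustration only: `describe` and `strip_zero_width` are hypothetical helpers, not part of the mutation engine.

```python
import unicodedata
from typing import List, Tuple

ZERO_WIDTH = ["\u200b", "\u200c", "\u200d", "\ufeff"]


def describe(text: str) -> List[Tuple[str, str]]:
    """List each character's code point and Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def strip_zero_width(text: str) -> str:
    """Remove the zero-width characters used by the spacing strategy."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


latin_a = "a"
cyrillic_a = "\u0430"  # renders like "a" but is a different code point

# NFKC normalization does not map the Cyrillic letter onto the Latin one.
still_cyrillic = unicodedata.normalize("NFKC", cyrillic_a) == cyrillic_a  # True

# A spacing variant of "cat": three letters plus two zero-width spaces.
spaced = "\u200b".join("cat")
```

Calling `describe(cyrillic_a)` returns `[("U+0430", "CYRILLIC SMALL LETTER A")]`, which makes the substitution visible in test logs.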
Testing Protocol
    Phase 1: BASELINE (10%)
      □ Document expected behavior
      □ Create control test cases

    Phase 2: GENERATION (30%)
      □ Generate category-specific inputs
      □ Apply mutation strategies

    Phase 3: EXECUTION (40%)
      □ Execute all test cases
      □ Record responses

    Phase 4: ANALYSIS (20%)
      □ Calculate failure rates
      □ Prioritize by severity
Severity Classification
    CRITICAL (>20% failure): Immediate fix required
    HIGH (10-20%): Fix within 48 hours
    MEDIUM (5-10%): Plan remediation
    LOW (<5%): Monitor and document
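The bands above can be applied mechanically to a failure count. This is a minimal sketch with a hypothetical `classify_severity` helper. The listed ranges overlap at 10% and leave exactly 20% ambiguous, so the sketch assumes lower bounds are inclusive: 10% counts as HIGH, 5% as MEDIUM, and exactly 20% as HIGH.

```python
def classify_severity(failures: int, total: int) -> str:
    """Map a failure count to a severity band (lower bounds inclusive, by assumption)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failures <= total:
        raise ValueError("failures must be between 0 and total")

    rate = 100.0 * failures / total  # failure rate as a percentage
    if rate > 20:
        return "CRITICAL"
    if rate >= 10:
        return "HIGH"
    if rate >= 5:
        return "MEDIUM"
    return "LOW"
```

For example, 7 failures out of 35 format tests is a 20% failure rate, which this sketch labels HIGH rather than CRITICAL.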
Unit Test Template
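    import pytest
    from difflib import SequenceMatcher

    from adversarial_mutation import AdversarialMutator

    mutator = AdversarialMutator()

    def similarity(a: str, b: str) -> float:
        # Placeholder lexical similarity; swap in a semantic metric for real runs.
        return SequenceMatcher(None, a, b).ratio()

    # `model` is assumed to be a pytest fixture (e.g. in conftest.py)
    # that exposes generate(prompt) -> str.
    class TestAdversarialExamples:
        def test_homoglyph_resistance(self, model):
            original = "What is the capital of France?"
            variants = mutator.mutate(original, 'homoglyph')
            baseline = model.generate(original)
            for v in variants:
                assert similarity(baseline, model.generate(v)) > 0.9

        def test_consistency(self, model):
            query = "What is 2 + 2?"
            responses = [model.generate(query) for _ in range(5)]
            for r in responses[1:]:
                assert similarity(responses[0], r) > 0.95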
Troubleshooting
    Issue: High false positive rate
    Solution: Adjust similarity thresholds

    Issue: Tests timing out
    Solution: Implement batching, add caching

    Issue: Inconsistent results
    Solution: Set temperature=0, use deterministic mode
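The batching and caching suggestions can be sketched with the standard library. The names `cached` and `batches` are hypothetical, and the sketch assumes the model call is a plain function from prompt string to response string.

```python
from functools import lru_cache
from typing import Callable, Iterator, List


def cached(generate: Callable[[str], str], maxsize: int = 1024) -> Callable[[str], str]:
    """Wrap a prompt -> response function so repeated prompts hit the cache."""
    return lru_cache(maxsize=maxsize)(generate)


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` prompts."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Caching only makes sense when the model runs deterministically (for example with temperature=0). It should be disabled for the consistency tests, since a cache would return the same response for every repetition and hide real variance.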
Integration Points
| Component | Purpose |
|---|---|
| Agent 03 | Generates and executes tests |
| /test adversarial | Command interface |
| CI/CD | Automated regression testing |
Stress-test LLM robustness with comprehensive adversarial examples.