Skills-4-SE counterexample-generator

Generate concrete counterexamples when formal verification, assertions, or specifications fail. Use this skill when debugging failed proofs, understanding why verification fails, creating minimal reproducing examples, analyzing assertion violations, investigating invariant breaks, or diagnosing specification mismatches. Produces concrete input values, execution traces, and state information that demonstrate the failure.

install

source · Clone the upstream repo

git clone https://github.com/ArabelaTso/Skills-4-SE

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ArabelaTso/Skills-4-SE "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/counterexample-generator" ~/.claude/skills/arabelatso-skills-4-se-counterexample-generator && rm -rf "$T"

manifest: skills/counterexample-generator/SKILL.md

source content

Counterexample Generator

Systematically generate counterexamples that demonstrate why verification fails. Provides concrete input values, execution traces, and diagnostic information to help understand and fix specification or implementation issues.

Core Capabilities

1. Precondition Violation Detection

Generate inputs that violate preconditions:

Invalid parameter values - Out-of-range inputs
Null/undefined references - Missing required objects
State violations - Invalid object states
Type mismatches - Incorrect types
Constraint violations - Broken business rules

2. Postcondition Failure Analysis

Find executions where postconditions fail:

Return value violations - Wrong return values
State inconsistencies - Incorrect final states
Invariant breaks - Invariants violated after execution
Side effect errors - Unexpected modifications
Exception failures - Wrong or missing exceptions

3. Invariant Violation Discovery

Identify cases where invariants break:

Class invariant violations - Object consistency broken
Loop invariant violations - Invariant not maintained
Data structure violations - Consistency broken
Temporal violations - Time-based properties fail
Concurrency violations - Race conditions exposed

4. Execution Trace Generation

Produce detailed execution paths:

Step-by-step traces - Line-by-line execution
State snapshots - Variable values at each step
Branch decisions - Which paths taken
Call stacks - Function call hierarchy
Symbolic execution - Constraint-based paths

Counterexample Generation Workflow

Step 1: Identify the Failure

Understand what verification failed:

From verification tools:

Verification failed: Postcondition violated
Function: withdraw
Line: 15
Property: account.balance == old(account.balance) - amount

From assertions:

AssertionError: assert result >= 0
  in function: sqrt
  with input: x = -1

From test failures:

Test failed: test_binary_search
Expected: 2
Actual: -1

Analysis questions:

What property failed?
Where did it fail?
What was being verified?
What are the inputs/state?

Step 2: Analyze the Specification

Understand what should be true:

Example specification:

def withdraw(account: Account, amount: float) -> float:
    """
    Preconditions:
        - account is not None
        - amount > 0
        - account.balance >= amount

    Postconditions:
        - account.balance == old(account.balance) - amount
        - result == account.balance
        - account.balance >= 0
    """

Identify constraints:

Precondition:
```
account.balance >= amount
```

Postcondition:

account.balance == old(account.balance) - amount

Invariant:
```
account.balance >= 0
```

Step 3: Generate Counterexample Inputs

Find concrete values that cause failure:

Manual generation:

# Try to violate postcondition
# If amount = 100, old balance = 50
# Then: 50 - 100 = -50 (violates balance >= 0 invariant!)

counterexample = {
    'account': Account(balance=50),
    'amount': 100
}

Systematic generation:

Start with boundary values
Try edge cases
Use constraint solving
Employ random testing
Apply mutation testing

Step 4: Execute and Trace

Run the counterexample and capture details:

Execution trace:

Initial state:
  account.balance = 50
  amount = 100

Step 1: Enter withdraw function
  Precondition check: account.balance >= amount?
  50 >= 100 → False

  Should raise error, but code doesn't check!

Step 2: Execute: account.balance -= amount
  account.balance = 50 - 100 = -50

Step 3: Return account.balance
  result = -50

Postcondition check:
  account.balance == old(account.balance) - amount?
  -50 == 50 - 100 → True (passes)

  account.balance >= 0?
  -50 >= 0 → False (FAILS!)

COUNTEREXAMPLE FOUND:
  Invariant violation: balance became negative

Step 5: Present the Counterexample

Format for clarity and debugging:

Counterexample report:

## Counterexample: Invariant Violation in withdraw()

### Failed Property
Invariant: account.balance >= 0

### Inputs
- account.balance = 50
- amount = 100

### Execution Trace
1. Initial: account.balance = 50
2. Check: balance >= amount → 50 >= 100 → False
   (Missing precondition enforcement!)
3. Execute: balance -= amount → balance = -50
4. Return: -50

### Why It Fails
The function doesn't enforce the precondition that
`account.balance >= amount`. When amount > balance,
the withdrawal proceeds anyway, violating the
invariant that balance must be non-negative.

### Root Cause
Missing validation:
```python
if account.balance < amount:
    raise InsufficientFundsError()

Fix

Add precondition check before withdrawal.


## Counterexample Patterns

### Pattern 1: Boundary Value Violation

**Specification:**
```python
def sqrt(x: float) -> float:
    """
    Precondition: x >= 0
    Postcondition: result >= 0 and abs(result * result - x) < 1e-10
    """
    return x ** 0.5

Verification failure:

Postcondition violated when x < 0

Counterexample:

# Counterexample
input: x = -1

# Execution:
result = (-1) ** 0.5 = (1.5707963267948966e-308+6.123233995736766e-17j)
# Returns complex number, not float!

# Why it fails:
# Python returns complex for sqrt of negative
# Violates postcondition: result should be float

# Trace:
Initial state: x = -1
Step 1: Compute (-1) ** 0.5
Step 2: Result is complex number
Step 3: Return complex (type violation)

# Fix needed:
if x < 0:
    raise ValueError("Cannot compute sqrt of negative number")

Pattern 2: Off-By-One Error

Specification:

def binary_search(arr: List[int], target: int) -> int:
    """
    Precondition: arr is sorted
    Postcondition:
        If result >= 0: arr[result] == target
        If result == -1: target not in arr
    """
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid  # BUG: should be mid + 1
        else:
            right = mid - 1
    return -1

Counterexample:

# Counterexample
input:
  arr = [1, 2, 3, 4, 5]
  target = 5

# Execution trace:
Iteration 1:
  left = 0, right = 4, mid = 2
  arr[2] = 3 < 5
  left = 2  # Bug! Infinite loop starts

Iteration 2:
  left = 2, right = 4, mid = 3
  arr[3] = 4 < 5
  left = 3

Iteration 3:
  left = 3, right = 4, mid = 3
  arr[3] = 4 < 5
  left = 3  # Stuck! left doesn't advance

... (infinite loop)

# Why it fails:
Setting left = mid instead of left = mid + 1
causes infinite loop when target is in right half

# Minimal counterexample:
arr = [1, 2], target = 2
→ Infinite loop

Pattern 3: Invariant Violation

Specification:

class Stack:
    """
    Invariant:
        - 0 <= size <= capacity
        - size == len(items)
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.size = 0

    def push(self, item):
        self.items.append(item)
        self.size += 1  # BUG: doesn't check capacity

Counterexample:

# Counterexample
Initial state:
  capacity = 2
  items = []
  size = 0

Operations:
  push(1) → items = [1], size = 1 ✓
  push(2) → items = [1, 2], size = 2 ✓
  push(3) → items = [1, 2, 3], size = 3 ✗

# Invariant violation:
size > capacity
3 > 2 → FAILS

# Why it fails:
No check that size < capacity before push

# Fix:
def push(self, item):
    if self.size >= self.capacity:
        raise StackOverflowError()
    self.items.append(item)
    self.size += 1

Pattern 4: State Inconsistency

Specification:

class BankAccount:
    """
    Invariant: balance >= 0
    """
    def __init__(self, balance):
        self.balance = balance

    def transfer_to(self, other, amount):
        """
        Postcondition:
            self.balance == old(self.balance) - amount
            other.balance == old(other.balance) + amount
        """
        self.balance -= amount
        # BUG: Missing other.balance += amount

Counterexample:

# Counterexample
Initial state:
  account1.balance = 100
  account2.balance = 50

Operation:
  account1.transfer_to(account2, 30)

# Execution trace:
Step 1: account1.balance -= 30
  account1.balance = 70

Step 2: [Missing code!]
  account2.balance unchanged = 50

Final state:
  account1.balance = 70
  account2.balance = 50

# Postcondition check:
account2.balance == old(account2.balance) + amount?
50 == 50 + 30?
50 == 80? → FALSE

# Money disappeared:
old total = 100 + 50 = 150
new total = 70 + 50 = 120
Lost: 30

# Counterexample demonstrates:
Money is lost, violates conservation invariant

Pattern 5: Race Condition

Specification:

class Counter:
    """
    Thread-safe counter
    Invariant: value equals number of increments
    """
    def __init__(self):
        self.value = 0

    def increment(self):
        # BUG: Not atomic
        temp = self.value
        temp += 1
        self.value = temp

Counterexample:

# Counterexample (concurrent execution)
Initial state:
  value = 0

Thread 1:                    Thread 2:
  temp1 = value (0)
                              temp2 = value (0)
  temp1 += 1 (1)
                              temp2 += 1 (1)
  value = temp1 (1)
                              value = temp2 (1)

Final state:
  value = 1

# Expected:
2 increments → value should be 2

# Actual:
value = 1

# Why it fails:
Non-atomic read-modify-write
Lost update: Thread 2 overwrites Thread 1

# Interleaving trace:
T1: Read value=0
T2: Read value=0
T1: Compute 0+1=1
T2: Compute 0+1=1
T1: Write value=1
T2: Write value=1 (overwrites!)
Result: value=1 (should be 2)

Pattern 6: Loop Invariant Violation

Specification:

def find_max(arr: List[int]) -> int:
    """
    Loop invariant:
        max_val is the maximum of arr[0:i]
    Postcondition:
        result is maximum of all elements
    """
    max_val = arr[0]
    for i in range(len(arr)):  # BUG: should be range(1, len(arr))
        if arr[i] > max_val:
            max_val = arr[i]
    return max_val

Counterexample:

# Counterexample
input: arr = [5, 3, 8, 2]

# Execution trace:
Initial: max_val = 5, i = 0

Iteration 1 (i=0):
  arr[0] > max_val?
  5 > 5? → False
  max_val remains 5

Iteration 2 (i=1):
  arr[1] > max_val?
  3 > 5? → False
  max_val remains 5

Iteration 3 (i=2):
  arr[2] > max_val?
  8 > 5? → True
  max_val = 8

Iteration 4 (i=3):
  arr[3] > max_val?
  2 > 8? → False
  max_val remains 8

Result: 8 ✓ (correct, but inefficient)

# Better counterexample showing actual bug:
input: arr = [1, 2, 3, 5, 4]

If we compare with itself at i=0:
  Loop invariant violated on first iteration
  Compares element with itself unnecessarily

# Real bug revealed with:
input: arr = []  # Empty array

Execution:
  max_val = arr[0] → IndexError!

# Counterexample demonstrates:
Missing check for empty array

Pattern 7: Arithmetic Overflow

Specification:

def midpoint(left: int, right: int) -> int:
    """
    Postcondition: left <= result <= right
    """
    return (left + right) // 2  # BUG: Overflow possible

Counterexample:

# Counterexample (in languages with fixed-size integers)
input:
  left = 2_000_000_000
  right = 2_000_000_000

# Execution:
Step 1: left + right = 4_000_000_000
  (Overflow in 32-bit signed integer! Max = 2_147_483_647)
  Wraps to negative: -294_967_296

Step 2: -294_967_296 // 2 = -147_483_648

Result: -147_483_648

# Postcondition check:
left <= result <= right?
2_000_000_000 <= -147_483_648 <= 2_000_000_000?
FALSE

# Why it fails:
Integer overflow in addition

# Fix:
return left + (right - left) // 2

Counterexample Generation Techniques

Technique 1: Boundary Value Analysis

Test at boundaries of valid ranges:

# For specification: 0 <= x <= 100
Test values:
  - Minimum: 0
  - Just above minimum: 1
  - Just below maximum: 99
  - Maximum: 100
  - Below minimum: -1 (counterexample)
  - Above maximum: 101 (counterexample)

Technique 2: Constraint Solving

Use SMT solvers to find satisfying inputs:

# Specification:
# Precondition: x > 0 and y > 0
# Postcondition: result > x and result > y

# Find counterexample where postcondition fails:
# We want: NOT (result > x and result > y)
# Which is: result <= x OR result <= y

# SMT constraint:
# x > 0 AND y > 0 AND (result <= x OR result <= y)

# Solver finds:
x = 10, y = 5
result = 7  # 7 <= 10, so postcondition fails

Technique 3: Symbolic Execution

Execute code symbolically to find violations:

def abs_diff(a, b):
    if a > b:
        return a - b
    else:
        return b - a

# Symbolic execution:
Path 1: a > b
  Constraint: a > b
  Return: a - b

Path 2: a <= b
  Constraint: a <= b
  Return: b - a

# Check postcondition: result >= 0
Path 1: a - b >= 0? (a > b, so yes)
Path 2: b - a >= 0? (a <= b, so yes)

# No counterexample found (correct!)

Technique 4: Mutation Testing

Mutate code and find tests that don't catch it:

# Original:
def is_even(n):
    return n % 2 == 0

# Mutant:
def is_even(n):
    return n % 2 == 1  # Mutation: == to ==

# Counterexample for specification:
input: n = 4
expected: True (even)
actual: False (mutant says odd)

# This shows missing test case

Counterexample Report Template

## Counterexample Report

### Failed Property
[Precondition/Postcondition/Invariant that failed]

### Location
File: [filename]
Function: [function name]
Line: [line number]

### Counterexample Inputs
[Concrete input values that trigger the failure]

### Expected Behavior
[What should happen according to specification]

### Actual Behavior
[What actually happened]

### Execution Trace
[Step-by-step execution showing the failure]

### Root Cause
[Why the failure occurs]

### Suggested Fix
[How to fix the issue]

### Minimal Example
[Simplest input that demonstrates the problem]

Example Complete Report

## Counterexample Report

### Failed Property
Postcondition: result >= 0

### Location
File: math_utils.py
Function: sqrt
Line: 12

### Counterexample Inputs
```python
x = -4

Expected Behavior

According to specification:

Should raise ValueError for x < 0
Should only return non-negative float values

Actual Behavior

result = sqrt(-4)
# Returns: (1.2246467991473532e-16+2j)
# Type: <class 'complex'>

Function returns a complex number instead of raising an exception.

Execution Trace

1. Function called with x = -4
2. Precondition check: x >= 0? → False (should fail here!)
3. No explicit check implemented
4. Compute: (-4) ** 0.5
5. Python evaluates to complex: 2j
6. Return complex number
7. Postcondition: result >= 0? → Type error (can't compare complex to int)

Root Cause

Missing precondition enforcement. The function assumes input validation but doesn't implement it. Python's ** operator returns complex numbers for negative bases with fractional exponents.

Suggested Fix

def sqrt(x: float) -> float:
    if x < 0:
        raise ValueError(f"Cannot compute sqrt of negative number: {x}")
    return x ** 0.5

Minimal Example

Simplest failing input:

sqrt(-1)  # Any negative number fails

Additional Test Cases

Based on this counterexample, add tests:

def test_sqrt_negative():
    with pytest.raises(ValueError):
        sqrt(-1)

def test_sqrt_zero():
    assert sqrt(0) == 0

def test_sqrt_positive():
    assert abs(sqrt(4) - 2.0) < 1e-10


## Best Practices

1. **Start simple** - Find minimal counterexamples first
2. **Be concrete** - Use specific values, not symbolic
3. **Show execution** - Provide step-by-step traces
4. **Explain clearly** - Make the failure obvious
5. **Suggest fixes** - Show how to resolve the issue
6. **Test the fix** - Verify the counterexample is resolved
7. **Generalize** - Identify similar potential failures
8. **Document** - Record counterexamples for regression testing
9. **Automate** - Use tools to generate counterexamples
10. **Verify minimal** - Ensure counterexample is simplest possible

## Tools and Techniques

### Static Analysis Tools
- **Dafny** - Auto-generates counterexamples for failed proofs
- **Frama-C** - C verification with counterexample generation
- **Z3/SMT solvers** - Constraint-based counterexample finding
- **CBMC** - Bounded model checking for C

### Dynamic Analysis
- **Property-based testing** - Hypothesis, QuickCheck
- **Fuzzing** - AFL, LibFuzzer
- **Concolic execution** - KLEE, Symbolic PathFinder
- **Mutation testing** - Mutmut, PIT

### Manual Techniques
- Boundary value analysis
- Equivalence partitioning
- State transition testing
- Path coverage analysis

## Common Counterexample Scenarios

### Scenario 1: Missing Validation
```python
# Bug: No input validation
def divide(a, b):
    return a / b

# Counterexample: b = 0 → ZeroDivisionError

Scenario 2: Wrong Boundary

# Bug: Off-by-one in condition
if x <= 100:  # Should be x < 100

# Counterexample: x = 100 (incorrectly included)

Scenario 3: Type Confusion

# Bug: Assumes integer
def double(x):
    return x * 2

# Counterexample: x = "hello" → "hellohello" (string, not number)

Scenario 4: Unhandled Case

# Bug: Missing else
if x > 0:
    return 1
elif x < 0:
    return -1
# Missing: x == 0

# Counterexample: x = 0 → None (should return 0)

Scenario 5: Resource Leak

# Bug: File not closed on exception
def read_file(path):
    f = open(path)
    data = f.read()
    # Missing: f.close()
    return data

# Counterexample: Exception during read → file not closed