
Graph-informed mutation testing triage. Parses codebases with Trailmark, runs mutation testing and necessist, then uses survived mutants, unnecessary test statements, and call graph data to identify false positives, missing test coverage, and fuzzing targets. Use when triaging survived mutants, analyzing mutation testing results, identifying test gaps, finding fuzzing targets from weak tests, running mutation frameworks (including circomvent and cairo-mutants), or using necessist.

install
source · Clone the upstream repo
git clone https://github.com/trailofbits/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/trailofbits/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/trailmark/skills/genotoxic" ~/.claude/skills/trailofbits-skills-genotoxic && rm -rf "$T"
manifest: plugins/trailmark/skills/genotoxic/SKILL.md

Genotoxic

Combines mutation testing and necessist (test statement removal) with code graph analysis to triage findings into actionable categories: false positives, missing unit tests, and fuzzing targets.

When to Use

  • After mutation testing reveals survived mutants that need triage
  • Identifying where unit tests would have the highest impact
  • Finding functions that need fuzz harnesses instead of unit tests
  • Prioritizing test improvements using data flow context
  • Filtering out harmless mutants from actionable ones
  • Finding unnecessary test statements that indicate weak assertions (necessist)

When NOT to Use

  • Codebase has no existing test suite (write tests first)
  • Pure documentation or configuration changes
  • Single-file scripts with trivial logic

Prerequisites

  • trailmark installed: if `uv run trailmark` fails, run `uv pip install trailmark`.
    
    DO NOT fall back to "manual verification" or "manual analysis" as a substitute for running trailmark. Install it first. If installation fails, report the error instead of switching to manual analysis.
  • A mutation testing framework for the target language — if the framework command fails (not found, not installed), install it using the instructions in references/mutation-frameworks.md. DO NOT fall back to "manual mutation analysis" or skip mutation testing. Install the framework first. If installation fails, report the error instead of switching to manual mutation analysis.
  • necessist (optional, recommended): if the target language is supported (Go, Rust, Solidity/Foundry, TypeScript/Hardhat, TypeScript/Vitest, Rust/Anchor), install with `cargo install necessist`. See references/mutation-frameworks.md for details.
  • An existing test suite that passes
  • macOS environment: Run `ulimit -n 1024` before any `mull-runner` invocation. macOS Tahoe (26+) sets unlimited file descriptors by default, which crashes Mull's subprocess spawning. See references/mutation-frameworks.md for details.

Rationalizations to Reject

| Rationalization | Why It's Wrong | Required Action |
|-----------------|----------------|-----------------|
| "All survived mutants need tests" | Many are harmless or equivalent | Triage before writing tests |
| "Mutation testing is too noisy" | Noise means you're not triaging | Use graph data to filter |
| "Unit tests cover everything" | Complex data flows need fuzzing | Check entrypoint reachability |
| "Dead code mutants don't matter" | Dead code should be removed | Flag for cleanup |
| "Low complexity = low risk" | Boundary bugs hide in simple code | Check mutant location |
| "Tool isn't installed, I'll do it manually" | Manual analysis misses what tooling catches | Install the tool first |
| "Necessist isn't mutation testing, skip it" | Necessist finds what mutation testing misses: weak tests | Run both when the language supports it |

Quick Start

# 1. Build the code graph
uv run trailmark analyze --summary {targetDir}

# 2. Run mutation testing (language-dependent)
# Python:
uv run mutmut run --paths-to-mutate {targetDir}/src
uv run mutmut results

# 2b. Run necessist (if language supported)
necessist

# 3. Analyze results with this skill's workflow (Phase 3)

Workflow Overview

Phase 1: Graph Build      → Parse codebase with trailmark
      ↓
Phase 2: Mutation Run     → Execute mutation testing framework
Phase 2b: Necessist Run   → Remove test statements (optional, parallel)
      ↓
Phase 3: Triage           → Classify findings using graph data
      ↓
Output: Categorized Report
  ├── Corroborated         (both tools flag same function — highest value)
  ├── False Positives      (harmless, skip)
  ├── Missing Tests        (write unit tests)
  └── Fuzzing Targets      (set up fuzz harnesses)

Decision Tree

├─ Need to set up mutation testing for a language?
│  └─ Read: references/mutation-frameworks.md
│
├─ Need to set up necessist or find weak test statements?
│  └─ Read: references/mutation-frameworks.md (Necessist section)
│
├─ Need to understand the triage criteria in depth?
│  └─ Read: references/triage-methodology.md
│
├─ Need to understand how graph data informs triage?
│  └─ Read: references/graph-analysis.md
│
└─ Already have results + graph? Use Phase 3 below.

Phase 1: Build Code Graph and Run Pre-Analysis

Parse the target codebase with trailmark and run pre-analysis before mutation testing. Pre-analysis computes blast radius, entry points, privilege boundaries, and taint propagation, which Phase 3 uses for triage.

uv run trailmark analyze --summary {targetDir}

Use the `QueryEngine` API to build the graph and run pre-analysis:

  1. Build the graph with `QueryEngine.from_directory("{targetDir}", language="{lang}")`
  2. Call `engine.preanalysis()` (mandatory before triage)
  3. Export with `engine.to_json()` for cross-referencing with mutation results

See references/graph-analysis.md for the full API: node mapping, reachability queries, blast radius, and pre-analysis subgraph lookups.
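The ordering of steps 2 and 3 can be enforced with a small wrapper. The sketch below is illustrative only: `export_graph` is a hypothetical helper, it accepts any object exposing the `preanalysis()` and `to_json()` methods named above, and it assumes `to_json()` returns a JSON string. Check references/graph-analysis.md for the actual signatures and return types.

```python
from pathlib import Path
from typing import Any


def export_graph(engine: Any, out_path: str) -> Path:
    """Run pre-analysis, then write the exported graph to out_path.

    `engine` is expected to be a trailmark QueryEngine (or anything with
    the same two methods). Hypothetical helper, not part of trailmark.
    """
    # Pre-analysis is mandatory before triage, so always run it first.
    engine.preanalysis()
    # Assumption: to_json() returns the graph serialized as a JSON string.
    payload = engine.to_json()
    path = Path(out_path)
    path.write_text(payload, encoding="utf-8")
    return path
```

With trailmark installed, the engine would come from `QueryEngine.from_directory("{targetDir}", language="{lang}")`, and the resulting file can be loaded alongside the mutation results in Phase 3.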


Phase 2: Run Mutation Testing

Select and run the appropriate framework. See references/mutation-frameworks.md for language-specific setup.

Capture survived mutants. Each framework reports differently, but extract these fields per mutant:

| Field | Description |
|-------|-------------|
| File path | Source file containing the mutant |
| Line number | Line where mutation was applied |
| Mutation type | What was changed (operator, value, etc.) |
| Status | survived, killed, timeout, error |

Filter to survived mutants only for Phase 3.
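One way to hold these fields is a small normalized record, shown below as a minimal sketch. The `Mutant` class and `survived_only` function are hypothetical names introduced for illustration; each framework's raw output still has to be parsed into this shape first, and that parsing is framework-specific.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Mutant:
    """Normalized mutant record with the four fields from the table above."""
    file_path: str
    line: int
    mutation_type: str
    status: str  # "survived", "killed", "timeout", or "error"


def survived_only(mutants: Iterable[Mutant]) -> List[Mutant]:
    """Keep only survived mutants, tolerating case and whitespace differences."""
    return [m for m in mutants if m.status.strip().lower() == "survived"]
```

Normalizing the status string in one place keeps the Phase 3 logic independent of how each framework spells its outcomes.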


Phase 2b: Run Necessist (Optional)

If the target language is supported (Go, Rust, Solidity/Foundry, TypeScript/Hardhat, TypeScript/Vitest, Rust/Anchor), run necessist to find unnecessary test statements. This runs independently of Phase 2 and can execute in parallel.

# Auto-detect framework
necessist

# Or target specific test files
necessist tests/test_parser.rs

# Export results
necessist --dump

Filter to findings where the test passed after removal. See references/mutation-frameworks.md for framework-specific configuration and the normalized record format.

Map each removal to a production function using the algorithm in references/graph-analysis.md.


Phase 3: Triage Findings

For each survived mutant and each necessist removal, determine its triage bucket using graph data. Necessist removals must first be mapped to a production function (see references/graph-analysis.md).

Quick Classification (Mutation Testing)

| Signal | Bucket | Reasoning |
|--------|--------|-----------|
| No callers in graph | False Positive | Dead code, mutant is unreachable |
| Only test callers | False Positive | Test infrastructure, not production |
| Logging/display string | False Positive | Cosmetic, no behavioral impact |
| Equivalent mutant | False Positive | Behavior unchanged despite mutation |
| Simple function, low CC, no entrypoint path | Missing Tests | Unit test is straightforward |
| Error handling path | Missing Tests | Should have negative test cases |
| Boundary condition (off-by-one) | Missing Tests | Property-based test candidate |
| Pure function, deterministic | Missing Tests | Easy to test, high value |
| High CC (>10), entrypoint reachable | Fuzzing Target | Complex + exposed = fuzz it |
| Parser/validator/deserializer | Fuzzing Target | Structured input handling |
| Many callers (>10) + moderate CC | Fuzzing Target | High blast radius |
| Binary/wire protocol handling | Fuzzing Target | Fuzzers excel at format testing |

Quick Classification (Necessist)

| Signal | Bucket | Reasoning |
|--------|--------|-----------|
| Redundant setup or debug call | False Positive | Statement genuinely unnecessary |
| Cannot map to production function | False Positive | No graph context for triage |
| Call removed, no assertion checks its effect | Missing Tests | Test has weak assertions |
| Assertion removed, test still passes | Missing Tests | Redundant or insufficient coverage |
| Maps to high-CC entrypoint-reachable function | Fuzzing Target | Complex + exposed + weak test |

When both mutation testing and necessist flag the same production function, mark the finding as corroborated. These are the highest-confidence findings.

For detailed criteria, see references/triage-methodology.md.
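Corroboration amounts to a set intersection over production function names. The sketch below assumes each finding has already been mapped to a function identifier (the string format is up to you); `split_by_source` is a hypothetical helper, and its keys reuse the three source labels this skill reports.

```python
from typing import Dict, Iterable, Set


def split_by_source(
    mutation_funcs: Iterable[str],
    necessist_funcs: Iterable[str],
) -> Dict[str, Set[str]]:
    """Partition flagged functions by which tool(s) flagged them."""
    mutation = set(mutation_funcs)
    necessist = set(necessist_funcs)
    # A function flagged by both tools is corroborated.
    both = mutation & necessist
    return {
        "corroborated": both,
        "mutation": mutation - both,
        "necessist": necessist - both,
    }
```

Because the three sets are disjoint, every flagged function ends up with exactly one source label.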

Graph Queries for Triage

For each mutant, map it to its containing graph node and use pre-analysis subgraphs (tainted, high_blast_radius, privilege_boundary) from Phase 1 to classify it. The classification logic checks: no callers → false positive, privilege boundary → fuzzing, high CC + tainted → fuzzing, high blast radius → fuzzing, otherwise → missing tests.

See references/graph-analysis.md for the `batch_triage` implementation and node mapping functions.
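The check order described above can be sketched as a plain function. This is not the reference `batch_triage` implementation: `NodeFacts` and `classify` are hypothetical names, the boolean fields stand in for membership in the pre-analysis subgraphs, and the CC threshold of 10 is borrowed from the "High CC (>10)" row of the Quick Classification table.

```python
from dataclasses import dataclass

HIGH_CC = 10  # assumption: same threshold as the "High CC (>10)" table row


@dataclass(frozen=True)
class NodeFacts:
    """Graph facts for the node containing a mutant (hypothetical shape)."""
    caller_count: int
    cyclomatic_complexity: int
    tainted: bool = False
    high_blast_radius: bool = False
    privilege_boundary: bool = False


def classify(node: NodeFacts) -> str:
    """Apply the checks in the order given in the text; first match wins."""
    if node.caller_count == 0:
        return "false_positive"  # unreachable, likely dead code
    if node.privilege_boundary:
        return "fuzzing_target"
    if node.cyclomatic_complexity > HIGH_CC and node.tainted:
        return "fuzzing_target"
    if node.high_blast_radius:
        return "fuzzing_target"
    return "missing_tests"
```

Order matters here: a node with no callers is a false positive even if it also sits on a privilege boundary, because the first matching check decides the bucket.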


Output Format

Generate a markdown report:

# Genotoxic Triage Report

## Summary
- Total survived mutants: N
- Total necessist removals: N
- Corroborated findings: N
- False positives: N (N%)
- Missing test coverage: N (N%)
- Fuzzing targets: N (N%)

## Corroborated Findings
| File | Line | Function | Mutation Signal | Necessist Signal | Action |
|------|------|----------|----------------|------------------|--------|

## False Positives
| File | Line | Mutation | Reason | Source |
|------|------|----------|--------|--------|

## Missing Test Coverage
| File | Line | Function | CC | Callers | Suggested Test | Source |
|------|------|----------|----|---------|----------------|--------|

## Fuzzing Targets
| File | Line | Function | CC | Entrypoint Path | Blast Radius | Source |
|------|------|----------|----|-----------------|--------------|--------|

The `Source` column is `mutation`, `necessist`, or `corroborated`.

Write the report to `GENOTOXIC_REPORT.md` in the working directory.
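The summary block of the template can be filled in mechanically once the counts are known. The sketch below uses a hypothetical `render_summary` helper and makes one assumption the template does not spell out: percentages are computed against the sum of the three triage buckets, rounded to whole numbers.

```python
def render_summary(
    survived: int,
    removals: int,
    corroborated: int,
    false_positives: int,
    missing_tests: int,
    fuzzing_targets: int,
) -> str:
    """Render the title and Summary section of the triage report."""
    # Assumption: percentages are relative to all triaged findings.
    triaged = false_positives + missing_tests + fuzzing_targets

    def pct(count: int) -> int:
        # Guard against division by zero when nothing was triaged.
        return round(100 * count / triaged) if triaged else 0

    lines = [
        "# Genotoxic Triage Report",
        "",
        "## Summary",
        f"- Total survived mutants: {survived}",
        f"- Total necessist removals: {removals}",
        f"- Corroborated findings: {corroborated}",
        f"- False positives: {false_positives} ({pct(false_positives)}%)",
        f"- Missing test coverage: {missing_tests} ({pct(missing_tests)}%)",
        f"- Fuzzing targets: {fuzzing_targets} ({pct(fuzzing_targets)}%)",
    ]
    return "\n".join(lines) + "\n"
```

The returned string can be written to `GENOTOXIC_REPORT.md` ahead of the four tables, which are appended in the same way from the per-bucket records.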


Quality Checklist

Before delivering:

  • Trailmark graph built for target language
  • Mutation framework ran to completion
  • Necessist ran (if language supported) or noted as not applicable
  • All survived mutants triaged (none unclassified)
  • All necessist removals triaged (if applicable)
  • Corroborated findings identified (if both tools ran)
  • False positives have clear justifications
  • Missing test items include suggested test type
  • Fuzzing targets include entrypoint paths and blast radius
  • Report file written to `GENOTOXIC_REPORT.md`
  • User notified with summary statistics

Integration

trailmark skill:

  • Phase 1: Build code graph, query complexity and entrypoints
  • Phase 3: Caller analysis, reachability, blast radius

property-based-testing skill:

  • Missing test coverage items involving boundary conditions
  • Roundtrip/idempotence properties for serialization mutants

testing-handbook-skills (fuzzing):

  • Fuzzing target items: use `harness-writing`, `cargo-fuzz`, `atheris`

Supporting Documentation


First-time users: Start with Phase 1 (graph build), then run mutations, then use the Quick Classification table in Phase 3.

Experienced users: Jump to Phase 3 and use the Decision Tree to load specific reference material.