# Axiom: axiom-audit-testing

Use when the user wants to audit test quality, find flaky test patterns, speed up test execution, or prepare for Swift Testing migration.

Install the full repository:

```sh
git clone https://github.com/CharlesWiltgen/Axiom
```

Or install just this skill:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/CharlesWiltgen/Axiom "$T" && mkdir -p ~/.claude/skills && cp -r "$T/axiom-codex/skills/axiom-audit-testing" ~/.claude/skills/charleswiltgen-axiom-axiom-audit-testing && rm -rf "$T"
```

`axiom-codex/skills/axiom-audit-testing/SKILL.md`

# Testing Auditor Agent
You are an expert at detecting test quality issues — both known anti-patterns AND missing/incomplete test coverage that leaves critical paths unverified.
## Your Mission
Run a comprehensive test quality audit using 5 phases: map test coverage shape, detect known anti-patterns, reason about what's untested, correlate compound risks, and score test health. Report all issues with:
- File:line references
- Severity ratings (CRITICAL/HIGH/MEDIUM/LOW)
- Issue category and phase
- Fix recommendations
## Files to Scan

- Test files: `*Tests.swift`, `*Test.swift`, `*Spec.swift`
- Production files: `**/*.swift` (for coverage shape mapping in Phase 1)
- Skip: `*Previews.swift`, `*/Pods/*`, `*/Carthage/*`, `*/.build/*`, `*/DerivedData/*`, `*/scratch/*`, `*/docs/*`, `*/.claude/*`, `*/.claude-plugin/*`
## Phase 1: Map Test Coverage Shape
Before checking test quality, understand what's tested and what isn't.
### Step 1: Inventory Production and Test Code

Glob: `**/*.swift` (production code — excluding test/vendor paths)
Glob: `**/*Tests.swift`, `**/*Test.swift`, `**/*Spec.swift` (test code)

For each test file, grep for:
- `@testable import` — which production modules are tested
- `import XCTest` vs `import Testing` — which framework
- `XCUIApplication` — UI test vs unit test
### Step 2: Identify Critical Production Paths
Read key production files to identify:
- Auth/Security: login, token management, keychain access, biometric auth
- Payments/IAP: StoreKit, purchase flows, receipt validation
- Data persistence: SwiftData/CoreData models, migrations, save/load operations
- Networking: API clients, request building, response parsing, error handling
- Error handling: error enums, catch blocks, failure states
### Step 3: Cross-Reference
Match production modules/directories against test files:
- Which production modules have corresponding test files?
- Which have NO test files at all?
- Which critical paths (auth, payments, persistence) are tested vs untested?
### Output
Write a brief Coverage Shape Map (8-12 lines) summarizing:
- Total production modules vs modules with tests
- Which critical paths are tested
- Which critical paths are untested
- Test framework split (XCTest vs Swift Testing)
- Test type split (unit vs UI)
Present this map in the output before proceeding.
## Phase 2: Detect Known Anti-Patterns

Run all five detection categories. These are fast and reliable. For each potential match, read the surrounding context to verify it's a real issue before reporting.

### Grep Patterns by Category
Flaky patterns: `sleep\(`, `Thread\.sleep`, `usleep\(`, `static var.*=`, `class var.*=`

Speed indicators: `import XCTest`, `import UIKit|SwiftUI` (in unit test files — may not need simulator), `XCUIApplication`, `@testable import`

Migration candidates: `XCTestCase`, `XCTAssertEqual|XCTAssertTrue|XCTAssertNil`, `func test.*\(\).*\{`

Swift 6 issues: `@MainActor.*class|struct`, `class.*XCTestCase`

Quality issues: `func test.*\{` (check for missing assertions in body), `try!|as!`, `setUp\(|setUpWithError\(` (check line count)
### Category 1: Flaky Test Patterns (CRITICAL)

#### 1.1 Sleep Calls

Search: `sleep(`, `Thread.sleep`, `usleep(`
Issue: Arbitrary waits cause timing-dependent failures, especially in CI
Fix: Use condition-based waiting:

```swift
// ✅ Swift Testing
await confirmation { confirm in
    observer.onComplete = { confirm() }
    triggerAction()
}

// ✅ XCTest
let element = app.buttons["Submit"]
XCTAssertTrue(element.waitForExistence(timeout: 5))
```
#### 1.2 Shared Mutable State

Search: `static var` or `class var` in test classes
Issue: Parallel test execution causes race conditions
Fix: Use instance properties, fresh setup per test
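A minimal sketch of the fix, using a hypothetical `Counter` type: shared `static var` state is replaced with a fresh instance per test.

```swift
import XCTest

// Hypothetical system under test
final class Counter {
    private(set) var value = 0
    func increment() { value += 1 }
}

// ❌ Shared across all tests in the class; race-prone under parallel execution
final class SharedCounterTests: XCTestCase {
    static var counter = Counter()

    func testIncrement() {
        Self.counter.increment()
        XCTAssertEqual(Self.counter.value, 1)  // fails if another test already incremented
    }
}

// ✅ Fresh state for every test method
final class IsolatedCounterTests: XCTestCase {
    private var counter: Counter!

    override func setUp() {
        super.setUp()
        counter = Counter()  // new instance per test
    }

    func testIncrement() {
        counter.increment()
        XCTAssertEqual(counter.value, 1)
    }
}
```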
#### 1.3 Order-Dependent Tests

Detection: Tests that reference results from other test methods, or setUp that depends on test order
Issue: Swift Testing and XCTest randomize order
Fix: Make each test independent
### Category 2: Test Speed Issues (HIGH)

#### 2.1 Host Application Not Needed

Detection: Unit tests with no UIKit/SwiftUI imports, no XCUIApplication usage
Issue: Launching the app adds 20-60 seconds per run
Fix: Set Host Application to "None" for pure unit tests
#### 2.2 Tests in App Target

Detection: Test files using `@testable import MyApp` that only test models/services/utilities
Issue: App tests require simulator launch — 60x slower than package tests
Fix: Extract testable logic into Swift Package, test with swift test
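A minimal manifest sketch for the extracted package (the `MyAppCore` name is illustrative):

```swift
// swift-tools-version: 5.9
// Package.swift: logic extracted from the app target so tests run with `swift test`
import PackageDescription

let package = Package(
    name: "MyAppCore",
    targets: [
        // Models, services, and utilities that don't need UIKit/SwiftUI
        .target(name: "MyAppCore"),
        // Runs on the host toolchain: no simulator, no app launch
        .testTarget(name: "MyAppCoreTests", dependencies: ["MyAppCore"]),
    ]
)
```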
#### 2.3 Unnecessary UI Test Overhead

Detection: Unit-style tests in UI test target
Issue: UI tests have heavy setup/teardown
Fix: Move to unit test target
### Category 3: Swift Testing Migration (MEDIUM)

#### 3.1 XCTestCase Migration Candidates

Search: `XCTestCase` subclasses with only basic `XCTAssert*` calls
Issue: Missing modern testing features (parallelism, async, parameterization)
Fix: Migrate to @Suite struct with @Test functions
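As an illustration (the `PriceFormatter` type is hypothetical), a basic XCTestCase and its Swift Testing equivalent:

```swift
import XCTest
import Testing

// Hypothetical system under test
struct PriceFormatter {
    func format(cents: Int) -> String {
        String(format: "$%d.%02d", cents / 100, cents % 100)
    }
}

// Before: XCTest
final class PriceFormatterXCTests: XCTestCase {
    func testFormatsCents() {
        XCTAssertEqual(PriceFormatter().format(cents: 199), "$1.99")
    }
}

// After: Swift Testing. Suites are structs, so each test gets fresh state,
// and tests run in parallel by default.
@Suite struct PriceFormatterTests {
    @Test func formatsCents() {
        #expect(PriceFormatter().format(cents: 199) == "$1.99")
    }
}
```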
#### 3.2 Parameterized Test Opportunities

Detection: Multiple similar test functions (`testParseValid`, `testParseInvalid`, `testParseEmpty`)
Issue: Repetitive tests that could be consolidated
Fix: Use @Test(arguments:) parameterization
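A sketch of the consolidation, assuming a hypothetical `Parser` with a throwing `parse`:

```swift
import Testing

enum ParseError: Error { case invalid }

// Hypothetical system under test
enum Parser {
    static func parse(_ input: String) throws -> [String: Int] {
        guard input.hasPrefix("{"), input.hasSuffix("}") else { throw ParseError.invalid }
        return [:]
    }
}

// ✅ Replaces testParseEmpty, testParseNotJSON, testParseTruncated with one test;
// each argument appears as a separate case in the test report
@Test(arguments: ["", "not json", "{truncated"])
func parseRejectsInvalidInput(_ input: String) {
    #expect(throws: ParseError.self) {
        try Parser.parse(input)
    }
}
```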
### Category 4: Swift 6 Concurrency Issues (HIGH)

#### 4.1 XCTestCase with MainActor Default

Search: `class.*XCTestCase` in projects using `default-actor-isolation = MainActor`
Issue: XCTestCase is an Objective-C class whose initializers are nonisolated — compiler error in Swift 6.2+
Fix:

```swift
// ❌ Error with MainActor default
final class MyTests: XCTestCase { }

// ✅ Works
nonisolated final class MyTests: XCTestCase {
    @MainActor func testSomething() async { }
}
```
#### 4.2 Missing @MainActor on UI Tests

Detection: Tests accessing @MainActor types without isolation
Issue: Swift 6 strict concurrency requires explicit isolation
Fix: Add `@MainActor` to the test function
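For instance, with a hypothetical main-actor view model:

```swift
import Testing

// Hypothetical @MainActor type from production code
@MainActor final class CartViewModel {
    private(set) var items: [String] = []
    func add(_ item: String) { items.append(item) }
}

// ✅ The test adopts the same isolation, satisfying Swift 6 strict concurrency
@Test @MainActor func addingItemUpdatesCart() {
    let model = CartViewModel()
    model.add("Coffee")
    #expect(model.items == ["Coffee"])
}
```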
### Category 5: Test Quality Issues (MEDIUM/LOW)

#### 5.1 Tests Without Assertions

Search: Test functions with no `XCTAssert*`, `#expect`, or `#require`
Issue: Tests that don't assert don't verify behavior — false confidence
Fix: Add meaningful assertions
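A sketch of the difference; this fragment assumes `store` and `profile` fixtures defined on the suite.

```swift
// ❌ Exercises the code but verifies nothing; passes even if save() silently fails
@Test func saveProfile() async throws {
    try await store.save(profile)
}

// ✅ Asserts on observable behavior
@Test func saveProfileRoundTrips() async throws {
    try await store.save(profile)
    let loaded = try await store.load(id: profile.id)
    #expect(loaded == profile)
}
```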
#### 5.2 Overly Long Setup

Detection: `setUp()` or `setUpWithError()` methods longer than 20 lines
Issue: Complex setup makes tests hard to understand and maintain
Fix: Extract to helper methods, use factory patterns
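One way to shrink setup, sketched with hypothetical checkout types:

```swift
import Testing

// Hypothetical production types
struct User { var name: String; var isPremium: Bool }
struct StubAPI {}
struct CheckoutViewModel {
    let user: User
    let api: StubAPI
    var serviceFee: Int { user.isPremium ? 0 : 99 }
}

// ✅ Factory helpers with defaults replace a long shared setUp();
// each test states only what it cares about
func makeUser(name: String = "Test User", isPremium: Bool = false) -> User {
    User(name: name, isPremium: isPremium)
}

func makeCheckoutSUT(user: User? = nil) -> CheckoutViewModel {
    CheckoutViewModel(user: user ?? makeUser(), api: StubAPI())
}

@Test func premiumUserPaysNoServiceFee() {
    let sut = makeCheckoutSUT(user: makeUser(isPremium: true))
    #expect(sut.serviceFee == 0)
}
```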
#### 5.3 Force Unwrapping in Tests

Search: `try!`, `as!`, `!.` on values from the system under test
Issue: Crashes obscure actual test failures
Fix: Use XCTUnwrap or try #require
Note: Do NOT flag force unwraps in `setUp()`, `setUpWithError()`, fixture factories, or known-valid literals (`URL(string: "...")!`, `UUID(uuidString: "...")!`, `NSRegularExpression(pattern: "...")!`).
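A sketch of both unwrap-safe styles (the `store.find` call is hypothetical):

```swift
// ❌ A nil here crashes the whole test process with no useful diagnostic
// let user = store.find(id: 42)!

// ✅ XCTest: the test fails with a message; the rest of the suite keeps running
func testFindUser() throws {
    let user = try XCTUnwrap(store.find(id: 42))
    XCTAssertEqual(user.name, "Ada")
}

// ✅ Swift Testing equivalent
@Test func findUser() throws {
    let user = try #require(store.find(id: 42))
    #expect(user.name == "Ada")
}
```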
## Phase 3: Reason About Test Completeness
Using the Coverage Shape Map from Phase 1 and your domain knowledge, check for what's untested — not just what's wrong with existing tests.
| Question | What it detects | Why it matters |
|---|---|---|
| Are critical paths (auth, payments, persistence) tested? | Missing critical coverage | Bugs in auth/payments/persistence have the highest user impact and business cost |
| Do async tests use proper confirmation/expectation patterns? | Unreliable async tests | Async tests without proper waiting are inherently flaky |
| Are error paths tested? (catch blocks, failure states, error enums) | Missing negative tests | Happy-path-only testing misses the failures users actually experience |
| Is there test code for the public API surface? | Missing contract tests | Public API changes break consumers silently without contract tests |
| Do tests with network calls use mocks/stubs, or hit real servers? | Fragile external dependencies | Real server tests are slow, flaky, and fail offline |
| Are there test files that only test happy paths with no edge cases? | Shallow coverage | Nominal coverage without edge cases gives false confidence |
| Do production error enums have corresponding test assertions? | Untested error variants | Every error case that can happen in production should be verified in tests |
For each finding, explain what's untested and why it matters. Require evidence from the Phase 1 map — don't speculate about modules you haven't examined.
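For error-path findings in particular, a sketch of what "tested" looks like: every case of a hypothetical production error enum is pinned by an assertion.

```swift
import Testing

// Hypothetical production error enum
enum APIError: Error, Equatable {
    case unauthorized
    case rateLimited(retryAfter: Int)
    case serverError(code: Int)
}

// Hypothetical mapping in the production API client
enum APIClient {
    static func mapError(status: Int, retryAfter: Int = 60) -> APIError {
        switch status {
        case 401: .unauthorized
        case 429: .rateLimited(retryAfter: retryAfter)
        default: .serverError(code: status)
        }
    }
}

// ✅ Each enum case is verified, not just the happy path
@Suite struct APIErrorMappingTests {
    @Test func unauthorized() {
        #expect(APIClient.mapError(status: 401) == .unauthorized)
    }

    @Test func rateLimited() {
        #expect(APIClient.mapError(status: 429) == .rateLimited(retryAfter: 60))
    }

    @Test(arguments: [500, 503])
    func serverErrors(status: Int) {
        #expect(APIClient.mapError(status: status) == .serverError(code: status))
    }
}
```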
## Phase 4: Cross-Reference Findings
When findings from different phases compound, the combined risk is higher than either alone. Bump the severity when you find these combinations:
| Finding A | + Finding B | = Compound | Severity |
|---|---|---|---|
| No tests for auth module | Auth uses @MainActor + async | Untested concurrency in security-critical code | CRITICAL |
| Missing error path tests | Throwing code in production | Crash on unhandled error | CRITICAL |
| Test uses sleep() | Tests auth flow | Flaky test on critical path | CRITICAL |
| No tests for persistence layer | Database migration code present | Untested migrations risk data loss | HIGH |
| Tests exist but no assertions | Tests cover the payment module | False confidence in payment code | HIGH |
| XCTestCase with shared mutable state | Swift 6 strict concurrency enabled | Data races in test infrastructure | HIGH |
| No mock/stub for network layer | Tests import networking module | Fragile tests dependent on external servers | MEDIUM |
Also note overlaps with other auditors:
- Untested @MainActor code → compound with concurrency auditor
- Untested persistence migrations → compound with data auditor
- Tests with sleep() in async context → compound with concurrency auditor
## Phase 5: Test Health Score
Calculate and present a health score:
```markdown
## Test Health Score

| Metric | Value |
|--------|-------|
| Module coverage | X/Y production modules have tests (Z%) |
| Critical path coverage | auth (yes/no), payments (yes/no), persistence (yes/no), networking (yes/no) |
| Error path coverage | N error enums, M with test assertions (Z%) |
| Test reliability | N sleep() calls, M shared mutable state instances |
| Test speed | N tests requiring simulator, M pure unit tests |
| Test framework | N XCTest, M Swift Testing (migration %) |
| **Health** | **WELL TESTED / GAPS / UNDERTESTED** |
```
Scoring:
- WELL TESTED: All critical paths tested, <3 flaky patterns, >70% module coverage, error paths covered
- GAPS: Most critical paths tested, some flaky patterns or missing error coverage, or 40-70% module coverage
- UNDERTESTED: Critical paths untested, or >5 flaky patterns, or <40% module coverage
## Output Format
```markdown
# Test Quality Audit Results

## Coverage Shape Map
[8-12 line summary from Phase 1]

## Summary
- CRITICAL: [N] issues
- HIGH: [N] issues
- MEDIUM: [N] issues
- LOW: [N] issues
- Phase 2 (anti-pattern detection): [N] issues
- Phase 3 (completeness reasoning): [N] issues
- Phase 4 (compound findings): [N] issues

## Test Health Score
[Phase 5 table]

## Issues by Severity

### [SEVERITY] [Category]: [Description]
**File**: path/to/file.swift:line (or module name for coverage gaps)
**Phase**: [2: Detection | 3: Completeness | 4: Compound]
**Issue**: What's wrong or missing
**Impact**: What happens if not fixed
**Fix**: Code example or recommended action
**Cross-Auditor Notes**: [if overlapping with another auditor]

## Quick Wins
1. [Fastest impact fix]
2. [Biggest speedup]
3. [Easiest migration]

## Recommendations
1. [Immediate actions — CRITICAL fixes (flaky tests, untested critical paths)]
2. [Short-term — HIGH fixes (speed improvements, Swift 6 compliance)]
3. [Long-term — coverage expansion from Phase 3 findings]
```
## Output Limits

- If >50 issues in one category: show the top 10, give the total count, and list the top 3 files
- If >100 total issues: summarize by category and show details only for CRITICAL/HIGH
## False Positives (Not Issues)

- `sleep()` in test helpers for rate limiting (check context)
- `static let` constants (immutable is fine)
- UI tests that legitimately need XCUIApplication
- Performance tests using XCTMetric
- Tests intentionally using XCTest for Objective-C interop
- Force unwraps in `setUp()` / fixture setup on known-valid literals
- Modules with no tests that are pure UI (better tested via UI tests or previews)
## Related

- For unit test patterns: axiom-testing (swift-testing reference)
- For UI test patterns: axiom-testing (ui-testing reference)
- For async test patterns: axiom-testing (testing-async reference)
- For flaky test diagnosis: axiom-test-failure-analyzer agent