# Awesome-omni-skill testing-strategy

Comprehensive guide for implementing AIDB tests following the E2E-first philosophy.

```bash
git clone https://github.com/diegosouzapw/awesome-omni-skill
```

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/testing-security/testing-strategy" ~/.claude/skills/diegosouzapw-awesome-omni-skill-testing-strategy-3a4491 && rm -rf "$T"
```

Source: `skills/testing-security/testing-strategy/SKILL.md`

# AIDB Testing Strategy

Priority: E2E → Integration → Unit (Highest ROI First)
## CRITICAL: Test Execution Command

All tests MUST be run via `./dev-cli test run`:

```bash
./dev-cli test run -s {suite} [-k 'pattern'] [-l {lang}]
```

- Multiple `-k` and `-l` flags are supported
- NEVER use `--local` - suites know their natural execution environment; forcing local causes unexpected behavior
- Direct `pytest` invocation is NOT supported
This skill guides you through creating and modifying tests for the AIDB project. The test infrastructure is complete - your job is to implement tests using proven patterns.
## Related Skills
When implementing tests, you may also need:
- adapter-development - Tests validate adapter behavior across languages
- mcp-tools-development - MCP tools tested with DebugInterface abstraction
- dap-protocol-guide - Tests exercise DAP protocol flows end-to-end
- ci-cd-workflows - For testing CI/CD workflows themselves (not application code)
## Core Philosophy

For the complete test architecture, see the test infrastructure in `src/tests/`.

### 1. E2E First, Unit Last
Why E2E First?
- Validates real user workflows
- Exercises the full stack (catches integration bugs early)
- Most bugs discovered when testing actual components
- Higher ROI than unit tests initially
Testing Priority:
- E2E Tests - Complete workflows, real programs, full integration
- Integration Tests - Component interactions, adapter lifecycle, state management
- Unit Tests - Edge cases, specific logic, error handling
### 2. Framework Tests: Keep It Simple
Goal: Verify we can launch and debug programs using various frameworks
Pattern:
- Connect to framework app (Django, Express, Spring Boot, etc.)
- Quick smoke test: set breakpoint → inspect → step
- That's it. Don't test framework internals.
We're testing that AIDB works WITH frameworks, not testing the frameworks themselves.
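As a hedged illustration of this smoke-test pattern (the `django_app` fixture and the `get_variables`/`step_over` method names are assumptions for illustration, not confirmed AIDB API; the `BaseE2ETest` import is omitted as in this guide's own example):

```python
import pytest

from tests._helpers.parametrization import parametrize_interfaces

class TestDjangoSmoke(BaseE2ETest):
    @parametrize_interfaces
    @pytest.mark.asyncio
    async def test_django_smoke(self, debug_interface, django_app):
        """Breakpoint -> inspect -> step. Nothing framework-internal."""
        await debug_interface.start_session(
            program=django_app, breakpoints=[{"line": 10}]
        )
        state = await debug_interface.get_variables()  # hypothetical method
        assert state  # we stopped and can inspect
        await debug_interface.step_over()              # hypothetical method
        await debug_interface.stop_session()
```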
### 3. MCP Responses: Structure + Content + Efficiency

Don't just validate structure - validate content accuracy and payload efficiency.

Bad test:

```python
assert "locals" in response["data"]  # Structure only
```

Good test:

```python
# Structure
assert "locals" in response["data"]

# Content accuracy
assert response["data"]["locals"]["x"]["value"] == 10
assert response["data"]["locals"]["x"]["line"] == 5

# Efficiency (no junk)
assert len(response["data"]) <= 3        # No bloated payloads
assert len(response["summary"]) <= 200   # Concise summaries
```
### 4. VS Launch: Critical for Agent Workflows

Why critical:

- Primary entry point for agents using framework debugging
- Enables use of existing workspace launch configs
- Complex variable substitution must work (`${workspaceFolder}`, etc.)
Test thoroughly (variable substitution is sketched after this list):
- Core launch.json parsing (language-independent)
- Per-language config translation (Python/JavaScript/Java)
- Variable substitution and resolution
- Framework-specific launch configs
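To make the variable-substitution requirement concrete, here is a minimal, self-contained sketch of the idea; AIDB's real resolver lives in the launch-config code and the `substitute` helper here is invented:

```python
import re

def substitute(value: str, variables: dict[str, str]) -> str:
    """Replace ${name} tokens with workspace-provided values."""
    return re.sub(r"\$\{(\w+)\}", lambda m: variables[m.group(1)], value)

# A launch.json entry like this must resolve before the session starts:
config = {"program": "${workspaceFolder}/app/main.py"}
resolved = substitute(config["program"], {"workspaceFolder": "/repo"})
assert resolved == "/repo/app/main.py"
```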
Critical Breakpoint Timing:
Set breakpoints when STARTING sessions (not after) to avoid race conditions with fast-executing programs:
```python
# ✅ CORRECT
await debug_interface.start_session(program=prog, breakpoints=[{"line": 10}])

# ❌ WRONG: Race condition
await debug_interface.start_session(program=prog)
await debug_interface.set_breakpoint(file=prog, line=10)  # May be too late!
```
Exception: Long-running processes (servers) where you attach. Reference: `src/tests/aidb_shared/e2e/test_complex_workflows.py`
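A hedged sketch of that attach-style exception; `attach` and the `running_server` fixture are hypothetical names for illustration, not confirmed DebugInterface API:

```python
import pytest

@pytest.mark.asyncio
async def test_attach_then_break(debug_interface, running_server):
    # The server process is long-running, so there is no race against
    # program completion; setting the breakpoint after attach is safe.
    await debug_interface.attach(host="localhost", port=running_server.port)
    await debug_interface.set_breakpoint(file=running_server.file, line=42)
```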
## DebugInterface Abstraction
The cornerstone of our test strategy is the DebugInterface abstraction - a unified API that works with both MCP tools and the direct API.
For implementation details, see:
- DebugInterface - Skill resource file
- `src/tests/_helpers/debug_interface/` - Debug interface source and docstrings
Why? One test validates both entry points.
Hypothetical Example (illustrates pattern, not a real test file):
```python
import pytest

from tests._helpers.parametrization import parametrize_interfaces

class TestBreakpoints(BaseE2ETest):
    @parametrize_interfaces  # Runs twice: MCP and API
    @pytest.mark.asyncio
    async def test_set_breakpoint(self, debug_interface, simple_program):
        """Test works with BOTH MCP and API."""
        await debug_interface.start_session(program=simple_program)
        bp = await debug_interface.set_breakpoint(
            file=simple_program, line=5
        )
        self.verify_bp.verify_breakpoint_verified(bp)
        await debug_interface.stop_session()
```
Key Points:
- `@parametrize_interfaces` runs the test with both MCP and API
- Same test logic validates both entry points
- No duplication, no drift
- See DebugInterface for details
## The Shared Suite: Testing Debug Fundamentals
The shared suite is AIDB's language-agnostic test foundation that validates core debugging capabilities across all supported languages using normalized, programmatically generated test programs.
Key Innovation: Semantic markers that map identical logic to language-specific line numbers.
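The following self-contained sketch illustrates the idea; the real marker tables are generated alongside the test programs, and the names and line numbers below are invented:

```python
# One semantic marker, three language-specific line numbers:
MARKER_LINES = {
    "loop_body":        {"python": 5, "javascript": 7, "java": 12},
    "after_assignment": {"python": 8, "javascript": 10, "java": 15},
}

def resolve_marker(marker: str, language: str) -> int:
    """Map a semantic marker to its line number in the given language."""
    return MARKER_LINES[marker][language]

# The same logical breakpoint lands on the right line everywhere:
assert resolve_marker("loop_body", "python") == 5
assert resolve_marker("loop_body", "java") == 12
```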
Location: `src/tests/aidb_shared/` (`integration/` + `e2e/`)
What it tests:
- Debug primitives (breakpoints, stepping, variables)
- Control flow across all 3 languages (Python, JavaScript, Java)
- Zero duplication: One test → 6 execution paths (2 interfaces × 3 languages)
For complete details, see DebugInterface.
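A hedged sketch of how the zero-duplication claim above plays out, assuming `@parametrize_languages` lives beside `@parametrize_interfaces` (that import and the `step_over` method are assumptions):

```python
import pytest

from tests._helpers.parametrization import (  # second name's location assumed
    parametrize_interfaces,
    parametrize_languages,
)

class TestStepping(BaseE2ETest):
    @parametrize_interfaces   # MCP + API ...
    @parametrize_languages    # ... x Python/JavaScript/Java = 6 runs
    @pytest.mark.asyncio
    async def test_step_over(self, debug_interface, test_program):
        await debug_interface.start_session(
            program=test_program, breakpoints=[{"line": 5}]
        )
        await debug_interface.step_over()  # hypothetical method name
        await debug_interface.stop_session()
```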
### When to Use the Shared Suite
Use shared suite when:
- Testing core debug operations (breakpoint, step, inspect)
- Validating adapter behavior across languages
- Ensuring language parity (all adapters work identically)
Use framework tests when:
- Testing framework-specific debugging (Django ORM, Express middleware)
- Validating launch.json configurations
- Testing real-world application patterns
## Test Organization

```text
src/tests/
├── aidb_shared/            # ⭐ Shared suite: language-agnostic debug fundamentals
│   ├── integration/        # Core debug operations (breakpoints, stepping, variables)
│   └── e2e/                # Complex workflows, parallel sessions
├── aidb/                   # Core API tests - organized by component
│   ├── adapters/           # Adapter-specific tests
│   ├── audit/              # Audit logging tests
│   ├── common/             # Common utilities tests
│   ├── dap/                # DAP client tests
│   ├── models/             # Model tests
│   ├── resources/          # Resource management tests
│   ├── service/            # Service layer tests
│   └── session/            # Session management tests
├── aidb_mcp/               # MCP server tests - organized by component
├── frameworks/             # Framework integration tests
│   ├── python/             # Flask, FastAPI, pytest
│   ├── javascript/         # Express, Jest
│   └── java/               # Spring Boot, JUnit
├── _helpers/               # Test helpers and utilities
├── _fixtures/              # Shared fixtures
│   └── unit/               # ⭐ Unit test infrastructure (see below)
└── _assets/                # Test programs and data
    ├── framework_apps/     # Framework test applications
    └── test_programs/      # Generated programs for shared suite
```
## Unit Test Infrastructure

The centralized unit test infrastructure at `src/tests/_fixtures/unit/` provides reusable mocks, builders, and fixtures:

```text
_fixtures/unit/
├── builders/        # DAPRequestBuilder, DAPResponseBuilder, DAPEventBuilder
├── dap/             # Transport, events, receiver mocks
├── session/         # Registry, lifecycle, state, child_manager mocks
├── adapter/         # Port, process, launch_orchestrator mocks
├── mcp/             # DebugService, MCPSessionContext mocks
├── conftest.py      # Master fixture re-exports
├── context.py       # mock_ctx, null_ctx, tmp_storage
└── assertions.py    # UnitAssertions class
```
Usage Pattern:
```python
# In domain conftest.py (e.g., src/tests/aidb/dap/unit/conftest.py)
from tests._fixtures.unit.conftest import *  # noqa: F401, F403
from tests._fixtures.unit.builders import DAPEventBuilder, DAPResponseBuilder

# In test file
def test_something(mock_ctx, mock_transport):
    event = DAPEventBuilder.stopped_event(reason="breakpoint")
    # ...
```
Key Components:
- Builders - Fluent API for DAP protocol objects (requests, responses, events); see the sketch after this list
- `mock_ctx` - Standard logging context mock with debug/info/warning/error methods
- `UnitAssertions` - DAP-specific assertion helpers
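A hedged sketch of the builder-based unit pattern; `stopped_event` and the `mock_ctx`/`mock_transport` fixtures appear in this guide, while `handle_event` is a stand-in defined inline so the sketch is self-contained:

```python
from tests._fixtures.unit.builders import DAPEventBuilder

def handle_event(event, ctx, transport):
    """Stand-in for whatever dispatcher a real unit test would target."""
    ctx.debug("received %s", event)

def test_logs_stopped_event(mock_ctx, mock_transport):
    # Build a protocol object without hand-writing the DAP dict:
    event = DAPEventBuilder.stopped_event(reason="breakpoint")
    handle_event(event, ctx=mock_ctx, transport=mock_transport)
    # mock_ctx exposes debug/info/warning/error, so call patterns are checkable:
    mock_ctx.debug.assert_called()
```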
## Test Execution Modes
Test suites run in different environments based on their requirements:
Local-Only Suites (no Docker):

- `cli` - CLI command tests
- `mcp` - MCP server unit/integration tests
- `core` - Core AIDB API tests
- `common` - Common utilities tests
- `logging` - Logging framework tests
- `ci_cd` - CI/CD workflow tests

Docker Suites (require containers):

- `shared` - Multi-language shared tests (parallel language containers)
- `frameworks` - Framework integration tests (parallel language containers)
- `launch` - Launch config tests (parallel language containers)
Why the split?
- Local suites test Python-only logic (handlers, validation, utils)
- Docker suites test multi-language scenarios
- Multi-language MCP functionality is tested in `shared`/`frameworks`/`launch`
Running tests:
```bash
./dev-cli test run -s mcp     # Local execution
./dev-cli test run -s shared  # Docker execution
```
## Code Reuse: Don't Reinvent
Always use existing infrastructure:
- Test Base Classes - `BaseE2ETest`, `BaseIntegrationTest`, `FrameworkDebugTestBase`
- Parametrization Decorators - `@parametrize_interfaces`, `@parametrize_languages`
- Helper Assertions - `self.verify_bp`, `self.verify_exec`, `MCPAssertions`
- Constants - `StopReason`, `TestTimeouts`, `MCPTool` (see the sketch below)
For complete details, see E2E Patterns.
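As a hedged sketch of leaning on shared constants instead of magic values (the import path, the enum attribute names, and `wait_stopped` are all assumptions):

```python
import asyncio

from tests._helpers.constants import StopReason, TestTimeouts  # path assumed

async def wait_for_breakpoint(debug_interface):
    state = await asyncio.wait_for(
        debug_interface.wait_stopped(),  # hypothetical method
        timeout=TestTimeouts.DEFAULT,    # shared timeout, not a bare 5.0
    )
    # Compare against the enum, not the raw string "breakpoint":
    assert state.reason == StopReason.BREAKPOINT
```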
## Working Examples
Study these real tests before writing new ones:
### Framework Tests (E2E)

- Python: `test_flask_debugging.py`, `test_fastapi_debugging.py`, `test_pytest_debugging.py`
- JavaScript: `test_express_debugging.py`, `test_jest_debugging.py`
- Java: `test_springboot_debugging.py`, `test_junit_debugging.py`
### Core API Tests

- Launch Variable Resolution: `test_launch_variable_resolution.py`
- Session Target Handling: `test_session_target_handling.py`
For complete file paths and patterns, see E2E Patterns.
## Common Patterns
For hypothetical examples illustrating common patterns, see E2E Patterns.
Key patterns covered:
- Basic E2E Test
- Breakpoint Test with Markers
- Dual-Launch Equivalence Test (sketched after this list)
- MCP Response Validation
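A hedged sketch of the dual-launch equivalence idea; the `launch_via_*` helpers are invented stand-ins for whatever `FrameworkDebugTestBase` actually provides:

```python
import pytest

class TestFlaskEquivalence(FrameworkDebugTestBase):
    framework_name = "flask"

    @pytest.mark.asyncio
    async def test_dual_launch_equivalence(self):
        # Launch once through the direct API, once through launch.json:
        api_state = await self.launch_via_api()               # hypothetical
        config_state = await self.launch_via_vscode_config()  # hypothetical
        # Both entry points must land in the same debug state:
        assert api_state == config_state
```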
## When Creating New Tests

### Step 1: Choose Test Type
- E2E? Full workflow, real program, complete integration
- Integration? Component interactions, lifecycle management
- Unit? Specific function, edge case, error handling
### Step 2: Find Similar Test
Look at E2E Patterns:
- Django/Express for framework tests
- Existing tests in the same directory
- Similar test scenarios in other languages
### Step 3: Copy Pattern, Adapt
Don't start from scratch:
- Copy a working test
- Adapt to your scenario
- Use same helpers and assertions
- Follow same structure
### Step 4: Use Existing Infrastructure
Don't create:
- New assertion helpers (use existing)
- New fixtures (check `conftest.py` files first)
- New constants (use `constants.py`)
- New base classes (inherit from existing)
Do create:
- Tests using existing patterns
- Scenario-specific test data
- Framework-specific fixtures (if needed)
## Performance Testing
Current State: No performance baselines exist yet
Phase 1: Establish baselines
- Analyze existing metrics/instrumentation
- Determine "healthy" latencies
- Document target times
Phase 2: Regression testing
- Monitor key operations (breakpoint set, variable inspect, step)
- Alert on degradation
For now: Focus on functional correctness, not performance.
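When Phase 1 does start, baseline capture can be as simple as this self-contained sketch; nothing like it exists in the repo yet, and `measure` and its usage are assumptions:

```python
import time

async def measure(op, samples: int = 20) -> float:
    """Return the mean latency of an async operation in milliseconds."""
    total = 0.0
    for _ in range(samples):
        start = time.monotonic()
        await op()
        total += time.monotonic() - start
    return total / samples * 1000.0

# Usage idea: baseline_ms = await measure(lambda: debug_interface.step_over())
```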
## Success Criteria

### Test Quality Checklist
- Test uses `@parametrize_interfaces` for MCP/API coverage
- Test inherits from appropriate base class
- Test uses helper assertions, not custom assertions
- Test validates content accuracy, not just structure
- MCP tests check efficiency (no bloated payloads)
- Test has clear docstring explaining what it validates
- Test follows working examples
- Test is in correct directory (e2e/integration/unit)
### Framework Test Checklist

- Inherits from `FrameworkDebugTestBase` (skeleton sketched below)
- Implements `test_launch_via_api()`
- Implements `test_launch_via_vscode_config()`
- Implements `test_dual_launch_equivalence()`
- Sets `framework_name` attribute
- Uses simple smoke tests (no deep framework testing)
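A minimal skeleton satisfying the checklist, assuming the base class and method names from this guide (bodies elided):

```python
class TestFastAPIDebugging(FrameworkDebugTestBase):
    framework_name = "fastapi"  # required attribute

    async def test_launch_via_api(self): ...
    async def test_launch_via_vscode_config(self): ...
    async def test_dual_launch_equivalence(self): ...
```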
## Investigating Test Failures
CRITICAL: When tests fail, check logs BEFORE attempting fixes.
See: Debugging Failures for log locations, investigation workflow, and common patterns.
For CI test failures, use the ci-cd-workflows skill's troubleshooting guide.
## Resources
| Resource | Content |
|---|---|
| E2E Patterns | Test patterns, markers, code reuse, working examples |
| Framework Tests | Dual-launch pattern, Flask/Express examples |
| DebugInterface | Unified API abstraction, shared suite architecture |
| Debugging Failures | Log locations, investigation workflow, common issues |
Test Infrastructure: `src/tests/` (see `_fixtures/`, `_helpers/` for core components)
## Getting Started

- Read CONTEXT.md: `wip/test-implementation-backlog/CONTEXT.md`
- Study working examples: Flask (`test_flask_debugging.py`) and Express (`test_express_debugging.py`)
- Choose a test to implement: Start with E2E (highest ROI)
- Copy a working test: Don't start from scratch
- Adapt to your scenario: Use same patterns, different data
- Validate: Run test, ensure it passes with both MCP and API
## Questions?

Internal Documentation:

- `src/tests/` - Test infrastructure (see `_fixtures/`, `_helpers/`)
- `docs/developer-guide/overview.md` - System architecture
Code References:
- DAP Protocol: See `src/aidb/dap/protocol/` (fully typed: types.py + requests.py + responses.py + events.py)
- Test Infrastructure: See `src/tests/_helpers/` and `src/tests/_fixtures/`
- Working Examples: See Flask/Express framework tests
Remember:
- E2E first, validate content accuracy
- Use shared suite for debug fundamentals, framework tests for integration
- Keep framework tests simple (no framework internals)
- Always use the DebugInterface abstraction (zero duplication)