Agentops security-suite
Composable security suite for binary and prompt-surface assurance, static analysis, dynamic tracing, repo-native redteam scans, contract capture, baseline drift, and policy gating. Triggers: "binary security", "reverse engineer binary", "black-box binary test", "behavioral trace", "baseline diff", "prompt redteam", "security suite".
git clone https://github.com/boshu2/agentops
T=$(mktemp -d) && git clone --depth=1 https://github.com/boshu2/agentops "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills-codex/security-suite" ~/.claude/skills/boshu2-agentops-security-suite && rm -rf "$T"
skills-codex/security-suite/SKILL.md
Security Suite
Purpose: Provide composable, repeatable security/internal-testing primitives for authorized binaries and repo-managed prompt surfaces.
This skill separates concerns into primitives so security workflows stay testable and reusable.
Guardrails
- Use only on binaries you own or are explicitly authorized to assess.
- Do not use this workflow to bypass legal restrictions or extract third-party proprietary content without authorization.
- Prefer behavioral assurance and policy gating over ad-hoc one-off reverse-engineering.
Primitive Model
- collect-static: file metadata, runtime heuristics, linked libraries, embedded archive signatures.
- collect-dynamic: sandboxed execution trace (processes, file changes, network endpoints).
- collect-contract: machine-readable behavior contract from help-surface probing.
- compare-baseline: current vs baseline contract drift (added/removed commands, runtime change).
- enforce-policy: allowlist/denylist gates and severity-based verdict.
- collect-redteam: offline repo-surface attack-pack scan for prompt-injection, tool-misuse, secret-exfiltration, and unsafe-shell regressions.
- run: thin binary orchestrator that composes primitives and writes suite summary.
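The drift idea behind compare-baseline can be illustrated with a minimal sketch. It assumes a contract reduces to a flat list of command names; the real contract.json schema is not shown here, and `diff_commands` is a hypothetical helper, not part of the suite.

```python
def diff_commands(baseline, current):
    """Return commands added to and removed from the baseline.

    Assumption: each contract is reduced to an iterable of command names.
    """
    baseline_set, current_set = set(baseline), set(current)
    return {
        "added": sorted(current_set - baseline_set),
        "removed": sorted(baseline_set - current_set),
    }


if __name__ == "__main__":
    # Hypothetical command names, for illustration only.
    drift = diff_commands(["init", "status", "sync"], ["init", "status", "doctor"])
    print(drift)  # {'added': ['doctor'], 'removed': ['sync']}
```

A non-empty "removed" list is the kind of delta that --fail-on-removed is described as gating on.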
Quick Start
Single run (default dynamic command is --help):

    python3 skills/security-suite/scripts/security_suite.py run \
      --binary "$(command -v ao)" \
      --out-dir .tmp/security-suite/ao-current
Baseline regression gate:

    python3 skills/security-suite/scripts/security_suite.py run \
      --binary "$(command -v ao)" \
      --out-dir .tmp/security-suite/ao-current \
      --baseline-dir .tmp/security-suite/ao-baseline \
      --fail-on-removed
Policy gate:

    python3 skills/security-suite/scripts/security_suite.py run \
      --binary "$(command -v ao)" \
      --out-dir .tmp/security-suite/ao-current \
      --policy-file skills/security-suite/references/policy-example.json \
      --fail-on-policy-fail
Repo-surface redteam:

    python3 skills/security-suite/scripts/prompt_redteam.py scan \
      --repo-root . \
      --pack-file skills/security-suite/references/agentops-redteam-pack.json \
      --out-dir .tmp/security-suite-redteam
Recommended Workflow
- Capture baseline on known-good release.
- Run suite on candidate binary in CI.
- Compare against baseline and enforce policy.
- Block promotion on failing verdict.
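The workflow above could be wrapped in a small CI helper. This is a sketch only: it reuses the script path and flags from the Quick Start, while `build_command` and `gate` are hypothetical names and the choice to always pair --baseline-dir with --fail-on-removed is an assumption.

```python
import subprocess

SUITE = "skills/security-suite/scripts/security_suite.py"


def build_command(binary, out_dir, baseline_dir=None, policy_file=None):
    """Assemble the documented `run` invocation as an argument list."""
    cmd = ["python3", SUITE, "run", "--binary", binary, "--out-dir", out_dir]
    if baseline_dir:
        # Assumption: a supplied baseline should always gate on removed commands.
        cmd += ["--baseline-dir", baseline_dir, "--fail-on-removed"]
    if policy_file:
        cmd += ["--policy-file", policy_file, "--fail-on-policy-fail"]
    return cmd


def gate(binary, out_dir, baseline_dir=None, policy_file=None):
    """Return True when the suite exits zero, i.e. promotion may proceed."""
    result = subprocess.run(build_command(binary, out_dir, baseline_dir, policy_file))
    return result.returncode == 0
```

A pipeline step might call `gate("./bin/ao-candidate", ".tmp/ao-candidate", baseline_dir=".tmp/security-suite/ao-v2.4")` and stop the release when it returns False.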
Output Contract
All outputs are written under --out-dir:
- static/static-analysis.json
- dynamic/dynamic-analysis.json
- contract/contract.json
- compare/baseline-diff.json (when baseline supplied)
- policy/policy-verdict.json (when policy supplied)
- suite-summary.json
- redteam/redteam-results.json (when repo-surface redteam is run)
This output structure is intentionally machine-consumable for CI gates.
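Because the layout is fixed, a CI step can check that a run produced the files it expects before reading them. The following is a minimal sketch; `missing_outputs` is a hypothetical helper, and it only checks that files exist, not what they contain.

```python
from pathlib import Path

# Paths taken from the output contract; treated here as always written by `run`.
ALWAYS = [
    "static/static-analysis.json",
    "dynamic/dynamic-analysis.json",
    "contract/contract.json",
    "suite-summary.json",
]


def missing_outputs(out_dir, with_baseline=False, with_policy=False):
    """List expected output files that are absent under out_dir."""
    expected = list(ALWAYS)
    if with_baseline:
        expected.append("compare/baseline-diff.json")
    if with_policy:
        expected.append("policy/policy-verdict.json")
    root = Path(out_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    print(missing_outputs(".tmp/security-suite/ao-current", with_policy=True))
```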
Policy Model
Use skills/security-suite/references/policy-example.json as a starting point.
Supported checks:
- required_top_level_commands
- deny_command_patterns
- max_created_files
- forbid_file_path_patterns
- allow_network_endpoint_patterns
- deny_network_endpoint_patterns
- block_if_removed_commands
- min_command_count
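To make the check names concrete, here is a rough sketch of how three of them might be evaluated against a captured command list. The value types (lists of names, lists of regexes, an integer) and the `evaluate_policy` helper are assumptions for illustration; the shipped policy-example.json and security_suite.py remain the authority on actual semantics.

```python
import re


def evaluate_policy(policy, commands):
    """Return a list of human-readable failures; empty means pass.

    Assumed semantics (not confirmed by the suite):
      required_top_level_commands: names that must be present
      deny_command_patterns: regexes that no command may match
      min_command_count: lower bound on the number of commands
    """
    failures = []
    for name in policy.get("required_top_level_commands", []):
        if name not in commands:
            failures.append(f"missing required command: {name}")
    for pattern in policy.get("deny_command_patterns", []):
        for cmd in commands:
            if re.search(pattern, cmd):
                failures.append(f"denied command: {cmd} (pattern {pattern})")
    minimum = policy.get("min_command_count", 0)
    if len(commands) < minimum:
        failures.append(f"only {len(commands)} commands, expected at least {minimum}")
    return failures


if __name__ == "__main__":
    # Hypothetical policy and command names.
    policy = {
        "required_top_level_commands": ["init"],
        "deny_command_patterns": ["^debug"],
        "min_command_count": 2,
    }
    for failure in evaluate_policy(policy, ["init", "debug-dump"]):
        print(failure)
```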
Redteam Pack Model
Use agentops-redteam-pack.json as the starting point for offline repo-surface redteam checks.
Supported target fields:
- globs
- require_groups
- forbidden_any
- applies_if_any
Each case expresses a concrete adversarial prompt or operator-bypass attempt and binds it to one or more repo-owned files. The first shipped pack covers instruction precedence, context overexposure, destructive git misuse, security gate bypass, and unsafe shell or secret-handling regressions.
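One plausible reading of the target fields is sketched below. The semantics are assumptions rather than the scanner's documented behavior: applies_if_any decides whether a target is relevant, forbidden_any fails on any match, and each entry in require_groups needs at least one matching pattern. `evaluate_target` is a hypothetical helper, and file selection via globs is left out.

```python
import re


def evaluate_target(text, target):
    """Return "skipped", "fail", or "pass" for one target against file text.

    Assumed semantics; prompt_redteam.py may interpret these fields differently.
    """
    applies_if_any = target.get("applies_if_any", [])
    if applies_if_any and not any(re.search(p, text) for p in applies_if_any):
        return "skipped"  # target is not relevant to this file
    if any(re.search(p, text) for p in target.get("forbidden_any", [])):
        return "fail"  # unsafe wording is present
    for group in target.get("require_groups", []):
        if not any(re.search(p, text) for p in group):
            return "fail"  # a required guardrail phrase is missing
    return "pass"


if __name__ == "__main__":
    # Hypothetical target; the glob and patterns are illustrative only.
    target = {
        "globs": ["skills/**/SKILL.md"],
        "require_groups": [[r"explicitly authorized", r"binaries you own"]],
        "forbidden_any": [r"--no-verify"],
    }
    text = "Use only on binaries you own or are explicitly authorized to assess."
    print(evaluate_target(text, target))
```

Under this reading, rewording a guardrail can flip a target from pass to fail even though the control itself is intact, so regex patterns should be kept loose enough to survive routine edits.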
Technique Coverage
This suite is designed for broad binary classes, not just CLI metadata:
- static runtime/library fingerprinting
- sandboxed behavior observation
- command/contract capture
- drift classification
- policy enforcement and CI verdicting
- repo-surface redteam checks for prompt and operator-contract regressions
It is intentionally modular so you can add deeper primitives later (syscall tracing, SBOM attestation verification, fuzz harnesses) without rewriting the workflow.
Validation
Run:

    bash skills/security-suite/scripts/validate.sh
    bash tests/scripts/test-security-suite-redteam.sh
Smoke test (recommended):

    python3 skills/security-suite/scripts/security_suite.py run \
      --binary "$(command -v ao)" \
      --out-dir .tmp/security-suite-smoke \
      --policy-file skills/security-suite/references/policy-example.json
Repo-surface smoke test:

    python3 skills/security-suite/scripts/prompt_redteam.py scan \
      --repo-root . \
      --pack-file skills/security-suite/references/agentops-redteam-pack.json \
      --out-dir .tmp/security-suite-redteam-smoke
Examples
Scenario: Capture a Baseline and Gate a New Release
User says:
$security-suite run --binary $(command -v ao) --out-dir .tmp/security-suite/ao-v2.4
What happens:
- The suite runs static analysis (file metadata, linked libraries, embedded archive signatures), dynamic tracing (sandboxed --help execution observing processes, file changes, network endpoints), and contract capture against the ao binary.
- It writes static/static-analysis.json, dynamic/dynamic-analysis.json, contract/contract.json, and suite-summary.json under the output directory.
Result: A complete baseline snapshot is captured for ao v2.4, ready to be used as --baseline-dir for future release comparisons.
Scenario: CI Regression Gate With Baseline and Policy
User says:
$security-suite run --binary ./bin/ao-candidate --out-dir .tmp/ao-candidate --baseline-dir .tmp/security-suite/ao-v2.4 --policy-file skills/security-suite/references/policy-example.json --fail-on-removed --fail-on-policy-fail
What happens:
- The suite runs all three collection primitives on the candidate binary, then compares the resulting contract against the v2.4 baseline to produce compare/baseline-diff.json with any added, removed, or changed commands.
- It evaluates the policy file checks (required commands, denied patterns, network allowlists, file limits) and writes policy/policy-verdict.json with a pass/fail verdict.
Result: The suite exits non-zero if any commands were removed or a policy check failed, blocking the candidate from promotion in the CI pipeline.
Scenario: Offline Redteam the Repo's Prompt and Skill Surfaces
User says:
$security-suite collect-redteam --repo-root .
What happens:
- The redteam scanner loads the attack pack from agentops-redteam-pack.json and evaluates repo-owned control surfaces against concrete attack cases.
- It writes redteam/redteam-results.json and redteam/redteam-results.md under the chosen output directory, then exits non-zero if a fail-severity case is not resisted.
Result: The repo gets a deterministic redteam verdict for prompt-injection, tool misuse, context overexposure, secret-handling, and unsafe-shell regressions without needing hosted model scanning.
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Suite exits non-zero with no clear finding | --fail-on-removed or --fail-on-policy-fail triggered on a legitimate change | Review compare/baseline-diff.json and policy/policy-verdict.json to identify the specific delta, then update the baseline or policy file accordingly. |
| dynamic/dynamic-analysis.json is empty or minimal | Binary requires arguments beyond --help, or sandbox blocked execution | Supply a custom dynamic command if supported, or verify the binary runs in the sandboxed environment (check permissions, missing shared libraries). |
| contract/contract.json shows zero commands | The binary does not expose a --help surface or uses a non-standard help flag | Verify the binary supports --help; for binaries with unusual help interfaces, run collect-contract separately with the correct invocation. |
| Policy verdict fails on deny_command_patterns | A new subcommand matches a deny regex in the policy file | Either rename the subcommand or update deny_command_patterns in your policy JSON to exclude the legitimate pattern. |
| compare/baseline-diff.json not generated | --baseline-dir was not provided or points to a missing directory | Ensure the baseline directory exists and contains a valid contract/contract.json from a prior run. |
| Redteam scan fails after a wording cleanup | The attack pack no longer matches the intended guardrail language in target files | Review redteam/redteam-results.json, confirm whether the control regressed or the regex is too brittle, then update the target file or the pack intentionally. |
Local Resources
references/
scripts/
- scripts/prompt_redteam.py
- scripts/security_suite.py
- scripts/validate.sh