Agentops security-suite

Composable security suite for binary and prompt-surface assurance, static analysis, dynamic tracing, repo-native redteam scans, contract capture, baseline drift, and policy gating. Triggers: "binary security", "reverse engineer binary", "black-box binary test", "behavioral trace", "baseline diff", "prompt redteam", "security suite".

install

source · Clone the upstream repo

git clone https://github.com/boshu2/agentops

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/boshu2/agentops "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills-codex/security-suite" ~/.claude/skills/boshu2-agentops-security-suite && rm -rf "$T"

manifest: skills-codex/security-suite/SKILL.md

Security Suite

Purpose: Provide composable, repeatable security/internal-testing primitives for authorized binaries and repo-managed prompt surfaces.

This skill separates concerns into primitives so security workflows stay testable and reusable.

Guardrails

Use only on binaries you own or are explicitly authorized to assess.
Do not use this workflow to bypass legal restrictions or extract third-party proprietary content without authorization.
Prefer behavioral assurance and policy gating over ad-hoc one-off reverse-engineering.

Primitive Model

`collect-static`: file metadata, runtime heuristics, linked libraries, embedded archive signatures.
`collect-dynamic`: sandboxed execution trace (processes, file changes, network endpoints).
`collect-contract`: machine-readable behavior contract from help-surface probing.
`compare-baseline`: current vs baseline contract drift (added/removed commands, runtime change).
`enforce-policy`: allowlist/denylist gates and severity-based verdict.
`collect-redteam`: offline repo-surface attack-pack scan for prompt-injection, tool-misuse, secret-exfiltration, and unsafe-shell regressions.
`run`: thin binary orchestrator that composes primitives and writes the suite summary.
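
To make the `compare-baseline` idea concrete, here is a minimal sketch of command drift between two contracts. The real `contract/contract.json` schema is not shown in this document, so the sketch assumes each contract can be reduced to a flat list of command names; `diff_commands` and the sample command names are hypothetical.

```python
def diff_commands(baseline, current):
    """Return commands added to and removed from the current contract.

    Assumption: a contract reduces to a flat list of command names.
    The actual contract.json layout may differ.
    """
    base, cur = set(baseline), set(current)
    return {
        "added": sorted(cur - base),    # present now, absent in baseline
        "removed": sorted(base - cur),  # present in baseline, absent now
    }


# Hypothetical command lists, for illustration only.
drift = diff_commands(["init", "run", "status"], ["doctor", "init", "run"])
print(drift)
```

A non-empty `removed` list is the kind of delta that `--fail-on-removed` is described as gating on.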

Quick Start

Single run (default dynamic command is `--help`):

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite/ao-current

Baseline regression gate:

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite/ao-current \
  --baseline-dir .tmp/security-suite/ao-baseline \
  --fail-on-removed

Policy gate:

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite/ao-current \
  --policy-file skills/security-suite/references/policy-example.json \
  --fail-on-policy-fail

Repo-surface redteam:

python3 skills/security-suite/scripts/prompt_redteam.py scan \
  --repo-root . \
  --pack-file skills/security-suite/references/agentops-redteam-pack.json \
  --out-dir .tmp/security-suite-redteam

Recommended Workflow

Capture baseline on known-good release.
Run suite on candidate binary in CI.
Compare against baseline and enforce policy.
Block promotion on failing verdict.
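
The workflow above could be wired into CI roughly as follows. This is a sketch, not part of the suite: it only assembles the `run` invocation from the flags shown in Quick Start and treats a non-zero exit code as a blocking verdict. `build_run_argv` and `promotion_allowed` are hypothetical helper names.

```python
import subprocess

# Script path as used in the Quick Start commands.
SUITE = "skills/security-suite/scripts/security_suite.py"


def build_run_argv(binary, out_dir, baseline_dir=None, policy_file=None):
    """Assemble the `run` command line from flags shown in Quick Start."""
    argv = ["python3", SUITE, "run", "--binary", binary, "--out-dir", out_dir]
    if baseline_dir:
        # Compare against a known-good baseline and fail on removed commands.
        argv += ["--baseline-dir", baseline_dir, "--fail-on-removed"]
    if policy_file:
        # Evaluate the policy file and fail on a failing verdict.
        argv += ["--policy-file", policy_file, "--fail-on-policy-fail"]
    return argv


def promotion_allowed(argv):
    """Run the suite; allow promotion only if it exits with status 0."""
    return subprocess.run(argv).returncode == 0
```

In a pipeline, a step would call `promotion_allowed(build_run_argv(...))` for the candidate binary and stop the release when it returns `False`.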

Output Contract

All outputs are written under `--out-dir`:

`static/static-analysis.json`
`dynamic/dynamic-analysis.json`
`contract/contract.json`
`compare/baseline-diff.json` (when baseline supplied)
`policy/policy-verdict.json` (when policy supplied)
`suite-summary.json`
`redteam/redteam-results.json` (when repo-surface redteam is run)

This output structure is intentionally machine-consumable for CI gates.
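
Because the layout is fixed, a CI step can verify that a run produced the files it expects before reading them. The sketch below uses only the relative paths listed above; `missing_artifacts` is a hypothetical helper, and the redteam output is left out because the redteam scanner is invoked with its own `--out-dir`.

```python
from pathlib import Path

# Files listed above without a condition.
ALWAYS = [
    "static/static-analysis.json",
    "dynamic/dynamic-analysis.json",
    "contract/contract.json",
    "suite-summary.json",
]


def missing_artifacts(out_dir, baseline=False, policy=False):
    """Return expected output files that are absent under out_dir."""
    expected = list(ALWAYS)
    if baseline:
        expected.append("compare/baseline-diff.json")
    if policy:
        expected.append("policy/policy-verdict.json")
    root = Path(out_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Directory name taken from the Quick Start example.
    print(missing_artifacts(".tmp/security-suite/ao-current", baseline=True))
```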

Policy Model

Use `skills/security-suite/references/policy-example.json` as a starting point.

Supported checks:

`required_top_level_commands`
`deny_command_patterns`
`max_created_files`
`forbid_file_path_patterns`
`allow_network_endpoint_patterns`
`deny_network_endpoint_patterns`
`block_if_removed_commands`
`min_command_count`
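
As an illustration of how three of these checks might behave, the sketch below evaluates a policy against a list of command names. The value types are assumptions, not the suite's documented schema: `required_top_level_commands` is treated as a list of names, `deny_command_patterns` as a list of regular expressions, and `min_command_count` as an integer. `evaluate_policy` is a hypothetical function, not the suite's implementation.

```python
import re


def evaluate_policy(policy, commands):
    """Return a list of failure messages; an empty list means pass.

    Assumed (not confirmed) value types: list of names, list of
    regexes, and an integer, respectively.
    """
    failures = []
    for name in policy.get("required_top_level_commands", []):
        if name not in commands:
            failures.append(f"missing required command: {name}")
    for pattern in policy.get("deny_command_patterns", []):
        for cmd in commands:
            if re.search(pattern, cmd):
                failures.append(f"denied command: {cmd} (matches {pattern})")
    minimum = policy.get("min_command_count")
    if minimum is not None and len(commands) < minimum:
        failures.append(f"command count {len(commands)} below minimum {minimum}")
    return failures


# Hypothetical policy values, for illustration only.
policy = {
    "required_top_level_commands": ["init"],
    "deny_command_patterns": ["^debug"],
    "min_command_count": 2,
}
print(evaluate_policy(policy, ["init", "run"]))
print(evaluate_policy(policy, ["debug-dump"]))
```

Returning every failure rather than stopping at the first keeps the verdict useful when several checks regress in the same release.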

Redteam Pack Model

Use agentops-redteam-pack.json as the starting point for offline repo-surface redteam checks.

Supported target fields:

`globs`
`require_groups`
`forbidden_any`
`applies_if_any`

Each case expresses a concrete adversarial prompt or operator-bypass attempt and binds it to one or more repo-owned files. The first shipped pack covers instruction precedence, context overexposure, destructive git misuse, security gate bypass, and unsafe shell or secret-handling regressions.
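
One plausible reading of these fields can be sketched as follows. The semantics here are assumptions rather than the pack's documented behavior: each entry in `require_groups` is taken to be a list of regular expressions of which at least one must match the target file's text, and `forbidden_any` a list of regular expressions of which none may match. `target_resists` and the sample patterns are hypothetical.

```python
import re


def target_resists(text, require_groups, forbidden_any):
    """Return True if the text satisfies the assumed field semantics.

    Assumption: every group needs at least one matching pattern,
    and no forbidden pattern may match.
    """
    for group in require_groups:
        if not any(re.search(p, text, re.IGNORECASE) for p in group):
            return False  # a required guardrail phrase is missing
    return not any(re.search(p, text, re.IGNORECASE) for p in forbidden_any)


# Hypothetical guardrail text and patterns, for illustration only.
groups = [[r"never run", r"do not run"], [r"explicit (user )?approval"]]
forbidden = [r"skip (the )?security gate"]

print(target_resists(
    "Never run git reset --hard without explicit user approval.",
    groups, forbidden))
print(target_resists("You may skip the security gate.", groups, forbidden))
```

Under this reading, rewording a guardrail so that no pattern in a group matches would flip the result, which is consistent with the troubleshooting note about scans failing after a wording cleanup.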

Technique Coverage

This suite is designed for broad binary classes, not just CLI metadata:

static runtime/library fingerprinting
sandboxed behavior observation
command/contract capture
drift classification
policy enforcement and CI verdicting
repo-surface redteam checks for prompt and operator-contract regressions

It is intentionally modular so you can add deeper primitives later (syscall tracing, SBOM attestation verification, fuzz harnesses) without rewriting the workflow.

Validation

Run:

bash skills/security-suite/scripts/validate.sh
bash tests/scripts/test-security-suite-redteam.sh

Smoke test (recommended):

python3 skills/security-suite/scripts/security_suite.py run \
  --binary "$(command -v ao)" \
  --out-dir .tmp/security-suite-smoke \
  --policy-file skills/security-suite/references/policy-example.json

Repo-surface smoke test:

python3 skills/security-suite/scripts/prompt_redteam.py scan \
  --repo-root . \
  --pack-file skills/security-suite/references/agentops-redteam-pack.json \
  --out-dir .tmp/security-suite-redteam-smoke

Examples

Scenario: Capture a Baseline and Gate a New Release

User says:

$security-suite run --binary $(command -v ao) --out-dir .tmp/security-suite/ao-v2.4

What happens:

The suite runs static analysis (file metadata, linked libraries, embedded archive signatures), dynamic tracing (sandboxed `--help` execution observing processes, file changes, network endpoints), and contract capture against the `ao` binary.
It writes `static/static-analysis.json`, `dynamic/dynamic-analysis.json`, `contract/contract.json`, and `suite-summary.json` under the output directory.

Result: A complete baseline snapshot is captured for `ao` v2.4, ready to be used as `--baseline-dir` for future release comparisons.

Scenario: CI Regression Gate With Baseline and Policy

User says:

$security-suite run --binary ./bin/ao-candidate --out-dir .tmp/ao-candidate --baseline-dir .tmp/security-suite/ao-v2.4 --policy-file skills/security-suite/references/policy-example.json --fail-on-removed --fail-on-policy-fail

What happens:

The suite runs all three collection primitives on the candidate binary, then compares the resulting contract against the v2.4 baseline to produce `compare/baseline-diff.json` with any added, removed, or changed commands.
It evaluates the policy file checks (required commands, denied patterns, network allowlists, file limits) and writes `policy/policy-verdict.json` with a pass/fail verdict.

Result: The suite exits non-zero if any commands were removed or a policy check failed, blocking the candidate from promotion in the CI pipeline.

Scenario: Offline Redteam the Repo's Prompt and Skill Surfaces

User says:

$security-suite collect-redteam --repo-root .

What happens:

The redteam scanner loads the attack pack from `agentops-redteam-pack.json` and evaluates repo-owned control surfaces against concrete attack cases.
It writes `redteam/redteam-results.json` and `redteam/redteam-results.md` under the chosen output directory, then exits non-zero if a fail-severity case is not resisted.

Result: The repo gets a deterministic redteam verdict for prompt-injection, tool misuse, context overexposure, secret-handling, and unsafe-shell regressions without needing hosted model scanning.

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| Suite exits non-zero with no clear finding | `--fail-on-removed` or `--fail-on-policy-fail` triggered on a legitimate change | Review `compare/baseline-diff.json` and `policy/policy-verdict.json` to identify the specific delta, then update the baseline or policy file accordingly. |
| `dynamic/dynamic-analysis.json` is empty or minimal | Binary requires arguments beyond `--help`, or sandbox blocked execution | Supply a custom dynamic command if supported, or verify the binary runs in the sandboxed environment (check permissions, missing shared libraries). |
| `contract/contract.json` shows zero commands | The binary does not expose a `--help` surface or uses a non-standard help flag | Verify the binary supports `--help`; for binaries with unusual help interfaces, run `collect-contract` separately with the correct invocation. |
| Policy verdict fails on `deny_command_patterns` | A new subcommand matches a deny regex in the policy file | Either rename the subcommand or update `deny_command_patterns` in your policy JSON to exclude the legitimate pattern. |
| `baseline-diff.json` not generated | `--baseline-dir` was not provided or points to a missing directory | Ensure the baseline directory exists and contains a valid `contract/contract.json` from a prior run. |
| Redteam scan fails after a wording cleanup | The attack pack no longer matches the intended guardrail language in target files | Review `redteam/redteam-results.json`, confirm whether the control regressed or the regex is too brittle, then update the target file or the pack intentionally. |

Local Resources

references/

scripts/

`scripts/prompt_redteam.py`
`scripts/security_suite.py`
`scripts/validate.sh`