Claude-skill-registry codex-readiness-integration-test
Run the Codex Readiness integration test. Use when you need an end-to-end agentic loop with build/test scoring.
Install
Source (clone the upstream repo):
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code (install into ~/.claude/skills/):
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/codex-readiness-integration-test" ~/.claude/skills/majiayu000-claude-skill-registry-codex-readiness-integration-test && rm -rf "$T"
Manifest: skills/data/codex-readiness-integration-test/SKILL.md
LLM Codex Readiness Integration Test
This skill runs a multi-stage integration test to validate agentic execution quality. It always runs in execute mode (no read-only mode).
Outputs
Each run writes to .codex-readiness-integration-test/<timestamp>/ and updates .codex-readiness-integration-test/latest.json.
New outputs per run:
- agentic_summary.json and logs/agentic.log (agentic loop execution)
- llm_results.json (automatic LLM evaluation)
- summary.txt (human-readable summary)
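The following minimal sketch reports which of the per-run outputs listed above are missing from a run directory. It assumes the files sit at those relative paths inside .codex-readiness-integration-test/<timestamp>/. The helper name missing_outputs is hypothetical, and the helper only checks that each file exists. It does not validate file contents.

```python
from pathlib import Path

# Per-run outputs listed above, relative to the run directory.
EXPECTED_OUTPUTS = (
    "agentic_summary.json",
    "logs/agentic.log",
    "llm_results.json",
    "summary.txt",
)


def missing_outputs(run_dir):
    """Return the expected output paths that do not exist under run_dir.

    Hypothetical helper: it checks presence only, not file contents.
    """
    root = Path(run_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]
```

An empty list means every listed output was written. Outputs not named in this list (for example, the report) are not checked.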
Pre-conditions (Required)
- Authenticate with the Codex CLI using the repo-local HOME before running the test. Run these commands in your own terminal (not via the integration test):
  HOME=$PWD/.codex-home XDG_CACHE_HOME=$PWD/.codex-home/.cache codex login
  HOME=$PWD/.codex-home XDG_CACHE_HOME=$PWD/.codex-home/.cache codex login status
- The integration test creates {repo_root}/.codex-home and {repo_root}/.codex-home/.cache/codex as its first step.
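The repo-local environment used by the login commands can also be prepared from Python, for example to build a subprocess environment. The following is a sketch under stated assumptions. The helper name codex_home_env is hypothetical. It only mirrors the HOME and XDG_CACHE_HOME values from the login commands and the directories that the integration test creates.

```python
import os
from pathlib import Path


def codex_home_env(repo_root, base_env=None):
    """Create the repo-local Codex HOME and return an env dict that uses it.

    Hypothetical helper mirroring the login commands:
    HOME=$PWD/.codex-home XDG_CACHE_HOME=$PWD/.codex-home/.cache
    """
    root = Path(repo_root).resolve()
    home = root / ".codex-home"
    cache = home / ".cache"
    # Same directories the integration test creates as its first step.
    (cache / "codex").mkdir(parents=True, exist_ok=True)
    # Start from the given environment (or the current one) and override.
    env = dict(os.environ if base_env is None else base_env)
    env["HOME"] = str(home)
    env["XDG_CACHE_HOME"] = str(cache)
    return env
```

Pass the returned dict as env= to subprocess.run so that a codex invocation uses the same repo-local HOME as the login commands.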
Workflow
- Ask the user how to source the task.
  - Offer two explicit options: (a) user provides a custom task/prompt, or (b) auto-generate a task.
  - Do not run the entry point until the user chooses one option.
- Generate or load {out_dir}/prompt.pending.json.
  - Use the integration test's expected prompt path, not prompt.json at the repo root.
  - With the default out dir, this path is .codex-readiness-integration-test/prompt.pending.json.
  - If --seed-task is provided, it is used as the starting task.
  - If not provided, generate a task with skills/codex-readiness-integration-test/references/generate_prompt.md and save the JSON to {out_dir}/prompt.pending.json.
  - The user must approve the prompt before execution (no auto-approve mode). Make sure to output a summary of the prompt when asking the user to approve.
- Execute the agentic loop via the Codex CLI (uses AGENTS.md and change_prompt).
- Run build/test commands from the prompt plan via skills/codex-readiness-integration-test/scripts/run_plan.py.
- Collect evidence (evidence.json) and deterministic check results, then run automatic LLM evals via the Codex CLI.
- Score the run and write the report and summary output.
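The approval gate in this workflow can be sketched as a small check. It assumes the default out dir named above. The labels "custom" and "auto" for options (a) and (b) are hypothetical, and so are the function names pending_prompt_path and ready_to_execute. This is not how the skill itself enforces the gate.

```python
from pathlib import Path

DEFAULT_OUT_DIR = ".codex-readiness-integration-test"


def pending_prompt_path(out_dir=DEFAULT_OUT_DIR):
    """Path the integration test expects: {out_dir}/prompt.pending.json."""
    return Path(out_dir) / "prompt.pending.json"


def ready_to_execute(task_source, prompt_path, approved):
    """Hypothetical gate: a chosen task source, an existing prompt file,
    and explicit user approval are all required before execution.

    task_source is "custom" or "auto" (hypothetical labels for options a/b).
    """
    if task_source not in ("custom", "auto"):
        return False  # the user has not chosen an option yet
    if not Path(prompt_path).is_file():
        return False  # prompt not generated or loaded yet
    return approved is True  # no auto-approve mode
```

The check requires approved to be exactly True, so missing or truthy-but-ambiguous values cannot slip past the gate.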
Configuration
Optional fields in {out_dir}/prompt.pending.json:
- agentic_loop: configure the Codex CLI invocation for the agentic loop.
- llm_eval: configure the Codex CLI invocation for automatic evals.
If these fields are omitted, defaults are used.
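This sketch shows how the fallback to defaults could be resolved. The real default values are not documented here. The "command" key inside each field is a hypothetical name used only for illustration, and resolve_config is likewise hypothetical.

```python
import json
from pathlib import Path

# Placeholder defaults: the real defaults are not documented here, and the
# "command" key is a hypothetical name used only for illustration.
HYPOTHETICAL_DEFAULTS = {
    "agentic_loop": {"command": "codex"},
    "llm_eval": {"command": "codex"},
}


def resolve_config(prompt_path):
    """Return agentic_loop / llm_eval settings, falling back to the
    placeholder defaults when a field is omitted from the prompt file."""
    data = json.loads(Path(prompt_path).read_text(encoding="utf-8"))
    return {
        field: data.get(field, default)
        for field, default in HYPOTHETICAL_DEFAULTS.items()
    }
```

In this sketch, a field that is present replaces its default wholesale rather than being merged key by key. The real skill may behave differently.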
Requirements
- The LLM evaluator must fail if evidence mentions the phrase "Context compaction enabled".
- Use qualitative context-usage evaluation (no strict thresholds).
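The hard-fail rule on the compaction phrase can be expressed as a simple, case-sensitive substring check. This is a sketch that assumes evidence is either raw text or parsed evidence.json data of any shape. The function name evidence_mentions_compaction is hypothetical.

```python
import json

FORBIDDEN_PHRASE = "Context compaction enabled"


def evidence_mentions_compaction(evidence):
    """True if the forbidden phrase appears anywhere in the evidence.

    evidence may be raw text or already-parsed JSON data; parsed data is
    serialized so that strings nested anywhere in it are searched too.
    """
    if isinstance(evidence, str):
        text = evidence
    else:
        text = json.dumps(evidence, ensure_ascii=False)
    return FORBIDDEN_PHRASE in text
```

When this returns True, the evaluator verdict should be a failure regardless of other scores.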
What this test covers well
- Runs Codex CLI against the real repo root, producing real filesystem edits and git diffs.
- Executes the approved change prompt and then runs the build/test plan in-repo.
- Captures evidence, deterministic checks, and LLM eval artifacts for review.
What this test does not represent
- The agentic loop may use non-default flags (e.g., bypass approvals/sandbox), so interactive guardrails differ.
- Uses a dedicated HOME (.codex-home), which can change auth, config, and cache behavior compared with normal CLI use.
- Auto-generated prompts and one-shot execution do not simulate interactive guidance.
- MCP servers/tools are not exercised unless explicitly configured.
Notes
- The prompts in skills/codex-readiness-integration-test/references/ expect strict JSON.
- Use skills/codex-readiness-integration-test/references/json_fix.md to repair invalid JSON output.
- This skill calls the codex CLI. Ensure it is installed and available on PATH, or override the command in {out_dir}/prompt.pending.json.
- If the agentic loop detects sandbox-blocked tool access, it writes requires_escalation: true to {run_dir}/agentic_summary.json and exits with code 3. Re-run the integration test with escalated permissions in that case.
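A caller can detect the escalation case by combining the exit code with the summary file, as in the following minimal sketch. The helper name needs_escalation is hypothetical. The sketch assumes agentic_summary.json is a JSON object. A missing or malformed file is treated as "no escalation", and json.loads raises ValueError on malformed output.

```python
import json
from pathlib import Path

ESCALATION_EXIT_CODE = 3


def needs_escalation(exit_code, run_dir):
    """True when the agentic loop reported sandbox-blocked tool access:
    exit code 3 and requires_escalation: true in agentic_summary.json."""
    if exit_code != ESCALATION_EXIT_CODE:
        return False
    summary_path = Path(run_dir) / "agentic_summary.json"
    try:
        # json.loads raises ValueError (JSONDecodeError) on malformed JSON.
        summary = json.loads(summary_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False
    return isinstance(summary, dict) and summary.get("requires_escalation") is True
```

Requiring both signals keeps an unrelated exit code 3, or a stale summary file, from triggering a re-run on its own.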