Claude-skill-registry codex-safe-experiment

Use Codex CLI sandbox mode to try risky changes safely. Isolated experimentation with network disabled and directory restrictions.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/codex-safe-experiment" ~/.claude/skills/majiayu000-claude-skill-registry-codex-safe-experiment && rm -rf "$T"
manifest: skills/data/codex-safe-experiment/SKILL.md
source content
<!-- S0 META-IDENTITY -->

Codex Safe Experiment Skill


LIBRARY-FIRST PROTOCOL (MANDATORY)

Before writing ANY code, you MUST check:

Step 1: Library Catalog

  • Location:
    .claude/library/catalog.json
  • If match >70%: REUSE or ADAPT

Step 2: Patterns Guide

  • Location:
    .claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md
  • If pattern exists: FOLLOW documented approach

Step 3: Existing Projects

  • Location:
    D:\Projects\*
  • If found: EXTRACT and adapt

Decision Matrix

MatchAction
Library >90%REUSE directly
Library 70-90%ADAPT minimally
Pattern existsFOLLOW pattern
In projectEXTRACT
No matchBUILD (add to library after)

Kanitsal Cerceve (Evidential Frame Activation)

Kaynak dogrulama modu etkin.

Purpose

Use Codex CLI's sandbox mode to experiment with risky changes in complete isolation. Network is disabled, only CWD is accessible, providing safe experimentation.

When to Use This Skill

  • Risky refactoring that might break things
  • Experimental approaches before committing
  • Testing destructive operations safely
  • Trying new libraries or patterns
  • Major architectural changes
  • Security-sensitive experiments

When NOT to Use This Skill

  • Simple, low-risk changes
  • When network access is needed
  • When accessing files outside project
  • Production debugging
  • Quick fixes (use codex-iterative-fix)

Workflow

Phase 1: Experiment Design

  1. Define what you want to try
  2. Identify risk factors
  3. Plan verification steps
  4. Set success criteria

Phase 2: Sandbox Execution

# Full sandbox mode (network disabled, CWD only)
./scripts/multi-model/codex-yolo.sh "Refactor auth system" task-id "." 10 sandbox

# Via delegate.sh
./scripts/multi-model/delegate.sh codex "Try experimental approach" --sandbox

# Direct Codex
bash -lc "codex --full-auto --sandbox true --network disabled exec 'Experiment with X'"

Phase 3: Evaluation

  1. Review what Codex tried
  2. Evaluate if experiment succeeded
  3. Decide: Apply to real codebase?
  4. If yes: Apply changes outside sandbox

Sandbox Isolation Layers

LayerProtection
NetworkDISABLED - no external connections
FilesystemCWD only - no parent access
OS-LevelSeatbelt (macOS) / Docker
CommandsBlocked: rm -rf, sudo, etc.

Success Criteria

  • Experiment ran safely in sandbox
  • Results evaluated
  • Decision made: apply or discard
  • No unintended side effects

Example Usage

Example 1: Major Refactoring

User: "Refactor entire auth system to use new pattern"

Sandbox Process:
1. Clone relevant files to sandbox context
2. Codex implements new pattern
3. Run tests in sandbox
4. Evaluate results
5. If good: Apply to real codebase

Output:
- Experiment: Success
- Tests: 45/47 passing (2 need adjustment)
- Recommendation: Apply with minor fixes

Example 2: Library Migration

User: "Try migrating from moment.js to dayjs"

Sandbox Process:
1. Install dayjs in sandbox
2. Replace moment calls
3. Run tests
4. Compare bundle size

Output:
- Migration: Feasible
- Breaking changes: 3 date format strings
- Bundle reduction: 65KB
- Recommendation: Proceed with migration

Integration with Meta-Loop

META-LOOP IMPLEMENT PHASE:
    |
    +---> High-risk change detected
    |         |
    |         +---> codex-safe-experiment
    |         |         |
    |         |         +---> Sandbox: Try change
    |         |         +---> Evaluate: Success?
    |         |         +---> If yes: Apply for real
    |         |
    |         +---> Continue to TEST phase

Memory Integration

Results stored at:

  • Key:
    multi-model/codex/experiment/{project}/{task_id}
  • Tags: WHO=codex-safe-experiment, WHY=sandboxed-trial
  • Contains: Experiment results, recommendation, diffs

Invocation Pattern

# Via router with experiment keywords
./scripts/multi-model/multi-model-router.sh "Try refactoring X approach"

# Direct sandbox mode
bash -lc "codex --sandbox workspace-write exec 'Experiment with X'"

Guardrails

NEVER:

  • Apply sandbox results without review
  • Skip the evaluation phase
  • Use sandbox for production debugging
  • Trust sandbox results blindly

ALWAYS:

  • Review sandbox diffs before applying
  • Document what was tried
  • Store results for future reference
  • Have rollback plan ready

Decision Framework

Experiment ResultAction
All tests passApply changes
Minor failuresFix then apply
Major failuresDiscard, try different approach
Unexpected behaviorInvestigate before deciding

Related Skills

  • codex-iterative-fix
    : After experiment, for cleanup
  • codex-audit
    : Audit experimental changes
  • testing-quality
    : Generate tests for experiments
  • llm-council
    : Decide on experimental approaches
<!-- S4 SUCCESS CRITERIA -->

Verification Checklist

  • Experiment ran in sandbox
  • Results captured and evaluated
  • Decision documented
  • If applied: Changes verified
  • Memory-MCP updated
<!-- PROMISE -->

[commit|confident] <promise>CODEX_SAFE_EXPERIMENT_COMPLETE</promise>