Oh-my-codex autopilot

Full autonomous execution from idea to working code

install
source · Clone the upstream repo
git clone https://github.com/Yeachan-Heo/oh-my-codex
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Yeachan-Heo/oh-my-codex "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/autopilot" ~/.claude/skills/yeachan-heo-oh-my-codex-autopilot && rm -rf "$T"
manifest: skills/autopilot/SKILL.md
source content
<Purpose> Autopilot takes a brief product idea and autonomously handles the full lifecycle: requirements analysis, technical design, planning, parallel implementation, QA cycling, and multi-perspective validation. It produces working, verified code from a 2-3 line description. </Purpose>

<Use_When>

  • User wants end-to-end autonomous execution from an idea to working code
  • User says "autopilot", "auto pilot", "autonomous", "build me", "create me", "make me", "full auto", "handle it all", or "I want a/an..."
  • Task requires multiple phases: planning, coding, testing, and validation
  • User wants hands-off execution and is willing to let the system run to completion </Use_When>

<Do_Not_Use_When>

  • User wants to explore options or brainstorm -- use
    plan
    skill instead
  • User says "just explain", "draft only", or "what would you suggest" -- respond conversationally
  • User wants a single focused code change -- use
    ralph
    or delegate to an executor agent
  • User wants to review or critique an existing plan -- use
    plan --review
  • Task is a quick fix or small bug -- use direct executor delegation </Do_Not_Use_When>

<Why_This_Exists> Most non-trivial software tasks require coordinated phases: understanding requirements, designing a solution, implementing in parallel, testing, and validating quality. Autopilot orchestrates all of these phases automatically so the user can describe what they want and receive working code without managing each step. </Why_This_Exists>

<Execution_Policy>

  • Each phase must complete before the next begins
  • Parallel execution is used within phases where possible (Phase 2 and Phase 4)
  • QA cycles repeat up to 5 times; if the same error persists 3 times, stop and report the fundamental issue
  • Validation requires approval from all reviewers; rejected items get fixed and re-validated
  • Cancel with
    /cancel
    at any time; progress is preserved for resume
  • If a deep-interview spec exists, use it as high-clarity phase input instead of re-expanding from scratch
  • If input is too vague for reliable expansion, offer/trigger
    $deep-interview
    first
  • Do not enter expansion/planning/execution-heavy phases until pre-context grounding exists; if fast execution is forced, proceed only with explicit risk notes
  • Default to concise, evidence-dense progress and completion reporting unless the user or risk level requires more detail
  • Treat newer user task updates as local overrides for the active workflow branch while preserving earlier non-conflicting constraints
  • If correctness depends on additional inspection, retrieval, execution, or verification, keep using the relevant tools until the workflow is grounded
  • Continue through clear, low-risk, reversible next steps automatically; ask only when the next step is materially branching, destructive, or preference-dependent </Execution_Policy>
<Steps> 0. **Pre-context Intake (required before Phase 0 starts)**: - Derive a task slug from the request. - Load the latest relevant snapshot from `.omx/context/{slug}-*.md` when available. - If no snapshot exists, create `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) with: - Task statement - Desired outcome - Known facts/evidence - Constraints - Unknowns/open questions - Likely codebase touchpoints - If ambiguity remains high, run `explore` first for brownfield facts, then run `$deep-interview --quick <task>` before proceeding. - Carry the snapshot path into autopilot artifacts/state so all phases share grounded context.
  1. Phase 0 - Expansion: Turn the user's idea into a detailed spec

    • If
      .omx/specs/deep-interview-*.md
      exists for this task: reuse it and skip redundant expansion work
    • If prompt is highly vague: route to
      $deep-interview
      for Socratic ambiguity-gated clarification
    • Analyst (THOROUGH tier): Extract requirements
    • Architect (THOROUGH tier): Create technical specification
    • Output:
      .omx/plans/autopilot-spec.md
  2. Phase 1 - Planning: Create an implementation plan from the spec

    • Architect (THOROUGH tier): Create plan (direct mode, no interview)
    • Critic (THOROUGH tier): Validate plan
    • Output:
      .omx/plans/autopilot-impl.md
  3. Phase 2 - Execution: Implement the plan using Ralph + Ultrawork

    • LOW-tier executor/search roles: Simple tasks
    • STANDARD-tier executor roles: Standard tasks
    • THOROUGH-tier executor/architect roles: Complex tasks
    • Run independent tasks in parallel
  4. Phase 3 - QA: Cycle until all tests pass (UltraQA mode)

    • Build, lint, test, fix failures
    • Repeat up to 5 cycles
    • Stop early if the same error repeats 3 times (indicates a fundamental issue)
  5. Phase 4 - Validation: Multi-perspective review in parallel

    • Architect: Functional completeness
    • Security-reviewer: Vulnerability check
    • Code-reviewer: Quality review
    • All must approve; fix and re-validate on rejection
  6. Phase 5 - Cleanup: Clear all mode state via OMX MCP tools on successful completion

    • state_clear({mode: "autopilot"})
    • state_clear({mode: "ralph"})
    • state_clear({mode: "ultrawork"})
    • state_clear({mode: "ultraqa"})
    • Or run
      /cancel
      for clean exit </Steps>

<Tool_Usage>

  • Before first MCP tool use, call
    ToolSearch("mcp")
    to discover deferred MCP tools
  • Use
    ask_codex
    with
    agent_role: "architect"
    for Phase 4 architecture validation
  • Use
    ask_codex
    with
    agent_role: "security-reviewer"
    for Phase 4 security review
  • Use
    ask_codex
    with
    agent_role: "code-reviewer"
    for Phase 4 quality review
  • Agents form their own analysis first, then consult Codex for cross-validation
  • If ToolSearch finds no MCP tools or Codex is unavailable, proceed without it -- never block on external tools </Tool_Usage>

State Management

Use

omx_state
MCP tools for autopilot lifecycle state.

  • On start:
    state_write({mode: "autopilot", active: true, current_phase: "expansion", started_at: "<now>", state: {context_snapshot_path: "<snapshot-path>"}})
  • On phase transitions:
    state_write({mode: "autopilot", current_phase: "planning"})
    state_write({mode: "autopilot", current_phase: "execution"})
    state_write({mode: "autopilot", current_phase: "qa"})
    state_write({mode: "autopilot", current_phase: "validation"})
  • On completion:
    state_write({mode: "autopilot", active: false, current_phase: "complete", completed_at: "<now>"})
  • On cancellation/cleanup: run
    $cancel
    (which should call
    state_clear(mode="autopilot")
    )

Scenario Examples

Good: The user says

continue
after the workflow already has a clear next step. Continue the current branch of work instead of restarting or re-asking the same question.

Good: The user changes only the output shape or downstream delivery step (for example

make a PR
). Preserve earlier non-conflicting workflow constraints and apply the update locally.

Bad: The user says

continue
, and the workflow restarts discovery or stops before the missing verification/evidence is gathered.

<Examples> <Good> User: "autopilot A REST API for a bookstore inventory with CRUD operations using TypeScript" Why good: Specific domain (bookstore), clear features (CRUD), technology constraint (TypeScript). Autopilot has enough context to expand into a full spec. </Good> <Good> User: "build me a CLI tool that tracks daily habits with streak counting" Why good: Clear product concept with a specific feature. The "build me" trigger activates autopilot. </Good> <Bad> User: "fix the bug in the login page" Why bad: This is a single focused fix, not a multi-phase project. Use direct executor delegation or ralph instead. </Bad> <Bad> User: "what are some good approaches for adding caching?" Why bad: This is an exploration/brainstorming request. Respond conversationally or use the plan skill. </Bad> </Examples>

<Escalation_And_Stop_Conditions>

  • Stop and report when the same QA error persists across 3 cycles (fundamental issue requiring human input)
  • Stop and report when validation keeps failing after 3 re-validation rounds
  • Stop when the user says "stop", "cancel", or "abort"
  • If requirements were too vague and expansion produces an unclear spec, pause and redirect to
    $deep-interview
    before proceeding </Escalation_And_Stop_Conditions>

<Final_Checklist>

  • All 5 phases completed (Expansion, Planning, Execution, QA, Validation)
  • All validators approved in Phase 4
  • Tests pass (verified with fresh test run output)
  • Build succeeds (verified with fresh build output)
  • State files cleaned up
  • User informed of completion with summary of what was built </Final_Checklist>
<Advanced> ## Configuration

Optional settings in

~/.codex/config.toml
:

[omx.autopilot]
maxIterations = 10
maxQaCycles = 5
maxValidationRounds = 3
pauseAfterExpansion = false
pauseAfterPlanning = false
skipQa = false
skipValidation = false

Resume

If autopilot was cancelled or failed, run

/autopilot
again to resume from where it stopped.

Recommended Clarity Pipeline

For ambiguous requests, prefer:

deep-interview -> ralplan -> autopilot
  • deep-interview
    : ambiguity-gated Socratic requirements
  • ralplan
    : consensus planning (planner/architect/critic)
  • autopilot
    : execution + QA + validation

Best Practices for Input

  1. Be specific about the domain -- "bookstore" not "store"
  2. Mention key features -- "with CRUD", "with authentication"
  3. Specify constraints -- "using TypeScript", "with PostgreSQL"
  4. Let it run -- avoid interrupting unless truly needed

Pipeline Orchestrator (v0.8+)

Autopilot can be driven by the configurable pipeline orchestrator (

src/pipeline/
), which sequences stages through a uniform
PipelineStage
interface:

RALPLAN (consensus planning) -> team-exec (Codex CLI workers) -> ralph-verify (architect verification)

Pipeline configuration options:

[omx.autopilot.pipeline]
maxRalphIterations = 10    # Ralph verification iteration ceiling
workerCount = 2            # Number of Codex CLI team workers
agentType = "executor"     # Agent type for team workers

The pipeline persists state via

pipeline-state.json
and supports resume from the last incomplete stage. See
src/pipeline/orchestrator.ts
for the full API.

Troubleshooting

Stuck in a phase? Check TODO list for blocked tasks, run

state_read({mode: "autopilot"})
, or cancel and resume.

QA cycles exhausted? The same error 3 times indicates a fundamental issue. Review the error pattern; manual intervention may be needed.

Validation keeps failing? Review the specific issues. Requirements may have been too vague -- cancel and provide more detail. </Advanced>