git clone https://github.com/vibeforge1111/vibeship-spawner-skills
ai-agents/autonomous-agents/skill.yaml

Autonomous Agents Skill
Building self-directed AI systems that decompose goals and act independently
id: autonomous-agents
name: Autonomous Agents
version: 1.0.0
category: ai-agents
layer: 1
description: |
  Autonomous agents are AI systems that can independently decompose goals, plan
  actions, execute tools, and self-correct without constant human guidance. The
  challenge isn't making them capable - it's making them reliable. Every extra
  decision multiplies failure probability.

  This skill covers agent loops (ReAct, Plan-Execute), goal decomposition,
  reflection patterns, and production reliability. Key insight: compounding error
  rates kill autonomous agents. A 95% success rate per step drops to roughly 60%
  by step 10 (0.95^10 ≈ 0.60). Build for reliability first, autonomy second.

  2025 lesson: The winners are constrained, domain-specific agents with clear
  boundaries, not "autonomous everything." Treat AI outputs as proposals, not truth.
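As a sanity check on that compounding claim, a minimal sketch (assuming independent steps; the 95% and 99% figures are illustrative, not measured):

"""
# End-to-end success of n independent steps at per-step reliability p: p ** n
def chain_success(p: float, n: int) -> float:
    return p ** n

print(chain_success(0.95, 10))  # ~0.599 - roughly 60% by step 10
print(chain_success(0.99, 10))  # ~0.904 - why per-step reliability dominates
"""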
principles:
- "Reliability over autonomy - every step compounds error probability"
- "Constrain scope - domain-specific beats general-purpose"
- "Treat outputs as proposals, not truth"
- "Build guardrails before expanding capabilities"
- "Human-in-the-loop for critical decisions is non-negotiable"
- "Log everything - every action must be auditable"
- "Fail safely with rollback, not silently with corruption"
owns:
- autonomous-agents
- agent-loops
- goal-decomposition
- self-correction
- reflection-patterns
- react-pattern
- plan-execute
- agent-reliability
- agent-guardrails
does_not_own:
- multi-agent-systems → multi-agent-orchestration
- tool-building → agent-tool-builder
- memory-systems → agent-memory-systems
- workflow-orchestration → workflow-automation
triggers:
- "autonomous agent"
- "autogpt"
- "babyagi"
- "self-prompting"
- "goal decomposition"
- "react pattern"
- "agent loop"
- "self-correcting agent"
- "reflection agent"
- "langgraph"
- "agentic ai"
- "agent planning"
pairs_with:
- agent-tool-builder # Tools for agents to use
- agent-memory-systems # Long-term memory
- multi-agent-orchestration # Multi-agent coordination
- agent-evaluation # Testing and benchmarking
requires: []
stack:
  frameworks:
    - name: LangGraph
      when: "Production agents with state management"
      note: "1.0 released Oct 2025, checkpointing, human-in-loop"
    - name: AutoGPT
      when: "Research/experimentation, open-ended exploration"
      note: "Needs external guardrails for production"
    - name: CrewAI
      when: "Role-based agent teams"
      note: "Good for specialized agent collaboration"
    - name: Claude Agent SDK
      when: "Anthropic ecosystem agents"
      note: "Computer use, tool execution"
  patterns:
    - name: ReAct
      when: "Reasoning + Acting in alternating steps"
      note: "Foundation for most modern agents"
    - name: Plan-Execute
      when: "Separate planning from execution"
      note: "Better for complex multi-step tasks"
    - name: Reflection
      when: "Self-evaluation and correction"
      note: "Evaluator-optimizer loop"
expertise_level: world-class
identity: |
  You are an agent architect who has learned the hard lessons of autonomous AI.
  You've seen the gap between impressive demos and production disasters. You know
  that a 95% success rate per step means only 60% by step 10.

  Your core insight: Autonomy is earned, not granted. Start with heavily
  constrained agents that do one thing reliably. Add autonomy only as you prove
  reliability. The best agents look less impressive but work consistently.

  You push for guardrails before capabilities, logging before actions, and
  human-in-the-loop for anything that matters. You've seen agents fabricate
  expense reports, burn $47 on single tickets, and fail silently in ways that
  corrupt data.
patterns:
- name: ReAct Agent Loop
  description: Alternating reasoning and action steps
  when: Interactive problem-solving, tool use, exploration
  example: |
REACT PATTERN:
""" The ReAct loop:
- Thought: Reason about what to do next
- Action: Choose and execute a tool
- Observation: Receive result
- Repeat until goal achieved
Key: Explicit reasoning traces make debugging possible
"""
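Stripped of frameworks, the loop is a few lines (a sketch only; `llm_think` and `run_tool` are hypothetical helpers, not any library's API):

"""
# Hypothetical helpers: llm_think() returns (thought, action, action_input),
# run_tool() executes a named tool. Shown only to make the loop explicit.
def react_loop(goal: str, max_steps: int = 10) -> str:
    history = []
    for _ in range(max_steps):
        thought, action, action_input = llm_think(goal, history)  # Thought
        if action == "FINISH":
            return action_input                                   # Final Answer
        observation = run_tool(action, action_input)              # Action
        history.append((thought, action, observation))            # Observation
    raise RuntimeError("Step limit reached - escalate to a human")
"""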
Basic ReAct Implementation
""" from langchain.agents import create_react_agent from langchain_openai import ChatOpenAI
Define the ReAct prompt template
react_prompt = ''' Answer the question using the following format:
Question: the input question Thought: reason about what to do Action: tool_name Action Input: input to the tool Observation: result of the action ... (repeat Thought/Action/Observation as needed) Thought: I now know the final answer Final Answer: the answer '''
Create the agent
agent = create_react_agent( llm=ChatOpenAI(model="gpt-4o"), tools=tools, prompt=react_prompt, )
Execute with step limit
result = agent.invoke( {"input": query}, config={"max_iterations": 10} # Prevent runaway loops ) """
LangGraph ReAct (Production)
""" from langgraph.prebuilt import create_react_agent from langgraph.checkpoint.postgres import PostgresSaver
Production checkpointer
checkpointer = PostgresSaver.from_conn_string( os.environ["POSTGRES_URL"] )
agent = create_react_agent( model=llm, tools=tools, checkpointer=checkpointer, # Durable state )
Invoke with thread for state persistence
config = {"configurable": {"thread_id": "user-123"}} result = agent.invoke({"messages": [query]}, config) """
- name: Plan-Execute Pattern
  description: Separate planning phase from execution
  when: Complex multi-step tasks, when full plan visibility matters
  example: |
PLAN-EXECUTE PATTERN:
""" Two-phase approach:
- Planning: Decompose goal into subtasks
- Execution: Execute subtasks, potentially re-plan
Advantages:
- Full visibility into plan before execution
- Can validate/modify plan with human
- Cleaner separation of concerns
Disadvantages:
- Less adaptive to mid-task discoveries
- Plan may become stale
"""
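The two phases, stripped to a sketch (hypothetical `plan`, `review`, and `execute_step` helpers; an illustration of the separation, not a real API):

"""
def plan_and_execute(objective: str, max_steps: int = 20) -> list:
    steps = plan(objective)       # Phase 1: decompose the goal up front
    review(steps)                 # Full plan is visible before execution starts
    results = []
    for _ in range(max_steps):    # Hard budget - never run unbounded
        if not steps:
            break
        ok, result = execute_step(steps[0], results)
        if ok:
            results.append(result)
            steps = steps[1:]
        else:
            steps = plan(objective, completed=results)  # Re-plan on failure
    return results
"""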
LangGraph Plan-Execute
""" from langgraph.prebuilt import create_plan_and_execute_agent
Planner creates the task list
planner_prompt = ''' For the given objective, create a step-by-step plan. Each step should be atomic and actionable. Format: numbered list of steps. '''
Executor handles individual steps
executor_prompt = ''' You are executing step {step_number} of the plan. Previous results: {previous_results} Current step: {current_step} Execute this step using available tools. '''
agent = create_plan_and_execute_agent( planner=planner_llm, executor=executor_llm, tools=tools, replan_on_error=True, # Re-plan if step fails )
Human approval of plan
config = { "configurable": { "thread_id": "task-456", }, "interrupt_before": ["execute"], # Pause before execution }
First call creates plan
plan = agent.invoke({"objective": goal}, config)
Review plan, then continue
if human_approves(plan): result = agent.invoke(None, config) # Continue from checkpoint """
Decomposition Strategies
"""
Decomposition-First: Plan everything, then execute
Best for: Stable tasks, need full plan approval
Interleaved: Plan one step, execute, repeat
Best for: Dynamic tasks, learning as you go
def interleaved_execute(goal, max_steps=10): state = {"goal": goal, "completed": [], "remaining": [goal]}
for step in range(max_steps): # Plan next action based on current state next_action = planner.plan_next(state) if next_action == "DONE": break # Execute and update state result = executor.execute(next_action) state["completed"].append((next_action, result)) # Re-evaluate remaining work state["remaining"] = planner.reassess(state) return state"""
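For contrast, the decomposition-first strategy from the comment above (a sketch using the same hypothetical planner/executor objects):

"""
def decomposition_first_execute(goal, max_steps=10):
    # Plan the whole task list up front so a human can approve it
    subtasks = planner.decompose(goal)  # Hypothetical: goal -> ordered subtasks
    results = []
    for subtask in subtasks[:max_steps]:
        results.append(executor.execute(subtask))
    return results
"""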
- name: Reflection Pattern
  description: Self-evaluation and iterative improvement
  when: Quality matters, complex outputs, creative tasks
  example: |
REFLECTION PATTERN:
""" Self-correction loop:
- Generate initial output
- Evaluate against criteria
- Critique and identify issues
- Refine based on critique
- Repeat until satisfactory
Also called: Evaluator-Optimizer, Self-Critique
"""
Basic Reflection
""" def reflect_and_improve(task, max_iterations=3): # Initial generation output = generator.generate(task)
for i in range(max_iterations): # Evaluate output critique = evaluator.critique( task=task, output=output, criteria=[ "Correctness", "Completeness", "Clarity", ] ) if critique["passes_all"]: return output # Refine based on critique output = generator.refine( task=task, previous_output=output, critique=critique["feedback"], ) return output # Best effort after max iterations"""
LangGraph Reflection
""" from langgraph.graph import StateGraph
def build_reflection_graph(): graph = StateGraph(ReflectionState)
# Nodes graph.add_node("generate", generate_node) graph.add_node("reflect", reflect_node) graph.add_node("output", output_node) # Edges graph.add_edge("generate", "reflect") graph.add_conditional_edges( "reflect", should_continue, { "continue": "generate", # Loop back "end": "output", } ) return graph.compile()def should_continue(state): if state["iteration"] >= 3: return "end" if state["score"] >= 0.9: return "end" return "continue" """
Separate Evaluator (More Robust)
"""
Use different model for evaluation to avoid self-bias
generator = ChatOpenAI(model="gpt-4o") evaluator = ChatOpenAI(model="gpt-4o-mini") # Different perspective
Or use specialized evaluators
from langchain.evaluation import load_evaluator evaluator = load_evaluator("criteria", criteria="correctness") """
- name: Guardrailed Autonomy
  description: Constrained agents with safety boundaries
  when: Production systems, critical operations
  example: |
GUARDRAILED AUTONOMY:
""" Production agents need multiple safety layers:
- Input validation
- Action constraints
- Output validation
- Cost limits
- Human escalation
- Rollback capability
"""
Multi-Layer Guardrails
""" class GuardedAgent: def init(self, agent, config): self.agent = agent self.max_cost = config.get("max_cost_usd", 1.0) self.max_steps = config.get("max_steps", 10) self.allowed_actions = config.get("allowed_actions", []) self.require_approval = config.get("require_approval", [])
async def execute(self, goal): total_cost = 0 steps = 0 while steps < self.max_steps: # Get next action action = await self.agent.plan_next(goal) # Validate action is allowed if action.name not in self.allowed_actions: raise ActionNotAllowedError(action.name) # Check if approval needed if action.name in self.require_approval: approved = await self.request_human_approval(action) if not approved: return {"status": "rejected", "action": action} # Estimate cost estimated_cost = self.estimate_cost(action) if total_cost + estimated_cost > self.max_cost: raise CostLimitExceededError(total_cost) # Execute with rollback capability checkpoint = await self.save_checkpoint() try: result = await self.agent.execute(action) total_cost += self.actual_cost(action) steps += 1 except Exception as e: await self.rollback_to(checkpoint) raise if result.is_complete: break return {"status": "complete", "total_cost": total_cost}"""
Least Privilege Principle
"""
Define minimal permissions per task type
TASK_PERMISSIONS = { "research": ["web_search", "read_file"], "coding": ["read_file", "write_file", "run_tests"], "admin": ["all"], # Rarely grant this }
def create_scoped_agent(task_type): allowed = TASK_PERMISSIONS.get(task_type, []) tools = [t for t in ALL_TOOLS if t.name in allowed] return Agent(tools=tools) """
Cost Control
"""
Context length grows quadratically in cost
Double context = 4x cost
def trim_context(messages, max_tokens=4000): # Keep system message and recent messages system = messages[0] recent = messages[-10:]
# Summarize middle if needed if len(messages) > 11: middle = messages[1:-10] summary = summarize(middle) return [system, summary] + recent return messages"""
- name: Durable Execution Pattern
  description: Agents that survive failures and resume
  when: Long-running tasks, production systems, multi-day processes
  example: |
DURABLE EXECUTION:
""" Production agents must:
- Survive server restarts
- Resume from exact point of failure
- Handle hours/days of runtime
- Allow human intervention mid-process
LangGraph 1.0 provides this natively.
"""
LangGraph Checkpointing
""" from langgraph.checkpoint.postgres import PostgresSaver from langgraph.graph import StateGraph
Production checkpointer (not MemorySaver!)
checkpointer = PostgresSaver.from_conn_string( os.environ["POSTGRES_URL"] )
Build graph with checkpointing
graph = StateGraph(AgentState)
... add nodes and edges ...
agent = graph.compile(checkpointer=checkpointer)
Each invocation saves state
config = {"configurable": {"thread_id": "long-task-789"}}
Start task
agent.invoke({"goal": complex_goal}, config)
If server dies, resume later:
state = agent.get_state(config) if not state.is_complete: agent.invoke(None, config) # Continues from checkpoint """
Human-in-the-Loop Interrupts
"""
Pause at specific nodes
agent = graph.compile( checkpointer=checkpointer, interrupt_before=["critical_action"], # Pause before interrupt_after=["validation"], # Pause after )
First invocation pauses at interrupt
result = agent.invoke({"goal": goal}, config)
Human reviews state
state = agent.get_state(config) if human_approves(state): # Continue from pause point agent.invoke(None, config) else: # Modify state and continue agent.update_state(config, {"approved": False}) agent.invoke(None, config) """
Time-Travel Debugging
"""
LangGraph stores full history
history = list(agent.get_state_history(config))
Go back to any previous state
past_state = history[5] agent.update_state(config, past_state.values)
Replay from that point with modifications
agent.invoke(None, config) """
anti_patterns:
- name: Unbounded Autonomy
  description: Letting agents run without step/cost limits
  why: |
    Agents can enter infinite loops, burn thousands in API costs, or take
    destructive actions. One startup spent $47 per support ticket. Without
    limits, you're gambling with resources.
  instead: |
    Set hard limits: max steps, max cost, max time. Fail safe to human
    escalation. Better to stop early than run forever.
- name: Trusting Agent Outputs
  description: Treating agent outputs as ground truth
  why: |
    Agents hallucinate, fabricate, and confidently produce nonsense. An
    expense agent invented fake restaurant names when stuck. Outputs are
    proposals, not facts.
  instead: |
    Validate all outputs. Use structured outputs with schemas. Require
    evidence/sources for claims. Human review for critical data (see the
    sketch after this entry).
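A minimal sketch of schema validation with Pydantic (the `ExpenseReport` model, its fields, and the `escalate_to_human` hook are hypothetical examples, not part of this skill):

"""
from pydantic import BaseModel, ValidationError, field_validator

class ExpenseReport(BaseModel):  # Hypothetical output schema
    merchant: str
    amount_usd: float
    receipt_url: str             # Require evidence for the claim

    @field_validator("amount_usd")
    @classmethod
    def positive_amount(cls, v):
        if v <= 0:
            raise ValueError("amount must be positive")
        return v

try:
    report = ExpenseReport.model_validate_json(agent_output)
except ValidationError as e:
    escalate_to_human(agent_output, reason=str(e))  # Hypothetical escalation hook
"""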
- name: General-Purpose Autonomy
  description: Building agents that can "do anything"
  why: |
    General agents fail at everything. Benchmarks show 14% success on complex
    tasks vs 78% for humans. The more general, the less reliable.
  instead: |
    Build constrained, domain-specific agents. Do one thing well. Add
    capabilities only after proving reliability.
- name: Silent Failures
  description: Agents that fail without clear signals
  why: |
    Autonomous agents can fail in subtle ways that corrupt data or leave
    tasks half-done. Without explicit failure handling, problems compound
    invisibly.
  instead: |
    Explicit error states. Checkpoint before risky operations. Alert humans
    on failures. Never leave inconsistent state.
- name: Demo-Driven Development
  description: Building for impressive demos over reliable operation
  why: |
    The gap between demo and production is where projects die. A working demo
    proves nothing about reliability, cost, or scale.
  instead: |
    Build for the boring case. Handle errors, retries, edge cases. Measure
    success rate over 1000 runs, not 3 demos (sketch below).
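A minimal harness for that measurement (hypothetical `run_agent` and `check_success` stand-ins; the point is the sample size, not the helpers):

"""
def measure_success_rate(tasks, n_runs=1000):
    successes = 0
    for i in range(n_runs):
        task = tasks[i % len(tasks)]
        try:
            result = run_agent(task)                  # Hypothetical agent entry point
            successes += check_success(task, result)  # Hypothetical per-task check
        except Exception:
            pass  # A crash counts as a failure
    return successes / n_runs
"""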
handoffs:
  receives_from:
    - skill: agent-tool-builder
      receives: Tools for agent to call
    - skill: agent-memory-systems
      receives: Memory infrastructure
    - skill: product-strategy
      receives: Agent requirements and constraints
  hands_to:
    - skill: agent-evaluation
      provides: Agent for testing and benchmarking
    - skill: multi-agent-orchestration
      provides: Agents for multi-agent systems
    - skill: workflow-automation
      provides: Agents as workflow steps
tags:
- autonomous
- agents
- langgraph
- react
- planning
- reflection
- guardrails
- reliability
- checkpointing