Awesome-Agent-Skills-for-Empirical-Research autonomous-agents-papers-guide

Daily-updated collection of autonomous AI agent papers

install

source · Clone the upstream repo

git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/domains/ai-ml/autonomous-agents-papers-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-autonomous-agents && rm -rf "$T"

manifest: skills/43-wentorai-research-plugins/skills/domains/ai-ml/autonomous-agents-papers-guide/SKILL.md

source content

Autonomous Agents Papers Guide

Overview

A daily-updated collection of research papers on autonomous AI agents — systems that use LLMs for planning, reasoning, tool use, and multi-step task execution. Covers the full agent stack from foundational prompting techniques (ReAct, Chain-of-Thought) to multi-agent systems, memory architectures, and real-world deployments. Organized chronologically with category tags for easy navigation.

Agent Taxonomy

Autonomous Agents
├── Planning & Reasoning
│   ├── Chain-of-Thought (CoT, ToT, GoT)
│   ├── ReAct (Reasoning + Acting)
│   ├── Reflexion (Self-reflection)
│   └── LATS (Language Agent Tree Search)
├── Tool Use & Actions
│   ├── Function calling
│   ├── Code execution
│   ├── Web browsing
│   └── API interaction
├── Memory Systems
│   ├── Short-term (context window)
│   ├── Long-term (vector stores)
│   ├── Episodic (experience replay)
│   └── Procedural (learned strategies)
├── Multi-Agent Systems
│   ├── Debate/discussion (ChatDev, MetaGPT)
│   ├── Hierarchical (manager/worker)
│   ├── Collaborative (shared goals)
│   └── Competitive (adversarial)
└── Applications
    ├── Software engineering (SWE-agent, Devin)
    ├── Scientific research (AI Scientist)
    ├── Web automation (WebArena)
    └── Game playing (Voyager)

Landmark Papers

Paper	Year	Key Contribution
ReAct	2023	Interleaving reasoning and acting
Toolformer	2023	Self-taught tool use
Voyager	2023	Lifelong learning agent in Minecraft
AutoGPT	2023	Autonomous goal-directed agent
MetaGPT	2023	Multi-agent software company
Reflexion	2023	Verbal self-reflection for learning
SWE-agent	2024	Autonomous software engineering
AI Scientist	2024	Autonomous research paper generation
Claude Computer Use	2024	GUI agent via screenshots
OpenHands	2024	Open platform for AI agents

Paper Tracking

import arxiv
from datetime import datetime, timedelta

def find_agent_papers(days=7, max_results=30):
    """Find recent autonomous agent papers."""
    queries = [
        "abs:autonomous agent AND abs:large language model",
        "abs:LLM agent AND (abs:planning OR abs:tool use)",
        "abs:multi-agent AND abs:LLM",
    ]

    seen = set()
    papers = []

    for query in queries:
        search = arxiv.Search(
            query=query,
            max_results=max_results,
            sort_by=arxiv.SortCriterion.SubmittedDate,
        )
        cutoff = datetime.now() - timedelta(days=days)
        for r in search.results():
            if (r.entry_id not in seen and
                r.published.replace(tzinfo=None) > cutoff):
                seen.add(r.entry_id)
                papers.append({
                    "title": r.title,
                    "url": r.entry_id,
                    "date": r.published.strftime("%Y-%m-%d"),
                    "categories": r.categories,
                })

    papers.sort(key=lambda x: x["date"], reverse=True)
    return papers

for p in find_agent_papers(days=14):
    print(f"[{p['date']}] {p['title']}")

Agent Benchmarks

benchmarks = {
    "SWE-bench": {
        "task": "Resolve real GitHub issues",
        "metric": "% resolved",
        "top_score": "49% (Claude 3.5 + SWE-agent)",
    },
    "WebArena": {
        "task": "Complete web tasks in realistic sites",
        "metric": "Task success rate",
        "top_score": "35.8%",
    },
    "GAIA": {
        "task": "General AI assistant tasks",
        "metric": "Accuracy across levels",
        "top_score": "Level 1: 75%, Level 3: 30%",
    },
    "AgentBench": {
        "task": "8 diverse agent environments",
        "metric": "Overall score",
    },
    "ToolBench": {
        "task": "API tool selection and chaining",
        "metric": "Pass rate",
    },
}

for name, info in benchmarks.items():
    print(f"\n{name}: {info['task']}")
    print(f"  Metric: {info['metric']}")
    if "top_score" in info:
        print(f"  SOTA: {info['top_score']}")

Reading Roadmap

### Foundations
1. "Chain-of-Thought Prompting" (Wei et al., 2022)
2. "ReAct: Synergizing Reasoning and Acting" (Yao et al., 2023)
3. "Toolformer" (Schick et al., 2023)

### Planning & Memory
4. "Tree of Thoughts" (Yao et al., 2023)
5. "Reflexion" (Shinn et al., 2023)
6. "Generative Agents" (Park et al., 2023)

### Multi-Agent
7. "MetaGPT" (Hong et al., 2023)
8. "AutoGen" (Wu et al., 2023)
9. "ChatDev" (Qian et al., 2023)

### Applications
10. "SWE-agent" (Yang et al., 2024)
11. "The AI Scientist" (Lu et al., 2024)

Use Cases

Literature survey: Track the fast-moving agent research field
System design: Learn from agent architecture patterns
Benchmark comparison: Compare agent frameworks
Research direction: Identify open problems in agent AI
Course material: Teach LLM-based agent systems