Awesome-Agent-Skills-for-Empirical-Research ai-agent-papers-guide
Curated 2024-2026 AI agent research papers collection
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/domains/ai-ml/ai-agent-papers-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-ai-agent-papers-g && rm -rf "$T"
manifest:
skills/43-wentorai-research-plugins/skills/domains/ai-ml/ai-agent-papers-guide/SKILL.mdsource content
AI Agent Papers Guide (2024-2026)
Overview
A focused collection of AI agent research papers from 2024-2026, tracking the latest developments in LLM-based agent systems. Unlike broader collections, this focuses on recent breakthroughs — new architectures, benchmarks, multi-agent coordination, and real-world applications. Updated frequently as the field evolves rapidly.
Paper Categories
Recent AI Agent Research ├── Agent Architectures │ ├── Planning (o1-style reasoning, search-augmented) │ ├── Memory (long-term, episodic, working) │ └── Tool use (function calling, code execution) ├── Multi-Agent Systems │ ├── Collaboration (task decomposition, debate) │ ├── Competition (red team, adversarial) │ └── Emergence (self-organization, culture) ├── Evaluation │ ├── Benchmarks (SWE-bench, WebArena, GAIA) │ ├── Safety (jailbreak, misuse, alignment) │ └── Reliability (error recovery, hallucination) ├── Applications │ ├── Software engineering (coding agents) │ ├── Scientific research (lab automation) │ ├── Web automation (browsing, form-filling) │ └── Enterprise (workflow, data analysis) └── Infrastructure ├── Frameworks (LangGraph, CrewAI, AutoGen) ├── Protocols (MCP, A2A, tool standards) └── Deployment (scaling, monitoring, cost)
Highlighted Papers (2024-2025)
| Paper | Venue | Key Contribution |
|---|---|---|
| SWE-agent | ICLR 2025 | Agent interface design for SE |
| OpenHands | 2024 | Open platform for coding agents |
| AgentBench | ICLR 2024 | Multi-environment agent benchmark |
| GAIA | ICLR 2024 | General AI assistant benchmark |
| Voyager | NeurIPS 2024 | Lifelong learning in Minecraft |
| OS-Copilot | 2024 | Self-improving computer agent |
| AutoGen | 2024 | Multi-agent conversation framework |
| Agent-FLAN | ACL 2024 | Agent fine-tuning methodology |
Tracking New Papers
import arxiv from datetime import datetime, timedelta def find_recent_agent_papers(days=14): """Find cutting-edge agent papers.""" queries = [ "ti:agent AND (ti:LLM OR ti:language model)", "abs:autonomous agent AND abs:tool use AND abs:2024", "ti:multi-agent AND abs:large language", "abs:coding agent OR abs:software agent", ] seen = set() papers = [] for q in queries: search = arxiv.Search( query=q, max_results=15, sort_by=arxiv.SortCriterion.SubmittedDate, ) for r in search.results(): if r.entry_id not in seen: seen.add(r.entry_id) papers.append({ "title": r.title, "date": r.published.strftime("%Y-%m-%d"), "url": r.entry_id, }) papers.sort(key=lambda x: x["date"], reverse=True) for p in papers[:20]: print(f"[{p['date']}] {p['title']}") print(f" {p['url']}") find_recent_agent_papers()
Framework Comparison
frameworks = { "LangGraph": { "paradigm": "Graph-based workflows", "persistence": "Built-in checkpointing", "multi_agent": "Yes", "language": "Python/JS", }, "CrewAI": { "paradigm": "Role-based agents", "persistence": "Memory module", "multi_agent": "Yes (crew)", "language": "Python", }, "AutoGen": { "paradigm": "Conversational agents", "persistence": "Chat history", "multi_agent": "Yes (group chat)", "language": "Python/.NET", }, "OpenHands": { "paradigm": "Computer use agent", "persistence": "Workspace state", "multi_agent": "No", "language": "Python", }, } for name, info in frameworks.items(): print(f"\n{name}:") for k, v in info.items(): print(f" {k}: {v}")
Use Cases
- Literature tracking: Stay current on agent research
- Framework selection: Compare agent development tools
- Research planning: Identify open problems and trends
- Course material: Teach cutting-edge agent systems
- Benchmark tracking: Compare agent capabilities