# Install

**Source** · Clone the upstream repo:

```bash
git clone https://github.com/Aradotso/trending-skills
```

**Claude Code** · Install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/Aradotso/trending-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/agent-skills-context-engineering" ~/.claude/skills/aradotso-trending-skills-agent-skills-context-engineering && rm -rf "$T"
```

**Manifest:** `skills/agent-skills-context-engineering/SKILL.md` — source content below.
---
name: agent-skills-context-engineering
description: Comprehensive collection of Agent Skills for context engineering, multi-agent architectures, memory systems, and production agent systems using Claude Code, Cursor, and other AI platforms.
triggers:
  - "context engineering for agents"
  - "build multi-agent system"
  - "install agent skills claude code"
  - "context window management"
  - "agent memory architecture"
  - "optimize agent context"
  - "implement BDI mental states"
  - "design agent evaluation framework"
---

# Agent Skills for Context Engineering

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

A comprehensive, open collection of Agent Skills focused on context engineering — the discipline of curating what enters an LLM's context window to maximize agent effectiveness. Covers foundational context mechanics, multi-agent architectures, memory systems, tool design, evaluation, and cognitive modeling.

## What This Project Does

Context engineering is about managing the **holistic set of tokens** that enter a model's attention budget: system prompts, tool definitions, retrieved documents, message history, and tool outputs. This repository provides structured, installable skills that teach AI coding agents these principles across any platform.

Key problems addressed:

- **Lost-in-the-middle**: Models degrade when relevant content is buried in long contexts
- **Context poisoning/distraction**: Irrelevant tokens degrade reasoning quality
- **Attention scarcity**: More tokens ≠ better outcomes; fewer high-signal tokens do
- **Multi-agent coordination**: How agents hand off context without loss

## Installation

### Claude Code (Plugin Marketplace)

```bash
# Register the marketplace
/plugin marketplace add muratcankoylan/Agent-Skills-for-Context-Engineering

# Install individual plugin bundles
/plugin install context-engineering-fundamentals@context-engineering-marketplace
/plugin install agent-architecture@context-engineering-marketplace
/plugin install agent-evaluation@context-engineering-marketplace
/plugin install agent-development@context-engineering-marketplace
/plugin install cognitive-architecture@context-engineering-marketplace
```
### Cursor

Listed on the Cursor Plugin Directory. Install via the Cursor plugin panel or reference `.plugin/plugin.json` directly.

### Manual / Custom Agent

Clone and reference skill files directly:

```bash
git clone https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering.git
```

Load skill content from `skills/<skill-name>/SKILL.md` into your agent's system prompt or context.
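A minimal sketch of what that loading step might look like (the `load_skill` helper and the base prompt text are illustrative, not part of the repository):

```python
from pathlib import Path

def load_skill(repo_root: str, skill_name: str) -> str:
    """Read a skill's SKILL.md so it can be injected into an agent's context."""
    skill_file = Path(repo_root) / "skills" / skill_name / "SKILL.md"
    return skill_file.read_text()

# Hypothetical usage: prepend a skill to a custom agent's system prompt.
skill_text = load_skill("Agent-Skills-for-Context-Engineering", "context-fundamentals")
system_prompt = f"You are a coding agent.\n\n# Loaded Skill\n{skill_text}"
```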
## Plugin Bundles

| Plugin | Skills Included |
|---|---|
| context-engineering-fundamentals | context-fundamentals, context-degradation, context-compression, context-optimization |
| agent-architecture | multi-agent-patterns, memory-systems, tool-design, filesystem-context, hosted-agents |
| agent-evaluation | evaluation, advanced-evaluation |
| agent-development | project-development |
| cognitive-architecture | bdi-mental-states |
## Repository Structure

```
Agent-Skills-for-Context-Engineering/
├── .plugin/
│   └── plugin.json                # Open Plugins manifest
├── skills/
│   ├── context-fundamentals/      # Context anatomy, token budgets
│   ├── context-degradation/       # Failure modes and diagnostics
│   ├── context-compression/       # Compression and summarization
│   ├── context-optimization/      # Caching, masking, compaction
│   ├── multi-agent-patterns/      # Orchestrator, peer, hierarchical
│   ├── memory-systems/            # Short/long-term, graph memory
│   ├── tool-design/               # Effective tool construction
│   ├── filesystem-context/        # File-based context offloading
│   ├── hosted-agents/             # Sandboxed background agents
│   ├── evaluation/                # Agent evaluation frameworks
│   ├── advanced-evaluation/       # LLM-as-a-Judge techniques
│   ├── project-development/       # LLM project methodology
│   └── bdi-mental-states/         # BDI cognitive architecture
└── examples/
    ├── digital-brain-skill/       # Personal OS for founders
    ├── x-to-book-system/          # Multi-agent X→book pipeline
    ├── llm-as-judge-skills/       # TypeScript evaluation tools
    └── book-sft-pipeline/         # Style transfer fine-tuning
```
## Core Concepts

### Context Window Anatomy

```python
# The five components competing for attention budget
context = {
    "system_prompt": "...",        # Role, instructions, constraints
    "tool_definitions": [...],     # Available tools and schemas
    "retrieved_documents": [...],  # RAG results, memory lookups
    "message_history": [...],      # Conversation turns
    "tool_outputs": [...],         # Results from tool calls
}

# Token budget allocation example
TOTAL_BUDGET = 128_000  # tokens

budget = {
    "system_prompt": 2_000,         # 1.6% — keep tight
    "tool_definitions": 5_000,      # 3.9% — prune unused tools
    "retrieved_documents": 40_000,  # 31% — highest ROI
    "message_history": 70_000,      # 55% — compress aggressively
    "tool_outputs": 11_000,         # 8.5% — offload to filesystem
}
```
### Context Degradation Patterns

```python
# Pattern 1: Lost-in-the-middle
# Critical information placed in the center of a long context
# degrades recall significantly. Always place key info at edges.

def order_context_for_attention(documents: list[str], query: str) -> list[str]:
    """Place most relevant documents first and last."""
    scored = rank_by_relevance(documents, query)  # assumes a relevance ranker is available
    n = len(scored)
    ordered = [None] * n
    # High relevance → positions 0 and -1
    for i, doc in enumerate(scored):
        if i % 2 == 0:
            ordered[i // 2] = doc          # fill from front
        else:
            ordered[n - 1 - i // 2] = doc  # fill from back
    return ordered

# Pattern 2: Context poisoning
# Contradictory or stale information causes unpredictable behavior

def validate_context_consistency(facts: list[dict]) -> list[dict]:
    """Remove contradicting or outdated facts before injection."""
    seen_keys = {}
    clean = []
    for fact in sorted(facts, key=lambda f: f["timestamp"], reverse=True):
        key = fact["subject"] + fact["predicate"]
        if key not in seen_keys:
            seen_keys[key] = True
            clean.append(fact)
    return clean
```
### Context Compression

```python
import anthropic

client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY env var

def compress_conversation(
    messages: list[dict],
    keep_last_n: int = 10,
    model: str = "claude-opus-4-5",
) -> list[dict]:
    """
    Compress long conversation history into a summary + recent tail.
    Preserves decisions, outcomes, and key entities.
    """
    if len(messages) <= keep_last_n:
        return messages

    to_compress = messages[:-keep_last_n]
    recent = messages[-keep_last_n:]

    summary_prompt = f"""Summarize this conversation segment.
Preserve: decisions made, key entities, open questions, errors encountered.
Discard: pleasantries, repetition, superseded plans.

Conversation:
{format_messages(to_compress)}
"""

    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": summary_prompt}],
    )
    summary_message = {
        "role": "assistant",
        "content": f"[COMPRESSED HISTORY]\n{response.content[0].text}",
    }
    return [summary_message] + recent

def format_messages(messages: list[dict]) -> str:
    return "\n".join(
        f"{m['role'].upper()}: {m['content']}" for m in messages
    )
```
### Multi-Agent Patterns

```python
import anthropic

# Orchestrator pattern — one agent routes, subagents execute
class OrchestratorAgent:
    def __init__(self, subagents: dict[str, "SubAgent"]):
        self.subagents = subagents
        self.client = anthropic.Anthropic()

    def route(self, task: str) -> str:
        """Determine which subagent handles this task."""
        routing_prompt = f"""Given this task, which specialist should handle it?

Specialists: {list(self.subagents.keys())}

Task: {task}

Reply with only the specialist name."""
        response = self.client.messages.create(
            model="claude-opus-4-5",
            max_tokens=50,
            system="You are a routing agent. Reply with only the specialist name.",
            messages=[{"role": "user", "content": routing_prompt}],
        )
        return response.content[0].text.strip()

    def execute(self, task: str) -> str:
        specialist = self.route(task)
        if specialist not in self.subagents:
            raise ValueError(f"Unknown specialist: {specialist}")
        # Pass minimal context — only what the subagent needs
        return self.subagents[specialist].run(task)

# Context handoff — pass structured summaries, not raw history
def create_handoff_context(completed_work: dict) -> str:
    """Minimal handoff context between agents."""
    return f"""AGENT HANDOFF
Task: {completed_work['task']}
Status: {completed_work['status']}
Key Outputs: {completed_work['outputs']}
Open Questions: {completed_work.get('open_questions', 'None')}
Next Agent Should: {completed_work['next_steps']}
"""
```
### Memory Systems

```python
import json
from pathlib import Path
from datetime import datetime

# Append-only JSONL memory — agent-friendly, auditable
class AgentMemory:
    def __init__(self, path: str = "agent_memory.jsonl"):
        self.path = Path(path)
        # Schema declaration as first line
        if not self.path.exists():
            self.path.write_text(
                json.dumps({"_schema": "v1", "fields": ["ts", "type", "key", "value"]}) + "\n"
            )

    def remember(self, memory_type: str, key: str, value: str) -> None:
        entry = {
            "ts": datetime.utcnow().isoformat(),
            "type": memory_type,  # "fact" | "decision" | "entity" | "error"
            "key": key,
            "value": value,
        }
        with self.path.open("a") as f:
            f.write(json.dumps(entry) + "\n")

    def recall(self, memory_type: str | None = None, limit: int = 50) -> list[dict]:
        entries = []
        with self.path.open() as f:
            for line in f:
                entry = json.loads(line)
                if "_schema" in entry:
                    continue
                if memory_type is None or entry["type"] == memory_type:
                    entries.append(entry)
        return entries[-limit:]  # most recent N

    def recall_as_context(self, memory_type: str | None = None) -> str:
        entries = self.recall(memory_type)
        if not entries:
            return "No relevant memories."
        lines = [f"[{e['ts']}] {e['type']}/{e['key']}: {e['value']}" for e in entries]
        return "\n".join(lines)

# Usage
memory = AgentMemory()
memory.remember("decision", "database_choice", "PostgreSQL — chosen for JSONB support")
memory.remember("entity", "user_id_format", "UUID v4, stored as TEXT")

# Inject into agent context
context = f"""AGENT MEMORY
{memory.recall_as_context()}
---
"""
```
### Tool Design Principles

```python
import glob
import json
from pathlib import Path

# Good tool: single responsibility, structured output, error info included
def search_codebase(
    query: str,
    file_pattern: str = "**/*.py",
    max_results: int = 10,
) -> dict:
    """
    Search codebase for relevant code.
    Returns structured results an agent can parse without hallucination.
    Always include metadata — agents need to know WHERE results came from.
    """
    results = []
    for filepath in glob.glob(file_pattern, recursive=True):
        try:
            content = Path(filepath).read_text()
            if query.lower() in content.lower():
                # Find line numbers for precise context
                lines = content.splitlines()
                matches = [
                    {"line": i + 1, "text": line}
                    for i, line in enumerate(lines)
                    if query.lower() in line.lower()
                ]
                results.append({
                    "file": filepath,
                    "match_count": len(matches),
                    "matches": matches[:3],  # Top 3 per file
                })
        except (UnicodeDecodeError, PermissionError):
            pass
    return {
        "query": query,
        "total_files_matched": len(results),
        "results": results[:max_results],
        "truncated": len(results) > max_results,
    }

# Tool output offloading — don't bloat context with large outputs
def run_with_file_output(tool_fn, args: dict, output_path: str) -> str:
    """
    Run a tool and write output to file instead of returning to context.
    Returns a file reference the agent can selectively read.
    """
    result = tool_fn(**args)
    Path(output_path).write_text(json.dumps(result, indent=2))
    return f"[OUTPUT SAVED: {output_path}] — {len(str(result))} chars. Read with read_file('{output_path}')."
```
### LLM-as-Judge Evaluation

```python
import anthropic
from enum import Enum

client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY

class JudgeVerdict(Enum):
    A_BETTER = "A"
    B_BETTER = "B"
    TIE = "TIE"

def pairwise_judge(
    prompt: str,
    response_a: str,
    response_b: str,
    criteria: list[str],
    model: str = "claude-opus-4-5",
) -> dict:
    """
    Compare two responses with position bias mitigation.
    Runs both A/B and B/A orderings, then reconciles the two verdicts
    to cancel order effects.
    """
    def single_comparison(first: str, second: str) -> str:
        criteria_text = "\n".join(f"- {c}" for c in criteria)
        judge_prompt = f"""Compare these two responses to the prompt below.

Prompt: {prompt}

Response 1: {first}

Response 2: {second}

Criteria:
{criteria_text}

Which response better satisfies the criteria?
Reply with exactly one of: RESPONSE_1, RESPONSE_2, TIE
Then on a new line explain in 1-2 sentences."""
        resp = client.messages.create(
            model=model,
            max_tokens=256,
            system="You are an impartial evaluator. Be concise and consistent.",
            messages=[{"role": "user", "content": judge_prompt}],
        )
        return resp.content[0].text.strip()

    # Run both orderings to mitigate position bias
    ab_result = single_comparison(response_a, response_b)
    ba_result = single_comparison(response_b, response_a)

    # Normalize: in ba_result, "RESPONSE_1" means B won
    def normalize(result: str, flipped: bool) -> JudgeVerdict:
        first_line = result.splitlines()[0]
        if "TIE" in first_line:
            return JudgeVerdict.TIE
        if "RESPONSE_1" in first_line:
            return JudgeVerdict.B_BETTER if flipped else JudgeVerdict.A_BETTER
        return JudgeVerdict.A_BETTER if flipped else JudgeVerdict.B_BETTER

    ab_verdict = normalize(ab_result, flipped=False)
    ba_verdict = normalize(ba_result, flipped=True)

    if ab_verdict == ba_verdict:
        final = ab_verdict
        confidence = "high"
    else:
        final = JudgeVerdict.TIE  # Disagreement → tie
        confidence = "low"

    return {
        "verdict": final.value,
        "confidence": confidence,
        "ab_result": ab_result,
        "ba_result": ba_result,
    }
```
### BDI Mental States

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Belief:
    subject: str
    predicate: str
    object_: Any
    confidence: float = 1.0
    source: str = "observation"

@dataclass
class Desire:
    goal: str
    priority: float  # 0.0 - 1.0
    conditions: list[str] = field(default_factory=list)

@dataclass
class Intention:
    action_plan: list[str]
    committed_to: str  # which desire this serves
    status: str = "pending"  # pending | active | complete | abandoned

class BDIAgent:
    def __init__(self):
        self.beliefs: list[Belief] = []
        self.desires: list[Desire] = []
        self.intentions: list[Intention] = []

    def perceive(self, rdf_triples: list[tuple]) -> None:
        """Convert RDF context into beliefs."""
        for subject, predicate, obj in rdf_triples:
            self.beliefs.append(Belief(
                subject=subject,
                predicate=predicate,
                object_=obj,
            ))

    def deliberate(self) -> Desire | None:
        """Select highest-priority achievable desire."""
        achievable = [
            d for d in self.desires
            if self._conditions_met(d.conditions)
        ]
        if not achievable:
            return None
        return max(achievable, key=lambda d: d.priority)

    def plan(self, desire: Desire) -> Intention:
        """Generate action plan for a desire."""
        # In production: call LLM to generate plan
        steps = [f"Execute step for: {desire.goal}"]
        intention = Intention(
            action_plan=steps,
            committed_to=desire.goal,
        )
        self.intentions.append(intention)
        return intention

    def _conditions_met(self, conditions: list[str]) -> bool:
        belief_strings = {
            f"{b.subject}:{b.predicate}:{b.object_}" for b in self.beliefs
        }
        return all(c in belief_strings for c in conditions)

    def as_context_block(self) -> str:
        """Serialize mental state for injection into LLM context."""
        beliefs_text = "\n".join(
            f"  - {b.subject} {b.predicate} {b.object_} (conf={b.confidence})"
            for b in self.beliefs[-10:]
        )
        desires_text = "\n".join(
            f"  - [{d.priority:.1f}] {d.goal}" for d in self.desires
        )
        intentions_text = "\n".join(
            f"  - {i.committed_to}: {i.status}" for i in self.intentions
        )
        return f"""BDI MENTAL STATE
Beliefs (recent):
{beliefs_text}
Desires:
{desires_text}
Intentions:
{intentions_text}
"""
```
### Filesystem Context Pattern

```python
import json
from pathlib import Path
from typing import Any

# Use filesystem as infinite context extension
class FilesystemContext:
    def __init__(self, workspace: str = ".agent_workspace"):
        self.workspace = Path(workspace)
        self.workspace.mkdir(exist_ok=True)

    def offload(self, key: str, data: Any) -> str:
        """Write large data to file, return reference string for context."""
        path = self.workspace / f"{key}.json"
        path.write_text(json.dumps(data, indent=2))
        size = len(json.dumps(data))
        return f"[FILE_REF:{key}] ({size} bytes) → {path}"

    def load(self, key: str) -> Any:
        """Load previously offloaded data."""
        path = self.workspace / f"{key}.json"
        return json.loads(path.read_text())

    def list_available(self) -> str:
        """Let agent discover what context is available."""
        files = list(self.workspace.glob("*.json"))
        if not files:
            return "No context files available."
        lines = []
        for f in files:
            size = f.stat().st_size
            lines.append(f"  - {f.stem}: {size} bytes")
        return "AVAILABLE CONTEXT FILES:\n" + "\n".join(lines)

    def write_plan(self, plan: list[str]) -> str:
        """Persist agent plan so it survives context resets."""
        return self.offload("current_plan", {"steps": plan, "current": 0})

    def tick_plan(self) -> str | None:
        """Advance to next step, return current step or None if done."""
        data = self.load("current_plan")
        idx = data["current"]
        if idx >= len(data["steps"]):
            return None
        data["current"] += 1
        self.offload("current_plan", data)
        return data["steps"][idx]
```
## Skill Trigger Reference

| Skill | Activate When User Says |
|---|---|
| context-fundamentals | "explain context windows", "design agent architecture" |
| context-degradation | "diagnose context problems", "fix lost-in-middle", "debug agent failures" |
| context-compression | "compress context", "summarize conversation", "reduce token usage" |
| context-optimization | "optimize context", "reduce token costs", "implement KV-cache" |
| multi-agent-patterns | "design multi-agent system", "implement supervisor pattern" |
| memory-systems | "implement agent memory", "build knowledge graph", "track entities" |
| tool-design | "design agent tools", "reduce tool complexity", "implement MCP tools" |
| filesystem-context | "offload context to files", "agent scratch pad", "file-based context" |
| hosted-agents | "build background agent", "sandboxed execution", "multiplayer agent" |
| evaluation | "evaluate agent performance", "build test framework", "measure quality" |
| advanced-evaluation | "implement LLM-as-judge", "compare model outputs", "mitigate bias" |
| project-development | "start LLM project", "design batch pipeline", "evaluate task-model fit" |
| bdi-mental-states | "model agent mental states", "implement BDI architecture", "transform RDF to beliefs" |
## Common Patterns

### Progressive Context Loading

```python
from pathlib import Path

# Only load full skill content when triggered — saves tokens on every request
class SkillLoader:
    def __init__(self, skills_dir: str = "skills"):
        self.skills_dir = Path(skills_dir)
        self._index = None

    def get_index(self) -> str:
        """Load lightweight index (names + one-line descriptions only)."""
        if self._index:
            return self._index
        skills = []
        for skill_dir in self.skills_dir.iterdir():
            readme = skill_dir / "README.md"
            if readme.exists():
                first_line = readme.read_text().splitlines()[0]
                skills.append(f"- {skill_dir.name}: {first_line}")
        self._index = "\n".join(skills)
        return self._index

    def load_skill(self, skill_name: str) -> str:
        """Load full skill content only when needed."""
        skill_file = self.skills_dir / skill_name / "SKILL.md"
        if not skill_file.exists():
            raise FileNotFoundError(f"Skill not found: {skill_name}")
        return skill_file.read_text()
```
### Token Budget Enforcement

```python
import tiktoken

def enforce_budget(
    content: str,
    max_tokens: int,
    model: str = "gpt-4o",
    strategy: str = "truncate_middle",
) -> str:
    """
    Ensure content fits within token budget.
    Strategies: truncate_end | truncate_middle
    """
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(content)
    if len(tokens) <= max_tokens:
        return content

    if strategy == "truncate_end":
        return enc.decode(tokens[:max_tokens])

    if strategy == "truncate_middle":
        keep = max_tokens // 2
        start = enc.decode(tokens[:keep])
        end = enc.decode(tokens[-keep:])
        return f"{start}\n\n[... {len(tokens) - max_tokens} tokens truncated ...]\n\n{end}"

    raise ValueError(f"Unknown strategy: {strategy}")
```
## Troubleshooting

### Agent loses track of earlier decisions

**Cause:** Message history too long, decisions buried in the middle.
**Fix:** Use `AgentMemory` to extract decisions into persistent JSONL; inject only the decision log at the start of the context.

### Tool calls return too much data

**Cause:** Tool output floods the context window.
**Fix:** Use `FilesystemContext.offload()` and return file references; the agent reads only what it needs.

### Multi-agent handoffs lose context

**Cause:** Raw message history passed between agents.
**Fix:** Use `create_handoff_context()` — structured summaries only, never raw history.

### LLM-as-Judge gives inconsistent verdicts

**Cause:** Position bias (the model prefers whichever response appears first).
**Fix:** Use `pairwise_judge()`, which runs both A/B and B/A orderings and resolves disagreements as ties.
### Agent ignores early instructions

**Cause:** Instructions buried in the middle of a long system prompt.
**Fix:** Place critical constraints at the top and bottom of the system prompt (U-shaped placement), as in the sketch below.
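A minimal sketch of U-shaped placement, assuming the constraint list and body text are supplied by the caller (the helper name is illustrative):

```python
def build_u_shaped_prompt(critical: list[str], body: str) -> str:
    """Repeat critical constraints at both edges of the system prompt,
    where attention is strongest; the middle carries bulk reference material."""
    constraints = "\n".join(f"- {c}" for c in critical)
    return (
        f"CRITICAL CONSTRAINTS:\n{constraints}\n\n"
        f"{body}\n\n"
        f"REMINDER — these constraints still apply:\n{constraints}"
    )

# Usage
prompt = build_u_shaped_prompt(
    critical=["Never modify files outside the workspace", "Cite sources for all claims"],
    body="...long tool documentation and style guides...",
)
```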
### Context grows unbounded in long sessions

**Cause:** No compression strategy; messages accumulate.
**Fix:** Run `compress_conversation()` every N turns; keep the last 10 messages verbatim and summarize the rest, as sketched below.
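A minimal sketch of wiring that policy into an agent loop, assuming the `compress_conversation` helper defined earlier (the turn counter and threshold are illustrative):

```python
# Hypothetical agent-loop hook: compress history on a fixed cadence.
COMPRESS_EVERY_N_TURNS = 20  # illustrative threshold

def on_turn_complete(messages: list[dict], turn: int) -> list[dict]:
    """Every N turns, fold older history into a summary; recent tail stays verbatim."""
    if turn > 0 and turn % COMPRESS_EVERY_N_TURNS == 0:
        return compress_conversation(messages, keep_last_n=10)
    return messages
```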
## Environment Variables

```bash
ANTHROPIC_API_KEY=      # Required for Claude API calls
OPENAI_API_KEY=         # Optional, for OpenAI-based evaluation
AGENT_WORKSPACE_DIR=    # Optional, filesystem context directory (default: .agent_workspace)
AGENT_MEMORY_PATH=      # Optional, JSONL memory file path (default: agent_memory.jsonl)
```
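A minimal sketch of honoring the optional variables when constructing the helpers defined earlier in this document (assumes the `FilesystemContext` and `AgentMemory` classes above):

```python
import os

# Fall back to the documented defaults when the variables are unset.
workspace = FilesystemContext(os.environ.get("AGENT_WORKSPACE_DIR", ".agent_workspace"))
memory = AgentMemory(os.environ.get("AGENT_MEMORY_PATH", "agent_memory.jsonl"))
```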
## References

- [Repository](https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering)
- Cursor Plugin Directory
- Cited in: Meta Context Engineering via Agentic Skill Evolution — Peking University (2026)
- Open Plugins Standard