Claude-skill-registry building-with-agent-memory
Build persistent memory systems for AI agents using Mem0, claude-mem, or custom implementations. Use when adding conversation memory, user preferences, or contextual recall to agents. Covers memory architecture patterns, retrieval strategies, and privacy controls. NOT for RAG systems (use building-rag-systems).
install
source · Clone the upstream repo
```bash
git clone https://github.com/majiayu000/claude-skill-registry
```
Claude Code · Install into ~/.claude/skills/
```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/building-with-agent-memory" ~/.claude/skills/majiayu000-claude-skill-registry-building-with-agent-memory && rm -rf "$T"
```
manifest:
skills/data/building-with-agent-memory/SKILL.md
Building Agent Memory Systems
Production-grade memory layers for AI agents that persist context across sessions.
Quick Decision
| Need | Tool | Why |
|---|---|---|
| Simple memory for any agent | Mem0 | Open-source, Python SDK, minimal setup |
| Claude Code agent memory | claude-mem | Automatic hooks, 3-layer retrieval |
| Enterprise/self-editing memory | Letta (MemGPT) | Agent-driven memory management |
| Custom memory | RAG + state management | Full control |
Mem0: Primary Implementation
Installation
```bash
pip install mem0ai
export OPENAI_API_KEY="your-key"
```
Basic Usage
```python
from mem0 import Memory

m = Memory()

# Add memory from conversation
messages = [
    {"role": "user", "content": "Hi, I'm Alex. I love basketball and gaming."},
    {"role": "assistant", "content": "Hey Alex! I'll remember your interests."}
]
m.add(messages, user_id="alex")

# Search memories
results = m.search("What do you know about me?", filters={"user_id": "alex"})
# Returns: {"results": [{"memory": "Name is Alex. Enjoys basketball and gaming.", "score": 0.89}]}
```
Default Configuration (OSS)
- LLM: OpenAI gpt-4.1-nano-2025-04-14 (fact extraction)
- Embeddings: text-embedding-3-small (1536 dims)
- Vector Store: Qdrant (on-disk at /tmp/qdrant)
- History: SQLite at ~/.mem0/history.db
Memory Categories
| Category | Scope | Use Case |
|---|---|---|
| User | Individual users | Preferences, identity, history |
| Agent | Specific agents | Agent-specific context |
| Session | Conversation | Temporary, conversation-scoped |
| Run | Single execution | Within-session partitioning |
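As a sketch of how these scopes map onto Mem0's identifiers (assuming the open-source SDK's `user_id`, `agent_id`, and `run_id` keyword arguments), memories are partitioned at add time:

```python
# Scope a memory by passing the matching identifier(s) to add()
m.add(messages, user_id="alex")                     # user scope: preferences, identity, history
m.add(messages, agent_id="support-bot")             # agent scope: agent-specific context
m.add(messages, user_id="alex", run_id="run-042")   # session/run scope: one conversation or execution
```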
Core Operations
```python
# Add with metadata
m.add(
    messages,
    user_id="alex",
    metadata={"category": "preferences", "project": "task-api"}
)

# Search with filters
results = m.search(
    "task preferences",
    filters={
        "user_id": "alex",
        "category": "preferences"
    }
)

# Search with date filter
results = m.search(
    query="recent interactions",
    filters={
        "AND": [
            {"user_id": "alex"},
            {"created_at": {"gte": "2024-07-01"}}
        ]
    }
)

# Update memory
m.update(memory_id="mem_123abc", new_content="Updated preference")

# Delete memory
m.delete(memory_id="mem_123abc")
```
Custom Configuration
```python
from mem0 import Memory

config = {
    "llm": {
        "provider": "anthropic",
        "config": {
            "model": "claude-3-5-sonnet-20241022",
            "api_key": "your-anthropic-key"
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small"
        }
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "path": "./my_memories"  # Custom path
        }
    }
}

m = Memory.from_config(config)
```
Memory Architecture Patterns
Five Memory Types
```
┌────────────────────────────────────────────────────┐
│                    AGENT MEMORY                    │
├─────────────┬──────────────┬───────────────────────┤
│ Conversation│ Working      │ Long-term             │
│ Memory      │ Memory       │ Memory                │
│             │              │                       │
│ Last N      │ Current task │ ┌─────────────────┐   │
│ messages    │ context      │ │ Episodic Memory │   │
│             │              │ │ (events)        │   │
│ Volatile    │ Task-scoped  │ ├─────────────────┤   │
│             │              │ │ Semantic Memory │   │
│             │              │ │ (facts)         │   │
│             │              │ └─────────────────┘   │
└─────────────┴──────────────┴───────────────────────┘
```
| Type | Persistence | Scope | Example |
|---|---|---|---|
| Conversation | Session | Current chat | Last 10 messages |
| Working | Task | Current task | Active goal, intermediate results |
| Episodic | Long-term | Time-stamped events | "Last Tuesday you asked about X" |
| Semantic | Long-term | Facts/entities | "Alex works at Acme" |
| Procedural | Long-term | Learned patterns | How user structures tasks |
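One way to realize the long-term types on top of Mem0 is to tag each memory with its type in metadata; the `memory_type` key below is an illustrative convention, not a Mem0 built-in:

```python
# Tag long-term memories with their type so retrieval can filter on it
m.add(
    [{"role": "user", "content": "Last Tuesday I asked about deployment options."}],
    user_id="alex",
    metadata={"memory_type": "episodic"},  # time-stamped event
)
m.add(
    [{"role": "user", "content": "Alex works at Acme."}],
    user_id="alex",
    metadata={"memory_type": "semantic"},  # durable fact
)

# Retrieve only semantic facts later
facts = m.search("employer", filters={"user_id": "alex", "memory_type": "semantic"})
```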
Letta/MemGPT Two-Tier Architecture
```
┌─────────────────────────────────────────────────────┐
│               IN-CONTEXT (Core Memory)               │
│  ┌──────────────┐  ┌──────────────────────────┐     │
│  │ Persona Block│  │ Human Block              │     │
│  │ (agent)      │  │ (user info)              │     │
│  └──────────────┘  └──────────────────────────┘     │
│               Always visible to LLM                  │
└─────────────────────────────────────────────────────┘
                          │
                          │ (tool calls)
                          ▼
┌─────────────────────────────────────────────────────┐
│           OUT-OF-CONTEXT (External Memory)           │
│  ┌──────────────────┐  ┌────────────────────────┐   │
│  │ Archival Memory  │  │ Conversation History   │   │
│  │ (vector DB)      │  │ (full message log)     │   │
│  └──────────────────┘  └────────────────────────┘   │
│           Retrieved on-demand via tools              │
└─────────────────────────────────────────────────────┘
```
Self-Editing Tools (Letta pattern):
- Targeted updates: `memory_replace(block, old_text, new_text)`
- Add new information: `memory_insert(block, text)`
- Complete rewrite: `memory_rethink(block, new_content)`
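A rough sketch of how these self-editing tools could be implemented over plain text blocks; the block store and function bodies are illustrative, not Letta's actual internals:

```python
# Illustrative in-context memory blocks the agent edits via tool calls
core_memory = {
    "persona": "You are a helpful coding assistant.",
    "human": "Name: Alex. Prefers concise answers.",
}

def memory_replace(block: str, old_text: str, new_text: str) -> None:
    """Targeted update: swap a specific phrase inside a block."""
    core_memory[block] = core_memory[block].replace(old_text, new_text)

def memory_insert(block: str, text: str) -> None:
    """Append new information to a block."""
    core_memory[block] = core_memory[block].rstrip() + "\n" + text

def memory_rethink(block: str, new_content: str) -> None:
    """Complete rewrite of a block."""
    core_memory[block] = new_content

# Example: the agent records the user's employer
memory_insert("human", "Works at Acme.")
```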
Retrieval Strategies
Recency-Based
```python
def get_recent_memories(user_id: str, limit: int = 5):
    """Most recent memories first."""
    return m.search(
        query="",  # Empty query = recency sort
        filters={"user_id": user_id},
        top_k=limit
    )
```
Relevance-Based (Semantic)
```python
def get_relevant_memories(user_id: str, query: str, limit: int = 5):
    """Semantic search over memories."""
    return m.search(
        query=query,
        filters={"user_id": user_id},
        top_k=limit,
        threshold=0.7  # Minimum similarity
    )
```
Entity-Based
```python
def get_entity_memories(user_id: str, entity: str):
    """Memories mentioning a specific entity."""
    return m.search(
        query=entity,
        filters={
            "user_id": user_id,
            "categories": {"contains": "entities"}
        }
    )
```
Hybrid Retrieval
```python
def hybrid_retrieve(user_id: str, query: str, limit: int = 10):
    """Combine recency + relevance + entity."""
    # Get relevant memories
    relevant = m.search(query, filters={"user_id": user_id}, top_k=limit)

    # Extract entities from query
    entities = extract_entities(query)
    entity_memories = []
    for entity in entities:
        entity_memories.extend(
            m.search(entity, filters={"user_id": user_id}, top_k=3)
        )

    # Deduplicate and score
    all_memories = deduplicate(relevant + entity_memories)

    # Weight: 0.5*relevance + 0.3*recency + 0.2*entity_match
    return sorted(all_memories, key=weighted_score, reverse=True)[:limit]
```
Context Window Management
Memory Injection Pattern
```python
def build_prompt_with_memory(user_query: str, user_id: str) -> str:
    """Inject relevant memories before the user query."""
    memories = m.search(user_query, filters={"user_id": user_id}, top_k=5)

    memory_context = "\n".join([
        f"- {mem['memory']}" for mem in memories['results']
    ])

    return f"""You know the following about this user:
{memory_context}

User query: {user_query}
"""
```
Token Budget Management
```python
def fit_memories_to_budget(memories: list, budget_tokens: int = 2000) -> list:
    """Select memories that fit within a token budget."""
    selected = []
    current_tokens = 0
    for mem in memories:
        mem_tokens = estimate_tokens(mem['memory'])
        if current_tokens + mem_tokens <= budget_tokens:
            selected.append(mem)
            current_tokens += mem_tokens
        else:
            break
    return selected
```
Hierarchical Summarization
```python
def summarize_old_memories(user_id: str, older_than_days: int = 30):
    """Compress old memories into summaries."""
    old_memories = m.search(
        query="",
        filters={
            "user_id": user_id,
            "created_at": {"lte": f"{days_ago(older_than_days)}"}
        }
    )

    # Group by topic/category
    grouped = group_by_category(old_memories)

    # Generate summary per group
    for category, memories in grouped.items():
        summary = llm.summarize(memories)
        m.add(
            [{"role": "system", "content": f"Summary of {category}: {summary}"}],
            user_id=user_id,
            metadata={"type": "summary", "category": category}
        )
        # Delete original detailed memories
        for mem in memories:
            m.delete(mem['id'])
```
FastAPI Integration Pattern
```python
from fastapi import FastAPI, Depends
from mem0 import Memory

app = FastAPI()
memory = Memory()

@app.post("/tasks")
async def create_task_with_memory(task: TaskCreate, user_id: str):
    # 1. Retrieve relevant memories
    preference_memories = memory.search(
        f"task preferences for {task.category}",
        filters={"user_id": user_id, "category": "preferences"}
    )
    pattern_memories = memory.search(
        f"past tasks like {task.title}",
        filters={"user_id": user_id, "category": "patterns"}
    )

    # 2. Apply memories to task creation
    task = enhance_task_with_preferences(task, preference_memories)
    task = estimate_time_from_patterns(task, pattern_memories)

    # 3. Create the task
    created_task = await create_task(task)

    # 4. Store this interaction as new memory
    memory.add([
        {"role": "user", "content": f"Created task: {task.title} in {task.category}"},
        {"role": "assistant", "content": f"Task created with estimated time: {task.estimated_hours}h"}
    ], user_id=user_id, metadata={"category": "interactions"})

    return created_task
```
claude-mem: Claude Code Memory
Architecture
```
┌─────────────────────────────────────────────────────┐
│                     claude-mem                       │
├─────────────────────────────────────────────────────┤
│ 5 Lifecycle Hooks:                                   │
│  • SessionStart  • PostToolUse  • SessionEnd         │
│  • UserPromptSubmit  • Stop                          │
├─────────────────────────────────────────────────────┤
│ Worker Service: HTTP API on port 37777               │
│ Web Viewer: http://localhost:37777                   │
├─────────────────────────────────────────────────────┤
│ Storage:                                             │
│  • SQLite: Sessions, observations, summaries         │
│  • Chroma: Hybrid semantic + keyword search          │
└─────────────────────────────────────────────────────┘
```
3-Layer Token-Efficient Retrieval
```
Layer 1: search(query, type, limit)
  → Compact index (~50-100 tokens/result)
  → Returns observation IDs for filtering

Layer 2: timeline(observation_id, context_window)
  → Chronological context around observations
  → Understand sequence of events

Layer 3: get_observations(ids)
  → Full details only for filtered IDs
  → ~500-1,000 tokens/result

Result: 10x token savings by filtering before fetching
```
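A sketch of the funnel this describes; `search`, `timeline`, and `get_observations` below stand in for claude-mem's retrieval tools, and the exact signatures and return shapes may differ:

```python
# Layer 1: cheap index search returns compact hits with observation IDs
hits = search(query="auth refactor decisions", type="observation", limit=20)

# Filter down to the few IDs that look relevant before spending tokens
candidate_ids = [h["id"] for h in hits if "auth" in h["summary"].lower()][:5]

# Layer 2: chronological context around one interesting observation
context = timeline(observation_id=candidate_ids[0], context_window=3)

# Layer 3: full details only for the filtered IDs (the expensive step)
details = get_observations(candidate_ids)
```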
Privacy Controls
```
<private>
API_KEY=secret123
Sensitive project data here
</private>
```
Everything inside `<private>` tags is excluded from memory.
Memory Prioritization
Relevance Scoring Formula
```python
import math

def calculate_relevance_score(memory, query, current_time):
    """
    Score = 0.5 * semantic_similarity
          + 0.3 * recency_decay
          + 0.2 * access_frequency
    """
    semantic = cosine_similarity(embed(memory.text), embed(query))

    days_old = (current_time - memory.created_at).days
    recency = math.exp(-days_old / 30)  # Decay over 30 days

    frequency = memory.access_count / max_access_count

    return 0.5 * semantic + 0.3 * recency + 0.2 * frequency
```
Contradiction Resolution
```python
def handle_contradiction(old_memory, new_memory):
    """Newer information wins for preferences."""
    if new_memory.created_at > old_memory.created_at:
        # Update old memory with new information
        m.update(old_memory.id, new_memory.content)
        return "updated"
    return "kept_old"
```
Safety & Privacy
NEVER Store
- Passwords, API keys, tokens
- SSN, credit card numbers
- Medical records without explicit consent
- Information user explicitly asks to forget
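A minimal pre-storage filter along these lines can catch the obvious cases; the regex patterns are illustrative, not an exhaustive secret/PII detector:

```python
import re

# Illustrative patterns only; production systems should use a dedicated scanner
BLOCKLIST = [
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # SSN-like
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),    # credit-card-like
]

def safe_add(memory, messages, **kwargs):
    """Drop messages containing obvious secrets/PII before storing."""
    clean = [msg for msg in messages
             if not any(p.search(msg["content"]) for p in BLOCKLIST)]
    if clean:
        memory.add(clean, **kwargs)
```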
ALWAYS Implement
- User consent before memory creation
- Clear disclosure of what's stored
- Easy opt-out mechanisms (delete all)
- GDPR/CCPA compliance for deletion requests
- Encryption at rest for memory stores
- Access controls for multi-tenant systems
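A small consent gate before any write keeps opt-in explicit; the `consent_store` lookup here is a placeholder for whatever consent record the application keeps:

```python
def add_with_consent(memory, messages, user_id: str, consent_store: dict, **kwargs):
    """Persist memories only for users who have explicitly opted in."""
    if not consent_store.get(user_id, False):  # default: no consent, no memory
        return None
    return memory.add(messages, user_id=user_id, **kwargs)
```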
Right to Forget
```python
async def forget_user(user_id: str):
    """GDPR-compliant deletion."""
    # Get all user memories
    all_memories = m.search("", filters={"user_id": user_id}, top_k=10000)

    # Delete each one
    for mem in all_memories['results']:
        m.delete(mem['id'])

    # Audit log
    log.info(f"Deleted all memories for user {user_id}")
```
Anti-Patterns
| Don't | Do Instead |
|---|---|
| Store everything | Prioritize by relevance |
| Infinite memory growth | Regular consolidation cycles |
| No user consent | Explicit opt-in for memory |
| Store PII freely | Anonymize or exclude sensitive data |
| Ignore contradictions | Timestamp-based resolution |
| Skip privacy controls | Implement consent, disclosure, and deletion controls |
| Full retrieval every time | Token-budget constrained retrieval |