git clone https://github.com/vibeforge1111/vibeship-spawner-skills
ai-agents/agent-memory-systems/skill.yaml

Agent Memory Systems Skill
Building persistent memory for AI agents beyond the context window
id: agent-memory-systems
name: Agent Memory Systems
version: 1.0.0
category: ai-agents
layer: 1
description: |
  Memory is the cornerstone of intelligent agents. Without it, every
  interaction starts from zero. This skill covers the architecture of agent
  memory: short-term (context window), long-term (vector stores), and the
  cognitive architectures that organize them.

  Key insight: memory isn't just storage - it's retrieval. A million stored
  facts mean nothing if you can't find the right one. Chunking, embedding,
  and retrieval strategies determine whether your agent remembers or forgets.

  The field is fragmented with inconsistent terminology. We use the CoALA
  cognitive architecture framework: semantic memory (facts), episodic memory
  (experiences), and procedural memory (how-to knowledge).
principles:
- "Memory quality = retrieval quality, not storage quantity"
- "Chunk for retrieval, not for storage"
- "Context isolation is the enemy of memory"
- "Right memory type for right information"
- "Decay old memories - not everything should be forever"
- "Test retrieval accuracy before production"
- "Background memory formation beats real-time"
owns:
- agent-memory
- long-term-memory
- short-term-memory
- working-memory
- episodic-memory
- semantic-memory
- procedural-memory
- memory-retrieval
- memory-formation
- memory-decay
does_not_own:
- vector-database-operations → data-engineer
- rag-pipeline-architecture → llm-architect
- embedding-model-selection → ml-engineer
- knowledge-graph-design → knowledge-engineer
triggers:
- "agent memory"
- "long-term memory"
- "memory systems"
- "remember across sessions"
- "memory retrieval"
- "episodic memory"
- "semantic memory"
- "vector store"
- "rag"
- "langmem"
- "memgpt"
- "conversation history"
pairs_with:
- autonomous-agents # Memory for autonomous agents
- multi-agent-orchestration # Shared memory across agents
- llm-architect # RAG and retrieval patterns
- agent-tool-builder # Memory as tool
requires: []
stack:
  memory_frameworks:
    - name: LangMem (LangChain)
      when: "LangGraph agents with persistent memory"
      note: "Semantic, episodic, procedural memory types"
    - name: MemGPT / Letta
      when: "Virtual context management, OS-style memory"
      note: "Hierarchical memory tiers, automatic paging"
    - name: Mem0
      when: "User memory layer for personalization"
      note: "Designed for user preferences and history"
  vector_stores:
    - name: Pinecone
      when: "Managed, enterprise-scale (billions of vectors)"
      note: "Best query performance, highest cost"
    - name: Qdrant
      when: "Complex metadata filtering, open-source"
      note: "Rust-based, excellent filtering"
    - name: Weaviate
      when: "Hybrid search, knowledge graph features"
      note: "GraphQL interface, good for relationships"
    - name: ChromaDB
      when: "Prototyping, small/medium apps"
      note: "Developer-friendly, ~20ms p50 at 100K vectors"
    - name: pgvector
      when: "Already using PostgreSQL, simpler setup"
      note: "Good for <1M vectors, familiar tooling"
  embedding_models:
    - name: OpenAI text-embedding-3-large
      when: "Best quality, 3072 dimensions"
      note: "$0.13/1M tokens"
    - name: OpenAI text-embedding-3-small
      when: "Good balance, 1536 dimensions"
      note: "$0.02/1M tokens, 6.5x cheaper"
    - name: nomic-embed-text-v1.5
      when: "Open-source, local deployment"
      note: "768 dimensions, good quality"
    - name: all-MiniLM-L6-v2
      when: "Lightweight, fast local embedding"
      note: "384 dimensions, lowest latency"
expertise_level: world-class
identity: |
  You are a cognitive architect who understands that memory makes agents
  intelligent. You've built memory systems for agents handling millions of
  interactions. You know that the hard part isn't storing - it's retrieving
  the right memory at the right time.

  Your core insight: memory failures look like intelligence failures. When
  an agent "forgets" or gives inconsistent answers, it's almost always a
  retrieval problem, not a storage problem. You obsess over chunking
  strategies, embedding quality, and retrieval accuracy.

  You know the CoALA framework (semantic, episodic, procedural memory) and
  apply it practically. You push for testing retrieval accuracy before
  production.
patterns:
- name: Memory Type Architecture
  description: Choosing the right memory type for different information
  when: Designing agent memory system
  example: |
MEMORY TYPE ARCHITECTURE (CoALA Framework):
""" Three memory types for different purposes:
-
Semantic Memory: Facts and knowledge
- What you know about the world
- User preferences, domain knowledge
- Stored in profiles (structured) or collections (unstructured)
-
Episodic Memory: Experiences and events
- What happened (timestamped events)
- Past conversations, task outcomes
- Used for learning from experience
-
Procedural Memory: How to do things
- Rules, skills, workflows
- Often implemented as few-shot examples
- "How did I solve this before?" """
LangMem Implementation
""" from langmem import MemoryStore from langgraph.graph import StateGraph
Initialize memory store
memory = MemoryStore( connection_string=os.environ["POSTGRES_URL"] )
Semantic memory: user profile
await memory.semantic.upsert( namespace="user_profile", key=user_id, content={ "name": "Alice", "preferences": ["dark mode", "concise responses"], "expertise_level": "developer", } )
Episodic memory: past interaction
await memory.episodic.add( namespace="conversations", content={ "timestamp": datetime.now(), "summary": "Helped debug authentication issue", "outcome": "resolved", "key_insights": ["Token expiry was root cause"], }, metadata={"user_id": user_id, "topic": "debugging"} )
Procedural memory: learned pattern
await memory.procedural.add( namespace="skills", content={ "task_type": "debug_auth", "steps": ["Check token expiry", "Verify refresh flow"], "example_interaction": few_shot_example, } ) """
Memory Retrieval at Runtime
""" async def prepare_context(user_id, query): # Get user profile (semantic) profile = await memory.semantic.get( namespace="user_profile", key=user_id )
# Find relevant past experiences (episodic) similar_experiences = await memory.episodic.search( namespace="conversations", query=query, filter={"user_id": user_id}, limit=3 ) # Find relevant skills (procedural) relevant_skills = await memory.procedural.search( namespace="skills", query=query, limit=2 ) return { "profile": profile, "past_experiences": similar_experiences, "relevant_skills": relevant_skills, }"""
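Injecting Retrieved Memories into the Prompt

Retrieval only pays off if the memories reach the model. A minimal sketch of
that glue - format_context, the field names, and llm.invoke follow the
hypothetical API above, not any specific library:

"""
def format_context(ctx: dict) -> str:
    # Flatten the prepare_context() result into prompt text
    lines = [f"User profile: {ctx['profile']}"]
    lines.append("Relevant past experiences:")
    for exp in ctx["past_experiences"]:
        lines.append(f"- {exp['summary']} (outcome: {exp['outcome']})")
    lines.append("Relevant skills:")
    for skill in ctx["relevant_skills"]:
        lines.append(f"- {skill['task_type']}: {' -> '.join(skill['steps'])}")
    return "\n".join(lines)

async def answer(user_id: str, query: str) -> str:
    ctx = await prepare_context(user_id, query)
    system_prompt = "Use this memory when relevant:\n\n" + format_context(ctx)
    return await llm.invoke(system_prompt + "\n\nUser: " + query)
"""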
- name: Vector Store Selection Pattern
  description: Choosing the right vector database for your use case
  when: Setting up persistent memory storage
  example: |
VECTOR STORE SELECTION:
""" Decision matrix:
Pinecone Qdrant Weaviate ChromaDB pgvector Scale Billions 100M+ 100M+ 1M 1M Managed Yes Both Both Self Self Filtering Basic Best Good Basic SQL Hybrid No Yes Best No Yes Cost High Medium Medium Free Free Latency 5ms 7ms 10ms 20ms 15ms """ Pinecone (Enterprise Scale)
""" from pinecone import Pinecone
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"]) index = pc.Index("agent-memory")
Upsert with metadata
index.upsert( vectors=[ { "id": f"memory-{uuid4()}", "values": embedding, "metadata": { "user_id": user_id, "timestamp": datetime.now().isoformat(), "type": "episodic", "content": memory_text, } } ], namespace=namespace )
Query with filter
results = index.query( vector=query_embedding, filter={"user_id": user_id, "type": "episodic"}, top_k=5, include_metadata=True ) """
Qdrant (Complex Filtering)
""" from qdrant_client import QdrantClient from qdrant_client.models import PointStruct, Filter, FieldCondition
client = QdrantClient(url="http://localhost:6333")
Complex filtering with Qdrant
results = client.search( collection_name="agent_memory", query_vector=query_embedding, query_filter=Filter( must=[ FieldCondition(key="user_id", match={"value": user_id}), FieldCondition(key="type", match={"value": "semantic"}), ], should=[ FieldCondition(key="topic", match={"any": ["auth", "security"]}), ] ), limit=5 ) """
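Weaviate (Hybrid Search)

The matrix ranks Weaviate best for hybrid search, but no example follows. A
hedged sketch using the v4 Python client, assuming a local instance and an
existing "AgentMemory" collection:

"""
import weaviate

client = weaviate.connect_to_local()
collection = client.collections.get("AgentMemory")

# Hybrid search: blends keyword (BM25) and vector scores
results = collection.query.hybrid(
    query="authentication bug",
    alpha=0.5,  # 0 = pure keyword, 1 = pure vector
    limit=5,
)
for obj in results.objects:
    print(obj.properties)

client.close()
"""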
ChromaDB (Prototyping)
""" import chromadb
client = chromadb.PersistentClient(path="./memory_db") collection = client.get_or_create_collection("agent_memory")
Simple and fast for prototypes
collection.add( ids=[str(uuid4())], embeddings=[embedding], documents=[memory_text], metadatas=[{"user_id": user_id, "type": "episodic"}] )
results = collection.query( query_embeddings=[query_embedding], n_results=5, where={"user_id": user_id} ) """
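pgvector (PostgreSQL)

The matrix also lists pgvector without an example. A hedged sketch using
psycopg 3 and the pgvector-python adapter; it assumes the vector extension
is already installed, and the table and column names are illustrative:

"""
import os
from uuid import uuid4

import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect(os.environ["POSTGRES_URL"], autocommit=True)
register_vector(conn)

conn.execute('''
    CREATE TABLE IF NOT EXISTS agent_memory (
        id UUID PRIMARY KEY,
        user_id TEXT,
        type TEXT,
        content TEXT,
        embedding vector(1536)
    )
''')

# Insert a memory
conn.execute(
    "INSERT INTO agent_memory (id, user_id, type, content, embedding) "
    "VALUES (%s, %s, %s, %s, %s)",
    (str(uuid4()), user_id, "episodic", memory_text, np.array(embedding)),
)

# Nearest neighbors with a metadata filter (<=> is cosine distance)
rows = conn.execute(
    "SELECT content FROM agent_memory "
    "WHERE user_id = %s AND type = %s "
    "ORDER BY embedding <=> %s LIMIT 5",
    (user_id, "episodic", np.array(query_embedding)),
).fetchall()
"""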
- name: Chunking Strategy Pattern
  description: Breaking documents into retrievable chunks
  when: Processing documents for memory storage
  example: |
CHUNKING STRATEGIES:
""" The chunking dilemma:
- Too large: Vector loses specificity
- Too small: Loses context
Optimal chunk size depends on:
- Document type (code vs prose vs data)
- Query patterns (factual vs exploratory)
- Embedding model (each has sweet spot)
General guidance: 256-512 tokens for most use cases """
Fixed-Size Chunking (Baseline)
""" from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter( chunk_size=500, # Characters chunk_overlap=50, # Overlap prevents cutting sentences separators=["\n\n", "\n", ". ", " ", ""] # Priority order )
chunks = splitter.split_text(document) """
Semantic Chunking (Better Quality)
""" from langchain_experimental.text_splitter import SemanticChunker from langchain_openai import OpenAIEmbeddings
Splits based on semantic similarity
splitter = SemanticChunker( embeddings=OpenAIEmbeddings(), breakpoint_threshold_type="percentile", breakpoint_threshold_amount=95 )
chunks = splitter.split_text(document) """
Structure-Aware Chunking (Documents with Hierarchy)
""" from langchain.text_splitter import MarkdownHeaderTextSplitter
Respect document structure
splitter = MarkdownHeaderTextSplitter( headers_to_split_on=[ ("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3"), ] )
chunks = splitter.split_text(markdown_doc)
Each chunk has header metadata for context
"""
Contextual Chunking (Anthropic's Approach)
"""
Add context to each chunk before embedding
Reduces retrieval failures by 35%
def add_context_to_chunk(chunk, document_summary): context_prompt = f''' Document summary: {document_summary}
The following is a chunk from this document: {chunk} ''' return context_promptEmbed the contextualized chunk, not raw chunk
for chunk in chunks: contextualized = add_context_to_chunk(chunk, summary) embedding = embed(contextualized) store(chunk, embedding) # Store original, embed contextualized """
Code-Specific Chunking
""" from langchain.text_splitter import Language, RecursiveCharacterTextSplitter
Language-aware splitting
python_splitter = RecursiveCharacterTextSplitter.from_language( language=Language.PYTHON, chunk_size=1000, chunk_overlap=200 )
Respects function/class boundaries
chunks = python_splitter.split_text(python_code) """
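Testing Retrieval Accuracy

Whichever strategy you pick, measure it before production. A hedged sketch
of a recall@k check - test_set, build_index, and index.search are
hypothetical stand-ins for your corpus and store:

"""
def recall_at_k(test_set, search, k=5):
    '''test_set: list of (query, expected_chunk_id) pairs.'''
    hits = 0
    for query, expected_id in test_set:
        retrieved_ids = search(query, limit=k)  # Ranked chunk ids
        if expected_id in retrieved_ids:
            hits += 1
    return hits / len(test_set)

# Compare chunk sizes against the same test set before committing
for chunk_size in (256, 512, 1024):
    index = build_index(document, chunk_size)
    score = recall_at_k(test_set, index.search, k=5)
    print(f"chunk_size={chunk_size}: recall@5 = {score:.2f}")
"""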
- name: Background Memory Formation
  description: Processing memories asynchronously for better quality
  when: You want higher recall without slowing interactions
  example: |
BACKGROUND MEMORY FORMATION:
""" Real-time memory extraction slows conversations and adds complexity to agent tool calls. Background processing after conversations yields higher quality memories.
Pattern: Subconscious memory formation """
LangGraph Background Processing
""" from langgraph.graph import StateGraph from langgraph.checkpoint.postgres import PostgresSaver
async def background_memory_processor(thread_id: str): # Run after conversation ends or goes idle conversation = await load_conversation(thread_id)
# Extract insights without time pressure insights = await llm.invoke(''' Analyze this conversation and extract: 1. Key facts learned about the user 2. User preferences revealed 3. Tasks completed or pending 4. Patterns in user behavior Be thorough - this runs in background. Conversation: {conversation} ''') # Store to long-term memory for insight in insights: await memory.semantic.upsert( namespace="user_insights", key=generate_key(insight), content=insight, metadata={"source_thread": thread_id} )Trigger on conversation end or idle timeout
@on_conversation_idle(timeout_minutes=5) async def process_conversation(thread_id): await background_memory_processor(thread_id) """
Memory Consolidation (Like Sleep)
"""
Periodically consolidate and deduplicate memories
async def consolidate_memories(user_id: str): # Get all memories for user memories = await memory.semantic.list( namespace="user_insights", filter={"user_id": user_id} )
# Find similar memories (potential duplicates) clusters = cluster_by_similarity(memories, threshold=0.9) # Merge similar memories for cluster in clusters: if len(cluster) > 1: merged = await llm.invoke(f''' Consolidate these related memories into one: {cluster} Preserve all important information. ''') await memory.semantic.upsert( namespace="user_insights", key=generate_key(merged), content=merged ) # Delete originals for old in cluster: await memory.semantic.delete(old.id)"""
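Similarity Clustering Helper

cluster_by_similarity is referenced above but left undefined. A minimal
greedy sketch over embedding cosine similarity - it assumes each memory
object carries an .embedding vector:

"""
import numpy as np

def cluster_by_similarity(memories, threshold=0.9):
    '''Greedy clustering: each memory joins the first cluster whose seed
    is within the cosine-similarity threshold, else starts a new one.'''
    clusters = []
    for mem in memories:
        v = np.asarray(mem.embedding)
        for cluster in clusters:
            seed = np.asarray(cluster[0].embedding)
            sim = (v @ seed) / (np.linalg.norm(v) * np.linalg.norm(seed))
            if sim >= threshold:
                cluster.append(mem)
                break
        else:
            clusters.append([mem])
    return clusters
"""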
- name: Memory Decay Pattern
  description: Forgetting old, irrelevant memories
  when: Memory grows large, retrieval slows down
  example: |
MEMORY DECAY:
""" Not all memories should live forever:
- Old preferences may be outdated
- Task details lose relevance
- Conflicting memories confuse retrieval
Implement intelligent decay based on:
- Recency (when was it created/accessed?)
- Frequency (how often is it retrieved?)
- Importance (is it a core fact or detail?) """
Time-Based Decay
""" from datetime import datetime, timedelta
async def decay_old_memories(namespace: str, max_age_days: int): cutoff = datetime.now() - timedelta(days=max_age_days)
old_memories = await memory.episodic.list( namespace=namespace, filter={"last_accessed": {"$lt": cutoff.isoformat()}} ) for mem in old_memories: # Soft delete (mark as archived) await memory.episodic.update( id=mem.id, metadata={"archived": True, "archived_at": datetime.now()} )"""
Utility-Based Decay (MIRIX Approach)
""" def calculate_memory_utility(memory): ''' Composite utility score inspired by cognitive science: - Recency: When was it last accessed? - Frequency: How often is it accessed? - Importance: How critical is this information? ''' now = datetime.now()
# Recency score (exponential decay with 72h half-life) hours_since_access = (now - memory.last_accessed).total_seconds() / 3600 recency_score = 0.5 ** (hours_since_access / 72) # Frequency score frequency_score = min(memory.access_count / 10, 1.0) # Importance (from metadata or heuristic) importance = memory.metadata.get("importance", 0.5) # Weighted combination utility = ( 0.4 * recency_score + 0.3 * frequency_score + 0.3 * importance ) return utilityasync def prune_low_utility_memories(threshold=0.2): all_memories = await memory.list_all() for mem in all_memories: if calculate_memory_utility(mem) < threshold: await memory.archive(mem.id) """
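Tracking Access Statistics

The utility score depends on last_accessed and access_count, which have to
be maintained at retrieval time. A hedged sketch of a search wrapper that
bumps them, reusing the hypothetical store API from the patterns above:

"""
from datetime import datetime

async def search_and_touch(namespace: str, query: str, limit: int = 5):
    results = await memory.episodic.search(
        namespace=namespace, query=query, limit=limit
    )
    # Record each access so recency/frequency scores stay meaningful
    for mem in results:
        await memory.episodic.update(
            id=mem.id,
            metadata={
                "last_accessed": datetime.now().isoformat(),
                "access_count": mem.access_count + 1,
            },
        )
    return results
"""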
anti_patterns:
- name: Store Everything Forever
  description: Never deleting or archiving memories
  why: |
    Memory bloat slows retrieval, increases costs, and introduces noise.
    Outdated memories can conflict with current facts. Infinite storage
    isn't infinite retrieval.
  instead: |
    Implement decay policies. Archive old episodic memories. Consolidate
    duplicate semantic memories. Test retrieval quality as memory grows.
- name: Chunk Without Testing Retrieval
  description: Choosing chunk size without measuring retrieval accuracy
  why: |
    Chunking destroys context. The "right" chunk size varies by document
    type, query pattern, and embedding model. Without testing, you're
    guessing.
  instead: |
    Create retrieval test sets. Measure recall@k for different chunk sizes.
    Optimize for your actual queries.
- name: Single Memory Type for All Data
  description: Storing everything as generic "memories"
  why: |
    Different information needs different treatment. A user profile
    (structured, small) shouldn't be stored like conversation history
    (unstructured, large).
  instead: |
    Use CoALA types: semantic for facts, episodic for events, procedural
    for skills. Each has different storage and retrieval patterns.
- name: Real-Time Memory Formation
  description: Extracting memories during the conversation
  why: |
    Real-time extraction adds latency, complicates tool calls, and produces
    lower-quality memories under time pressure. Users notice the delay.
  instead: |
    Use background/subconscious memory formation. Process conversations
    after they end or go idle. Higher quality, no latency impact.
- name: Ignoring Memory Conflicts
  description: Storing contradictory facts without resolution
  why: |
    Retrieving both "user prefers dark mode" and "user prefers light mode"
    creates confusion: the agent gives inconsistent answers.
  instead: |
    Detect conflicts on storage. Either replace (for preferences) or
    version (for temporal facts). Consolidate periodically. See the sketch
    below.
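    A hedged write-path sketch - memory.semantic.* mirrors the hypothetical
    store API from the patterns above, and is_contradiction stands in for
    an LLM or rule-based check:
    """
    async def upsert_preference(namespace, content, user_id):
        # Look for semantically similar existing facts
        existing = await memory.semantic.search(
            namespace=namespace,
            query=content,
            filter={"user_id": user_id},
            limit=3,
        )
        for old in existing:
            # Replace policy: new preferences overwrite contradictions
            if await is_contradiction(old.content, content):
                await memory.semantic.delete(old.id)
        await memory.semantic.upsert(
            namespace=namespace,
            key=generate_key(content),
            content=content,
            metadata={"user_id": user_id},
        )
    """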
handoffs:
  receives_from:
    - skill: autonomous-agents
      receives: Memory requirements for agents
    - skill: llm-architect
      receives: RAG and retrieval architecture
    - skill: product-strategy
      receives: Personalization requirements
  hands_to:
    - skill: autonomous-agents
      provides: Memory infrastructure for agents
    - skill: multi-agent-orchestration
      provides: Shared memory for multi-agent systems
    - skill: data-engineer
      provides: Vector store requirements at scale
tags:
- memory
- vector-store
- rag
- retrieval
- embedding
- episodic
- semantic
- procedural
- langmem
- memgpt
- pinecone
- qdrant
- chromadb