Skilllibrary agent-memory

Name: agent-memory
Author: merceralex397-collab

install

source · Clone the upstream repo

git clone https://github.com/merceralex397-collab/skilllibrary

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/11-ai-llm-runtime-and-integration/agent-memory" ~/.claude/skills/merceralex397-collab-skilllibrary-agent-memory && rm -rf "$T"

manifest: 11-ai-llm-runtime-and-integration/agent-memory/SKILL.md

source content

Purpose

Design and implement memory systems for AI agents: working memory buffers, episodic recall, and long-term knowledge stores.

When to use this skill

adding conversation memory to a chat agent or assistant
implementing RAG-based recall for long-running agent sessions
designing memory compaction, summarization, or eviction policies
choosing between in-context memory, vector store, or structured memory

Do not use this skill when

setting up vector DB infrastructure; prefer `embeddings-indexing`
doing prompt engineering without memory needs; prefer prompt skills
building inference infrastructure; prefer `inference-serving`

Procedure

Classify memory needs — working memory (current conversation), episodic (past interactions), semantic (facts/knowledge), procedural (learned skills).
Choose memory backend — in-context window for short conversations, vector store for episodic recall, structured DB for facts.
Implement working memory — maintain a sliding window of recent messages. Truncate or summarize when approaching context limit.
Add episodic recall — embed and store conversation turns in a vector DB. Retrieve top-k relevant memories on each turn.
Implement summarization — when conversation exceeds threshold, compress older turns into a summary. Store summary as a memory entry.
Set eviction policy — time-based decay (older = lower score), relevance-based (low retrieval count = evict), or fixed capacity with LRU.
Add memory metadata — timestamp, source, confidence, access count. Use metadata filters during retrieval.
Test retrieval quality — verify the agent recalls relevant past context; measure recall@k on known memory queries.
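
The eviction and metadata steps above can be sketched together. This is a minimal illustration, not a prescribed implementation: `MemoryEntry`, `retention_score`, `evict`, the one-week half-life, and the way recency and access count are combined are all assumptions chosen for the example.

```python
import math
import time
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    """Hypothetical record carrying the metadata fields named in the procedure."""
    text: str
    ts: float                     # creation time, seconds since the epoch
    source: str = "conversation"
    confidence: float = 1.0
    access_count: int = 0         # how often retrieval has returned this entry


def retention_score(entry, now, half_life_s=7 * 24 * 3600):
    """Higher score means more worth keeping (assumed formula, tune per agent)."""
    age = max(0.0, now - entry.ts)
    recency = 0.5 ** (age / half_life_s)      # halves every half_life_s seconds
    usage = math.log1p(entry.access_count)    # diminishing returns on repeat hits
    return entry.confidence * (recency + usage)


def evict(entries, capacity, now=None):
    """Return (kept, evicted), keeping the `capacity` highest-scoring entries."""
    now = time.time() if now is None else now
    ranked = sorted(entries, key=lambda e: retention_score(e, now), reverse=True)
    return ranked[:capacity], ranked[capacity:]
```

In line with the "summarize, do not delete" rule, the `evicted` list would typically be passed to a summarizer rather than dropped.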

Memory architecture

Working Memory (in-context)
  system prompt + recent N messages + retrieved memories
  |
  v -- overflow -->
Episodic Store (vector DB)
  embedded conversation turns, searchable by similarity
  |
  v -- compaction -->
Summary Store (structured)
  compressed summaries of past sessions, key facts extracted
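
As a rough sketch of how the working-memory tier might be assembled under a token budget: the function below fills the budget with the newest messages first, then adds retrieved memories with whatever room remains, and returns the dropped older turns as overflow for the episodic store. The names `estimate_tokens` and `assemble_working_memory`, the 4-characters-per-token estimate, and the 8192-token window are assumptions for illustration; a real tokenizer should replace the estimate.

```python
def estimate_tokens(text):
    # Crude assumption: about 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


def assemble_working_memory(system_prompt, recent, retrieved,
                            context_window=8192, budget_fraction=0.5):
    """Return (messages, overflow) with messages held within the token budget."""
    budget = int(context_window * budget_fraction)
    used = estimate_tokens(system_prompt)

    kept_recent = []
    for msg in reversed(recent):          # newest first, so old turns drop first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept_recent.append(msg)
        used += cost
    kept_recent.reverse()                 # restore chronological order

    kept_retrieved = []
    for msg in retrieved:                 # assumed sorted best match first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept_retrieved.append(msg)
        used += cost

    overflow = recent[:len(recent) - len(kept_recent)]
    system = [{"role": "system", "content": system_prompt}]
    return system + kept_retrieved + kept_recent, overflow
```

Filling recent messages before retrieved ones is one possible priority order; an agent that leans heavily on recall might instead reserve a fixed share of the budget for retrieved memories.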

Key patterns

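import time


def llm_summarize(messages):
    """Placeholder: replace with a call to your LLM that returns one summary string."""
    raise NotImplementedError("plug in an LLM summarization call")


class AgentMemory:
    # vector_store is assumed to expose upsert(text, metadata=...) and
    # query(text, top_k=...) returning a list of message dicts.
    def __init__(self, vector_store, max_working=20):
        self.working = []          # recent messages, oldest first
        self.max_working = max_working
        self.vector_store = vector_store

    def add(self, role, content):
        self.working.append({"role": role, "content": content})
        # Every turn is also written to the episodic store with a timestamp.
        self.vector_store.upsert(content, metadata={"role": role, "ts": time.time()})
        if len(self.working) > self.max_working:
            self._compact()

    def recall(self, query, k=5):
        return self.vector_store.query(query, top_k=k)

    def build_context(self, query):
        # Recalled memories come first, followed by the recent window.
        recalled = self.recall(query)
        return recalled + self.working[-self.max_working:]

    def _compact(self):
        # Summarize the older half of the window; keep the newer half verbatim.
        old = self.working[:len(self.working)//2]
        summary = llm_summarize(old)
        self.working = [{"role": "system", "content": summary}] + self.working[len(old):]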
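
To exercise the pattern above without a real vector DB, a toy stand-in store can help in unit tests. The sketch below is an assumption-laden substitute: `InMemoryStore` is a hypothetical class that ranks by word overlap rather than embeddings, and it only mirrors the `upsert(text, metadata=...)` and `query(text, top_k=...)` calls that the class above makes.

```python
import re


class InMemoryStore:
    """Toy stand-in for a vector DB: ranks by shared words, not embeddings."""

    def __init__(self):
        self.items = []  # list of (text, metadata) tuples

    def upsert(self, text, metadata=None):
        self.items.append((text, metadata or {}))

    def query(self, query, top_k=5):
        q_words = set(re.findall(r"\w+", query.lower()))
        scored = []
        for text, meta in self.items:
            overlap = len(q_words & set(re.findall(r"\w+", text.lower())))
            if overlap:
                scored.append((overlap, text, meta))
        scored.sort(key=lambda s: s[0], reverse=True)
        # Return message-shaped dicts so results concatenate with working memory.
        return [{"role": meta.get("role", "user"), "content": text}
                for _, text, meta in scored[:top_k]]


store = InMemoryStore()
store.upsert("We chose Postgres for the billing service", {"role": "user"})
store.upsert("Lunch is at noon", {"role": "user"})
hits = store.query("which database for billing?")
```

Here `hits` contains only the Postgres message, because the lunch message shares no words with the query. Word overlap will miss paraphrases that an embedding model would catch, so this is suitable for wiring tests only.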

Decision rules

Keep working memory under 50% of context window — leave room for generation and tool results.
Embed conversation turns at the turn level, not token level — retrieval is more coherent.
Always include timestamps in memory metadata — enables recency-weighted retrieval.
Summarize, do not delete — compressed memories are better than lost memories.
Test with adversarial queries — "what did we discuss about X last week?" should retrieve correctly.
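
One way to turn the retrieval test into a number is recall@k over a small set of labeled queries. The sketch below assumes a hypothetical `retrieve(query, k)` callable that returns memory ids best match first; the ids, queries, and `fake_retrieve` are made up for illustration.

```python
def recall_at_k(retrieve, cases, k=5):
    """Mean fraction of each query's relevant memory ids found in the top k.

    retrieve(query, k) -> list of memory ids, best match first (assumed interface).
    cases: list of (query, set_of_relevant_ids) pairs.
    """
    scores = []
    for query, relevant in cases:
        if not relevant:
            continue                      # skip queries with no labeled answer
        top = set(retrieve(query, k)[:k])
        scores.append(len(relevant & top) / len(relevant))
    return sum(scores) / len(scores) if scores else 0.0


# Made-up retriever results, standing in for a real memory store.
fake_index = {
    "what did we discuss about billing last week?": ["m7", "m2", "m9"],
    "which database did we pick?": ["m4", "m1", "m3"],
}


def fake_retrieve(query, k):
    return fake_index.get(query, [])[:k]


cases = [
    ("what did we discuss about billing last week?", {"m2", "m7"}),
    ("which database did we pick?", {"m3", "m8"}),
]
score = recall_at_k(fake_retrieve, cases, k=2)
```

With `k=2` the first query finds both relevant memories and the second finds none, so `score` is 0.5. Tracking this value as chunking, embedding, or eviction settings change gives an early signal of retrieval regressions.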

References

Related skills

`context-management-memory`: managing context window budgets
`embeddings-indexing`: vector store setup and indexing
`model-routing`: choosing models based on memory needs