Skills prompt-assemble
Token-safe prompt assembly with memory orchestration. Use for any agent that needs to construct LLM prompts with memory retrieval. Guarantees no API failure due to token overflow. Implements two-phase context construction, memory safety valve, and hard limits on memory injection.
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/alexunitario-sketch/prompt-assemble" ~/.claude/skills/clawdbot-skills-prompt-assemble && rm -rf "$T"
manifest:
skills/alexunitario-sketch/prompt-assemble/SKILL.md
Prompt Assemble
Overview
A standardized, token-safe prompt assembly framework that guarantees API stability. Implements Two-Phase Context Construction and Memory Safety Valve to prevent token overflow while maximizing relevant context.
Design Goals:
- ✅ Never fail due to memory-related token overflow
- ✅ Memory is always discardable enhancement, never rigid dependency
- ✅ Token budget decisions centralized at prompt assemble layer
When to Use
Use this skill when:
- Building or modifying any agent that constructs prompts
- Implementing memory retrieval systems
- Adding new prompt-related logic to existing agents
- Any scenario where token budget safety is required
Core Workflow
```
User Input
  ↓
Need-Memory Decision
  ↓
Minimal Context Build
  ↓
Memory Retrieval (Optional)
  ↓
Memory Summarization
  ↓
Token Estimation
  ↓
Safety Valve Decision
  ↓
Final Prompt → LLM Call
```
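A minimal sketch of how the phases chain together. The helpers (`build_minimal_context`, `need_memory`, `summarize`, `estimate_tokens`, `assemble`) are the ones sketched or referenced in the phase details below; this is an illustration of the control flow, not the shipped implementation in `scripts/prompt_assemble.py`:

```python
def build_prompt(user_input, memory_search, get_recent_dialog):
    # Phase 1: minimal context (system prompt + recent dialog + input)
    base_context = build_minimal_context(user_input, get_recent_dialog)

    # Phases 2-3: retrieve and summarize memory only when triggered
    summarized_memories = []
    if need_memory(user_input):
        for mem in memory_search(query=user_input, top_k=MEMORY_TOP_K):
            summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX))

    # Phases 4-5: estimate tokens; memory is the only expendable layer
    if estimate_tokens(base_context + summarized_memories) > SAFETY_MARGIN:
        base_context.append("[System Notice] Relevant memory skipped due to token budget.")
        return assemble(base_context)

    # Phase 6: final assembly
    return assemble(base_context + summarized_memories)
```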
Phase Details
Phase 0: Base Configuration
```python
# Model Context Windows (2026-02-04)
# - MiniMax-M2.1:      204,000 tokens (default)
# - Claude 3.5 Sonnet: 200,000 tokens
# - GPT-4o:            128,000 tokens
MAX_TOKENS = 204000                 # Set to your model's context limit
SAFETY_MARGIN = 0.75 * MAX_TOKENS   # Conservative: 75% threshold = 153,000 tokens
MEMORY_TOP_K = 3                    # Max 3 memories
MEMORY_SUMMARY_MAX = 3              # Max 3 lines per memory
```
Design Philosophy:
- Leave 25% buffer for safety (model overhead, estimation errors, spikes)
- Better to underutilize capacity than to overflow
Phase 1: Minimal Context
- System prompt
- Recent N messages (N=3, trimmed)
- Current user input
- No memory by default
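A sketch of the minimal-context build, assuming chat-API-style message dicts and that `get_recent_dialog` returns the conversation history, newest last (both are assumptions, not part of the skill's interface):

```python
SYSTEM_PROMPT = "You are a helpful assistant."  # your agent's system prompt
RECENT_N = 3                                    # keep only the last 3 messages

def build_minimal_context(user_input, get_recent_dialog):
    # System prompt always comes first and is never downgraded
    context = [{"role": "system", "content": SYSTEM_PROMPT}]
    # Trim history to the most recent N messages
    context.extend(get_recent_dialog()[-RECENT_N:])
    # Current user input is always included verbatim
    context.append({"role": "user", "content": user_input})
    return context
```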
Phase 2: Memory Need Decision
```python
def need_memory(user_input):
    triggers = [
        "previously", "earlier we discussed", "do you remember",
        "as I mentioned before", "continuing from", "before we",
        "last time", "previously mentioned",
    ]
    text = user_input.lower()
    return any(trigger in text for trigger in triggers)
```
Phase 3: Memory Retrieval (Optional)
```python
summarized_memories = []
memories = memory_search(query=user_input, top_k=MEMORY_TOP_K)
for mem in memories:
    summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX))
```
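The skill leaves `summarize` abstract. A minimal sketch, assuming plain-text memories where keeping the first few lines is an acceptable summary (an LLM-based summarizer can be swapped in):

```python
def summarize(memory_text, max_lines=MEMORY_SUMMARY_MAX):
    # Keep only the first max_lines non-empty lines of the memory
    lines = [line for line in memory_text.splitlines() if line.strip()]
    return "\n".join(lines[:max_lines])
```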
Phase 4: Token Estimation
Calculate estimated tokens for base_context + summarized_memories.
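A minimal estimator sketch using the common rough heuristic of ~4 characters per token for English text; a real tokenizer (e.g. tiktoken) gives tighter numbers, and the 25% buffer in `SAFETY_MARGIN` absorbs the estimation error:

```python
def estimate_tokens(context_parts):
    # Rough heuristic: ~1 token per 4 characters of English text.
    # Replace with a real tokenizer for tighter estimates; the 25%
    # SAFETY_MARGIN buffer is there to absorb estimation error.
    total_chars = sum(len(str(part)) for part in context_parts)
    return total_chars // 4
```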
Phase 5: Safety Valve (Critical)
```python
if estimated_tokens > SAFETY_MARGIN:
    base_context.append("[System Notice] Relevant memory skipped due to token budget.")
    return assemble(base_context)
```
Hard Rules:
- ❌ Never downgrade system prompt
- ❌ Never truncate user input
- ❌ No "lucky splicing"
- ✅ Only memory layer is expendable
Phase 6: Final Assembly
```python
final_prompt = assemble(base_context + summarized_memories)
return final_prompt
```
Memory Data Standards
Allowed in Long-Term Memory
- ✅ User preferences / identity / long-term goals
- ✅ Confirmed important conclusions
- ✅ System-level settings and rules
Forbidden in Long-Term Memory
- ❌ Raw conversation logs
- ❌ Reasoning traces
- ❌ Temporary discussions
- ❌ Information recoverable from chat history
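For illustration, hypothetical records that follow and violate these standards (the field names are assumptions, not a schema defined by this skill):

```python
# ✅ Allowed: a distilled, durable fact about the user
good_memory = {
    "type": "user_preference",
    "content": "Prefers concise answers with Python code examples.",
}

# ❌ Forbidden: a raw conversation log, recoverable from chat history
bad_memory = {
    "type": "raw_log",
    "content": "User: hi\nAssistant: Hello! How can I help?\n...",
}
```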
Quick Start
Copy scripts/prompt_assemble.py to your agent and use:
```python
from prompt_assemble import build_prompt

# In your agent's prompt construction:
final_prompt = build_prompt(user_input, memory_search_fn, get_recent_dialog_fn)
```
Resources
scripts/
- prompt_assemble.py: Complete implementation with all phases (PromptAssembler class)
references/
- memory_standards.md: Detailed memory content guidelines
- token_estimation.md: Token counting strategies