Skills prompt-assemble
Token-safe prompt assembly with memory orchestration. Use for any agent that needs to construct LLM prompts with memory retrieval. Guarantees no API failure due to token overflow. Implements two-phase context construction, memory safety valve, and hard limits on memory injection.
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/alexunitario-sketch/prompt-assemble" ~/.claude/skills/clawdbot-skills-prompt-assemble && rm -rf "$T"
manifest:
skills/alexunitario-sketch/prompt-assemble/SKILL.md
Prompt Assemble
Overview
A standardized, token-safe prompt assembly framework that guarantees API stability. Implements Two-Phase Context Construction and Memory Safety Valve to prevent token overflow while maximizing relevant context.
Design Goals:
- ✅ Never fail due to memory-related token overflow
- ✅ Memory is always discardable enhancement, never rigid dependency
- ✅ Token budget decisions centralized at prompt assemble layer
When to Use
Use this skill when:
- Building or modifying any agent that constructs prompts
- Implementing memory retrieval systems
- Adding new prompt-related logic to existing agents
- Any scenario where token budget safety is required
Core Workflow
```
User Input
  ↓
Need-Memory Decision
  ↓
Minimal Context Build
  ↓
Memory Retrieval (Optional)
  ↓
Memory Summarization
  ↓
Token Estimation
  ↓
Safety Valve Decision
  ↓
Final Prompt → LLM Call
```
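A minimal sketch of how the phases chain together. The helpers (`build_minimal_context`, `need_memory`, `summarize`, `estimate_tokens`, `assemble`) are the ones sketched or referenced in the phase details below; this is an illustration of the control flow, not the shipped implementation in `scripts/prompt_assemble.py`:

```python
def build_prompt(user_input, memory_search, get_recent_dialog):
    # Phase 1: minimal context (system prompt + recent dialog + input)
    base_context = build_minimal_context(user_input, get_recent_dialog)

    # Phases 2-3: retrieve and summarize memory only when triggered
    summarized_memories = []
    if need_memory(user_input):
        for mem in memory_search(query=user_input, top_k=MEMORY_TOP_K):
            summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX))

    # Phases 4-5: estimate tokens; memory is the only expendable layer
    if estimate_tokens(base_context + summarized_memories) > SAFETY_MARGIN:
        base_context.append("[System Notice] Relevant memory skipped due to token budget.")
        return assemble(base_context)

    # Phase 6: final assembly
    return assemble(base_context + summarized_memories)
```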
Phase Details
Phase 0: Base Configuration
```python
# Model Context Windows (2026-02-04)
# - MiniMax-M2.1:      204,000 tokens (default)
# - Claude 3.5 Sonnet: 200,000 tokens
# - GPT-4o:            128,000 tokens
MAX_TOKENS = 204000                 # Set to your model's context limit
SAFETY_MARGIN = 0.75 * MAX_TOKENS   # Conservative: 75% threshold = 153,000 tokens
MEMORY_TOP_K = 3                    # Max 3 memories
MEMORY_SUMMARY_MAX = 3              # Max 3 lines per memory
```
Design Philosophy:
- Leave 25% buffer for safety (model overhead, estimation errors, spikes)
- Better to underutilize capacity than to overflow
Phase 1: Minimal Context
- System prompt
- Recent N messages (N=3, trimmed)
- Current user input
- No memory by default
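A sketch of the minimal-context build, assuming chat-API-style message dicts and that `get_recent_dialog` returns the conversation history, newest last (both are assumptions, not part of the skill's interface):

```python
SYSTEM_PROMPT = "You are a helpful assistant."  # your agent's system prompt
RECENT_N = 3                                    # keep only the last 3 messages

def build_minimal_context(user_input, get_recent_dialog):
    # System prompt always comes first and is never downgraded
    context = [{"role": "system", "content": SYSTEM_PROMPT}]
    # Trim history to the most recent N messages
    context.extend(get_recent_dialog()[-RECENT_N:])
    # Current user input is always included verbatim
    context.append({"role": "user", "content": user_input})
    return context
```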
Phase 2: Memory Need Decision
```python
def need_memory(user_input):
    triggers = [
        "previously", "earlier we discussed", "do you remember",
        "as I mentioned before", "continuing from", "before we",
        "last time", "previously mentioned",
    ]
    text = user_input.lower()
    return any(trigger in text for trigger in triggers)
```
Phase 3: Memory Retrieval (Optional)
```python
summarized_memories = []
memories = memory_search(query=user_input, top_k=MEMORY_TOP_K)
for mem in memories:
    summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX))
```
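The skill leaves `summarize` abstract. A minimal sketch, assuming plain-text memories where keeping the first few lines is an acceptable summary (an LLM-based summarizer can be swapped in):

```python
def summarize(memory_text, max_lines=MEMORY_SUMMARY_MAX):
    # Keep only the first max_lines non-empty lines of the memory
    lines = [line for line in memory_text.splitlines() if line.strip()]
    return "\n".join(lines[:max_lines])
```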
Phase 4: Token Estimation
Calculate estimated tokens for base_context + summarized_memories.
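A minimal estimator sketch using the common rough heuristic of ~4 characters per token for English text; a real tokenizer (e.g. tiktoken) gives tighter numbers, and the 25% buffer in `SAFETY_MARGIN` absorbs the estimation error:

```python
def estimate_tokens(context_parts):
    # Rough heuristic: ~1 token per 4 characters of English text.
    # Replace with a real tokenizer for tighter estimates; the 25%
    # SAFETY_MARGIN buffer is there to absorb estimation error.
    total_chars = sum(len(str(part)) for part in context_parts)
    return total_chars // 4
```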
Phase 5: Safety Valve (Critical)
```python
if estimated_tokens > SAFETY_MARGIN:
    base_context.append("[System Notice] Relevant memory skipped due to token budget.")
    return assemble(base_context)
```
Hard Rules:
- ❌ Never downgrade system prompt
- ❌ Never truncate user input
- ❌ No "lucky splicing"
- ✅ Only memory layer is expendable
Phase 6: Final Assembly
```python
final_prompt = assemble(base_context + summarized_memories)
return final_prompt
```
Memory Data Standards
Allowed in Long-Term Memory
- ✅ User preferences / identity / long-term goals
- ✅ Confirmed important conclusions
- ✅ System-level settings and rules
Forbidden in Long-Term Memory
- ❌ Raw conversation logs
- ❌ Reasoning traces
- ❌ Temporary discussions
- ❌ Information recoverable from chat history
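For illustration, hypothetical records that follow and violate these standards (the field names are assumptions, not a schema defined by this skill):

```python
# ✅ Allowed: a distilled, durable fact about the user
good_memory = {
    "type": "user_preference",
    "content": "Prefers concise answers with Python code examples.",
}

# ❌ Forbidden: a raw conversation log, recoverable from chat history
bad_memory = {
    "type": "raw_log",
    "content": "User: hi\nAssistant: Hello! How can I help?\n...",
}
```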
Quick Start
Copy scripts/prompt_assemble.py to your agent and use:
```python
from prompt_assemble import build_prompt

# In your agent's prompt construction:
final_prompt = build_prompt(user_input, memory_search_fn, get_recent_dialog_fn)
```
Resources
scripts/
- prompt_assemble.py: Complete implementation with all phases (PromptAssembler class)
references/
- memory_standards.md: Detailed memory content guidelines
- token_estimation.md: Token counting strategies