AI prompt-engineering

Install

Source · Clone the upstream repo:

```bash
git clone https://github.com/wpank/ai
```

Claude Code · Install into ~/.claude/skills/:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/wpank/ai "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/writing/prompt-engineering" ~/.claude/skills/wpank-ai-prompt-engineering && rm -rf "$T"
```

Manifest: skills/writing/prompt-engineering/SKILL.md
Prompt Engineering
Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability.
Installation
OpenClaw / Moltbot / Clawbot:

```bash
npx clawhub@latest install prompt-engineering
```
When to Use
- Designing complex prompts for production LLM applications
- Optimizing prompt performance and consistency
- Implementing structured reasoning patterns (chain-of-thought, tree-of-thought)
- Building few-shot learning systems with dynamic example selection
- Creating reusable prompt templates with variable interpolation
- Debugging prompts that produce inconsistent outputs
- Implementing system prompts for specialized AI assistants
- Using structured outputs (JSON mode) for reliable parsing
Core Techniques
1. Few-Shot Learning
Provide examples that demonstrate the desired behavior:
- Semantic similarity — select examples closest to the input
- Diversity sampling — cover the range of expected inputs
- Balance count vs context — more examples aren't always better; respect the context window
- Dynamic retrieval — pull examples from a knowledge base at runtime
For patterns and implementation, see references/few-shot-learning.md.
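As an illustration of dynamic, similarity-based example selection, here is a minimal sketch that ranks a candidate pool by token overlap with the incoming input and keeps only as many examples as a rough size budget allows. A production system would typically swap the Jaccard score for embedding similarity; the pool, query, and budget below are hypothetical.

```python
def _tokens(text: str) -> set[str]:
    return set(text.lower().split())

def select_examples(query: str, pool: list[dict], k: int = 3,
                    max_chars: int = 2000) -> list[dict]:
    """Pick up to k examples most similar to the query, within a size budget."""
    q = _tokens(query)

    def similarity(example: dict) -> float:
        e = _tokens(example["input"])
        return len(q & e) / (len(q | e) or 1)  # Jaccard overlap; use embeddings in production

    selected, used = [], 0
    for example in sorted(pool, key=similarity, reverse=True):
        size = len(example["input"]) + len(example["output"])
        if len(selected) < k and used + size <= max_chars:
            selected.append(example)
            used += size
    return selected

# Hypothetical labeled pool and query
pool = [
    {"input": "The battery dies within an hour.", "output": "negative"},
    {"input": "Shipping was fast and painless.", "output": "positive"},
    {"input": "It works, nothing special.", "output": "neutral"},
]
shots = select_examples("Battery life is terrible on this phone", pool, k=2)
few_shot_block = "\n\n".join(f"Input: {e['input']}\nLabel: {e['output']}" for e in shots)
```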
2. Chain-of-Thought Prompting
Elicit step-by-step reasoning:
- Zero-shot CoT — append "Let's think step by step"
- Few-shot CoT — provide examples with reasoning traces
- Self-consistency — sample multiple reasoning paths, take the majority answer
- Verification steps — have the model check its own work
For patterns and implementation, see references/chain-of-thought.md.
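To make the self-consistency idea concrete, the sketch below samples several reasoning paths and takes a majority vote over the extracted final answers. The `ask_llm` callable is a placeholder for whatever client you use (sampled at a non-zero temperature); only the voting logic is the point here.

```python
import re
from collections import Counter
from typing import Callable

COT_PROMPT = "Let's think step by step.\n\n{question}\n\nEnd with: 'Final answer: <answer>'"

def extract_answer(completion: str) -> str | None:
    """Pull the text after 'Final answer:' from a reasoning trace."""
    match = re.search(r"Final answer:\s*(.+)", completion)
    return match.group(1).strip() if match else None

def self_consistent_answer(question: str,
                           ask_llm: Callable[[str], str],
                           n_samples: int = 5) -> str | None:
    """Sample multiple chains of thought and return the majority answer."""
    answers = []
    for _ in range(n_samples):
        completion = ask_llm(COT_PROMPT.format(question=question))
        answer = extract_answer(completion)
        if answer is not None:
            answers.append(answer)
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]
```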
3. Structured Outputs
Enforce reliable, parseable responses:
- JSON mode — request JSON and validate with Pydantic
- Schema enforcement — define the exact shape of expected output
- Type-safe handling — parse and validate before using
- Error recovery — fall back gracefully when output is malformed
4. System Prompt Design
Set model behavior, constraints, and expertise:
- Define the role and domain expertise
- Establish output format and structure
- Set constraints and safety guidelines
- Provide context and background information
For patterns and templates, see references/system-prompts.md.
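One lightweight way to apply these four elements is to assemble the system prompt from named parts, so role, output format, constraints, and context stay separately editable. A minimal sketch; the section contents below are placeholders.

```python
def build_system_prompt(role: str, output_format: str,
                        constraints: list[str], background: str = "") -> str:
    """Compose a system prompt from role, format, constraints, and optional context."""
    sections = [
        f"Role: {role}",
        f"Output format: {output_format}",
        "Constraints:\n" + "\n".join(f"- {c}" for c in constraints),
    ]
    if background:
        sections.append(f"Background:\n{background}")
    return "\n\n".join(sections)

system_prompt = build_system_prompt(
    role="You are a senior data analyst specializing in e-commerce metrics.",
    output_format="A short summary followed by a markdown table of figures.",
    constraints=["Cite the source column for every number", "Flag any assumption explicitly"],
)
```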
5. Template Systems
Build reusable, composable prompts:
- Variable interpolation and formatting
- Conditional sections based on input
- Multi-turn conversation templates
- Modular prompt components
For a template library, see references/prompt-templates.md.
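As a sketch of variable interpolation plus a conditional section, the template below includes a few-shot block only when examples are supplied. Names and wording are illustrative, not part of the template library.

```python
SUMMARY_TEMPLATE = """{examples_block}Summarize the following text in {num_points} bullet points.

Text:
{text}

Summary:"""

def render_summary_prompt(text: str, num_points: int = 3,
                          examples: list[tuple[str, str]] | None = None) -> str:
    """Fill the template, adding a few-shot block only if examples are provided."""
    examples_block = ""
    if examples:
        shots = "\n\n".join(f"Text:\n{src}\nSummary:\n{summary}" for src, summary in examples)
        examples_block = f"Examples:\n\n{shots}\n\n"
    return SUMMARY_TEMPLATE.format(
        examples_block=examples_block, num_points=num_points, text=text
    )
```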
Key Patterns
Pattern 1: Structured Output with Validation
```python
from pydantic import BaseModel, Field
from typing import Literal

class SentimentAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0, le=1)
    key_phrases: list[str]
    reasoning: str

# Request JSON matching the schema, then validate:
# result = SentimentAnalysis(**json.loads(response))
```
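One way to wire this up, sketched against the Anthropic Python SDK (any chat API would work the same way; Pydantic v2's `model_json_schema` is assumed): ask for JSON only, parse, and validate against the schema defined above.

```python
import json
import anthropic
from pydantic import ValidationError

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def analyze_sentiment(text: str) -> SentimentAnalysis:
    """Request JSON only, then validate the reply against the schema above."""
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=512,
        system=(
            "Reply with a single JSON object matching this schema, no prose:\n"
            + json.dumps(SentimentAnalysis.model_json_schema())
        ),
        messages=[{"role": "user", "content": f"Analyze the sentiment of:\n\n{text}"}],
    )
    raw = response.content[0].text
    try:
        return SentimentAnalysis(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as exc:
        # Surface the failure instead of silently trusting malformed output
        raise ValueError(f"Unparseable model output: {raw[:200]}") from exc
```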
Pattern 2: Chain-of-Thought with Self-Verification
```
Solve this problem step by step.

Problem: {problem}

Instructions:
1. Break down the problem into clear steps
2. Work through each step showing your reasoning
3. State your final answer
4. Verify your answer by checking it against the original problem
```
Pattern 3: Progressive Disclosure
Start simple, add complexity only when needed:
```python
PROMPT_LEVELS = {
    # Level 1: Direct instruction
    "simple": "Summarize this article: {text}",

    # Level 2: Add constraints
    "constrained": """Summarize in 3 bullet points:
- Key findings
- Main conclusions
- Practical implications

Article: {text}""",

    # Level 3: Add reasoning
    "reasoning": """Read this article carefully.
1. Identify the main topic and thesis
2. Extract the key supporting points
3. Summarize in 3 bullet points

Article: {text}""",

    # Level 4: Add examples (few-shot)
    "few_shot": """[examples...]

Now summarize: {text}""",
}
```
Pattern 4: Error Recovery and Fallback
```python
import json
from pydantic import BaseModel, ValidationError

class ResponseSchema(BaseModel):
    answer: str
    confidence: float
    sources: list[str]

async def answer_with_fallback(context, question, llm):
    """Answer with structured output, fall back to simple on failure."""
    # structured_prompt / simple_prompt are built from context and question elsewhere
    try:
        response = await llm.ainvoke(structured_prompt)
        return ResponseSchema(**json.loads(response.content))
    except (json.JSONDecodeError, ValidationError):
        simple_response = await llm.ainvoke(simple_prompt)
        return ResponseSchema(
            answer=simple_response.content,
            confidence=0.5,
            sources=["fallback extraction"],
        )
```
Pattern 5: Role-Based System Prompts
```python
SYSTEM_PROMPTS = {
    "analyst": """You are a senior data analyst.
- Write efficient, documented queries
- Explain methodology
- Translate findings into business impact""",

    "code_reviewer": """You are a senior software engineer.
Review for: correctness, security, performance, maintainability.
Output: summary, critical issues, suggestions, positive feedback.""",
}
```
Pattern 6: RAG Integration
```
You answer questions based on provided context.

Context (from knowledge base):
{context}

Rules:
1. Answer ONLY from the provided context
2. If the answer isn't in the context, say so
3. Cite passages using [1], [2] notation
4. Ask for clarification if the question is ambiguous

Question: {question}
```
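The `{context}` slot expects retrieved passages already numbered to match the [1], [2] citation rule. A small helper for that formatting step (the passage text is made up; retrieval itself is out of scope here):

```python
def format_context(passages: list[str]) -> str:
    """Number retrieved passages so the model can cite them as [1], [2], ..."""
    return "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))

context = format_context([
    "The standard warranty covers manufacturing defects for 24 months.",
    "Water damage and accidental drops are excluded from coverage.",
])
# `context` now slots into the {context} placeholder of the prompt above
```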
Performance Optimization
Token Efficiency
```python
# Before: 150+ tokens
"I would like you to please take the following text and provide me with a comprehensive summary of the main points..."

# After: 30 tokens
"Summarize the key points concisely:\n\n{text}\n\nSummary:"
```
Prompt Caching
```python
# Cache repeated system prompts for cost savings
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,  # assumed to be defined elsewhere
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": query}],
)
```
Best Practices
- Be specific — vague prompts produce inconsistent results
- Show, don't tell — examples are more effective than descriptions
- Use structured outputs — enforce schemas with Pydantic for reliability
- Test extensively — evaluate on diverse, representative inputs
- Iterate rapidly — small changes can have large impacts
- Monitor in production — track accuracy, latency, token usage, success rate
- Version control prompts — treat prompts as code with proper versioning
- Document intent — explain why prompts are structured as they are
Success Metrics
| Metric | What to Track |
|---|---|
| Accuracy | Correctness of outputs against ground truth |
| Consistency | Reproducibility across similar inputs |
| Latency | Response time at P50, P95, P99 |
| Token Usage | Average tokens per request (cost control) |
| Success Rate | Percentage of valid, parseable outputs |
| User Satisfaction | Ratings, feedback, task completion |
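A bare-bones way to track a couple of these metrics offline is to run a prompt variant over a labeled evaluation set and record accuracy and parseable-output rate. This sketch assumes a `run_prompt` callable and an output JSON with a "label" field (both hypothetical), and omits latency and token accounting.

```python
import json
from typing import Callable

def evaluate_prompt(run_prompt: Callable[[str], str],
                    eval_set: list[dict]) -> dict:
    """Score one prompt variant on accuracy and parseable-output rate."""
    correct = parsed = 0
    for case in eval_set:
        output = run_prompt(case["input"])
        try:
            result = json.loads(output)
            parsed += 1
            if result.get("label") == case["expected"]:
                correct += 1
        except json.JSONDecodeError:
            continue  # malformed output counts against the success rate
    total = len(eval_set) or 1
    return {"accuracy": correct / total, "success_rate": parsed / total, "n": total}
```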
For optimization workflows, see references/prompt-optimization.md.
Resources
- references/chain-of-thought.md — CoT patterns and implementation
- references/few-shot-learning.md — example selection and few-shot strategies
- references/prompt-optimization.md — iterative refinement workflows
- references/prompt-templates.md — reusable template patterns
- references/system-prompts.md — system prompt design patterns
- assets/few-shot-examples.json — example few-shot datasets
- assets/prompt-template-library.md — ready-to-use prompt templates
- scripts/optimize-prompt.py — prompt optimization utility
Common Pitfalls
| Pitfall | Problem | Fix |
|---|---|---|
| Over-engineering | Complex prompt when simple works | Start simple, add complexity only when needed |
| Example pollution | Examples don't match target task | Curate examples that reflect actual inputs |
| Context overflow | Too many examples exceed token limit | Monitor token usage; prioritize quality over quantity |
| Ambiguous instructions | Multiple valid interpretations | Be specific; test with different interpreters |
| No error handling | Assuming outputs are always well-formed | Add validation, fallbacks, and retry logic |
| Hardcoded values | Prompts can't be reused | Parameterize with template variables |
| No versioning | Can't track what changed or roll back | Version control prompts like code |
Integration Patterns
With Validation
```
Complete the following task: {task}

After generating your response, verify it meets ALL these criteria:
- Directly addresses the original request
- Contains no factual errors
- Is appropriately detailed
- Uses proper formatting

If verification fails, revise before responding.
```
With Confidence Scoring
Include confidence in structured outputs so downstream systems can handle uncertainty; a routing sketch follows these thresholds:
- High confidence (>0.8): use the answer directly
- Medium confidence (0.5-0.8): use with caveats
- Low confidence (<0.5): flag for human review
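A minimal routing sketch over those thresholds; the `Answer` type mirrors the fields used in Pattern 4 and the review queue is a stand-in for whatever escalation path you use.

```python
from dataclasses import dataclass, field

@dataclass
class Answer:
    answer: str
    confidence: float
    sources: list[str] = field(default_factory=list)

REVIEW_QUEUE: list[Answer] = []  # placeholder for a real escalation mechanism

def route_by_confidence(result: Answer) -> str:
    """Apply the thresholds above: use directly, caveat, or escalate."""
    if result.confidence > 0.8:
        return result.answer                      # high confidence: use directly
    if result.confidence >= 0.5:
        return result.answer + "\n\n(Note: medium confidence, verify key facts.)"
    REVIEW_QUEUE.append(result)                   # low confidence: flag for human review
    return "This answer needs human review before it can be used."
```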
NEVER Do
- NEVER start with complex prompts — try the simplest prompt first; add complexity only when simple fails
- NEVER assume outputs will be well-formed — always validate and handle malformed responses
- NEVER hardcode prompts that should be parameterized — use templates with variables for reuse
- NEVER skip testing on edge cases — boundary inputs reveal prompt fragility
- NEVER use examples that don't match the target task — mismatched examples pollute the model's understanding
- NEVER exceed context limits with examples — monitor token usage; too many examples degrade performance
- NEVER deploy prompts without version control — prompt changes can break production; track every change
- NEVER ignore latency — a perfect prompt that takes 30 seconds is worse than a good prompt that takes 3 seconds