dspy
install
source · Clone the upstream repo
git clone https://github.com/TerminalSkills/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/TerminalSkills/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/dspy" ~/.claude/skills/terminalskills-skills-dspy && rm -rf "$T"
manifest:
skills/dspy/SKILL.md
safety · automated scan (low risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
- pip install
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content
DSPy — Programming (Not Prompting) LLMs
You are an expert in DSPy, the Stanford framework that replaces prompt engineering with programming. You help developers define LLM tasks as typed signatures, compose them into modules, and automatically optimize prompts/few-shot examples using teleprompters — so instead of manually crafting prompts, you write Python code and DSPy finds the best prompts for your task.
Core Capabilities
Signatures and Modules
```python
import dspy

lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Define task as a signature (not a prompt)
class SentimentAnalysis(dspy.Signature):
    """Classify the sentiment of a review."""
    review: str = dspy.InputField()
    sentiment: str = dspy.OutputField(desc="positive, negative, or neutral")
    confidence: float = dspy.OutputField(desc="0.0 to 1.0")

# Use it
classify = dspy.Predict(SentimentAnalysis)
result = classify(review="Great product, fast shipping!")
print(result.sentiment)   # "positive"
print(result.confidence)  # 0.95

# Chain of Thought (automatic reasoning)
classify_cot = dspy.ChainOfThought(SentimentAnalysis)
result = classify_cot(review="It works but the manual is confusing")
print(result.reasoning)   # Shows step-by-step reasoning
print(result.sentiment)   # "neutral"
```
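Signatures don't have to be classes. DSPy also accepts shorthand string signatures of the form "input fields -> output fields", which the RAG examples below rely on. A minimal sketch, assuming the LM configured above is still active:

```python
import dspy

# Shorthand string signature: "input fields -> output fields"
qa = dspy.Predict("question -> answer")
print(qa(question="What is DSPy?").answer)

# The same shorthand works with other module types
summarize = dspy.ChainOfThought("document -> summary")
print(summarize(document="DSPy compiles declarative LLM programs.").summary)
```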
Composable Modules
```python
class RAGModule(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

rag = RAGModule()
answer = rag(question="What is DSPy?")

def deduplicate(passages):
    # Drop repeated passages while preserving order
    return list(dict.fromkeys(passages))

# Multi-hop reasoning
class MultiHop(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_query = dspy.ChainOfThought("context, question -> search_query")
        self.retrieve = dspy.Retrieve(k=3)
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = []
        for _ in range(2):  # 2 hops
            query = self.generate_query(context=context, question=question).search_query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)
        return self.generate_answer(context=context, question=question)
```
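Note that dspy.Retrieve only works once a retrieval model is configured alongside the LM. A sketch using the public ColBERTv2 Wikipedia endpoint from the DSPy docs (the URL is an assumption and may be offline; any retrieval model can be swapped in):

```python
import dspy

# Configure both a language model and a retrieval model
colbert = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"), rm=colbert)

rag = RAGModule()
print(rag(question="What is DSPy?").answer)
```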
Automatic Optimization
```python
from dspy.teleprompt import BootstrapFewShot

# Training data
trainset = [
    dspy.Example(question="What is Python?", answer="A programming language").with_inputs("question"),
    dspy.Example(question="Who created Linux?", answer="Linus Torvalds").with_inputs("question"),
]

# Metric
def accuracy(example, prediction, trace=None):
    return example.answer.lower() in prediction.answer.lower()

# Optimize — finds best few-shot examples and instructions
teleprompter = BootstrapFewShot(metric=accuracy, max_bootstrapped_demos=4)
optimized_rag = teleprompter.compile(RAGModule(), trainset=trainset)

# optimized_rag now has automatically selected few-shot examples
# that maximize accuracy — no manual prompt engineering
```
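To check that compilation actually helped, score the program before and after on held-out data with DSPy's evaluation harness. A sketch, where the devset is a hypothetical placeholder:

```python
from dspy.evaluate import Evaluate

# Hypothetical held-out examples, same shape as the trainset
devset = [
    dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question"),
    dspy.Example(question="What is H2O?", answer="Water").with_inputs("question"),
]

evaluator = Evaluate(devset=devset, metric=accuracy, display_progress=True)
print("before:", evaluator(RAGModule()))
print("after:", evaluator(optimized_rag))
```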
Installation
```bash
pip install dspy
```
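A quick smoke test after installing (assumes OPENAI_API_KEY is set in the environment; any LiteLLM-style "provider/model" string should work):

```python
import dspy

lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Calling the LM directly returns a list of completions
print(lm("Reply with the single word: ok"))
```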
Best Practices
- Signatures over prompts — Define typed inputs/outputs; DSPy generates and optimizes prompts automatically
- ChainOfThought — Use for complex tasks; adds a reasoning step that significantly improves accuracy
- Modules — Compose LLM calls like neural network layers; chain retrieval + reasoning + generation
- Teleprompters — Use BootstrapFewShot to automatically find optimal few-shot examples from training data
- Typed outputs — OutputField descriptions constrain generation; more reliable than free-form prompts
- Evaluation-driven — Define metrics first, then optimize; DSPy finds prompts that maximize your metric
- Model-agnostic — Same code works with GPT-4, Claude, Llama, Gemini; optimization adapts per model
- Assertions — Use dspy.Assert and dspy.Suggest for runtime output validation and self-correction; see the sketch below
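The assertions API predates DSPy 2.6 and may not be available in newer releases, so the following is a minimal sketch against the 2.x API only. It reuses the SentimentAnalysis signature defined earlier:

```python
import dspy

class ConstrainedClassify(dspy.Module):
    def __init__(self):
        super().__init__()
        # SentimentAnalysis is the signature from "Signatures and Modules"
        self.classify = dspy.ChainOfThought(SentimentAnalysis)

    def forward(self, review):
        result = self.classify(review=review)
        # Hard constraint: on failure, DSPy backtracks and retries with feedback
        dspy.Assert(
            result.sentiment in ("positive", "negative", "neutral"),
            "sentiment must be exactly one of: positive, negative, neutral",
        )
        # Soft constraint: feedback is injected, but execution continues
        dspy.Suggest(
            0.0 <= result.confidence <= 1.0,
            "confidence must be a float between 0.0 and 1.0",
        )
        return result

constrained = ConstrainedClassify().activate_assertions()
```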