Claude-skill-registry dspy-simba-optimizer
This skill should be used when the user asks to "optimize with SIMBA", "use Bayesian optimization", "optimize agents with custom feedback", mentions "SIMBA optimizer", "mini-batch optimization", "statistical optimization", "lightweight optimizer", or needs an alternative to MIPROv2/GEPA for programs with rich feedback signals.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/dspy-simba-optimizer" ~/.claude/skills/majiayu000-claude-skill-registry-dspy-simba-optimizer && rm -rf "$T"
manifest:
skills/data/dspy-simba-optimizer/SKILL.md
DSPy SIMBA Optimizer
Goal
Optimize DSPy programs using stochastic introspective mini-batch ascent, with statistical analysis of feedback signals gathered across mini-batches.
When to Use
- Need lighter-weight alternative to GEPA
- Have custom feedback metrics (not just accuracy)
- Agentic tasks with rich failure signals
- Budget-conscious optimization (fewer eval calls)
- Programs where few-shot examples aren't critical
Related Skills
- Alternative optimizers: dspy-miprov2-optimizer, dspy-gepa-reflective
- Agent optimization: dspy-react-agent-builder
- Evaluation: dspy-evaluation-suite
Inputs
| Input | Type | Description |
|---|---|---|
| `program` | `dspy.Module` | Program to optimize |
| `trainset` | `list[dspy.Example]` | Training examples |
| `metric` | `Callable` | Returns a float or `dspy.Prediction(score=..., feedback=...)` |
| `max_steps` | `int` | Number of optimization steps |
| `bsize` | `int` | Mini-batch size |
Outputs
| Output | Type | Description |
|---|---|---|
| `compiled` | `dspy.Module` | SIMBA-optimized program |
Workflow
Phase 1: Understand SIMBA
SIMBA (Stochastic Introspective Mini-Batch Ascent):
- Iterative prompt optimization with mini-batch sampling
- Identifies challenging examples with high output variability
- Generates self-reflective rules or adds successful demonstrations
- Lighter than GEPA (no reflection LM)
- More flexible than Bootstrap (uses feedback)
Comparison (see the constructor sketch after this list):
- MIPROv2: Best accuracy, lots of data
- GEPA: Agentic systems, expensive
- SIMBA: Custom feedback, budget-friendly
- Bootstrap: Simplest, demo-based
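To make the comparison concrete, the sketch below shows roughly how each optimizer is constructed in DSPy. Constructor arguments vary across dspy versions, so treat the argument names and values as illustrative rather than authoritative; the toy metric is a placeholder.
```python
import dspy

def toy_metric(example, pred, trace=None):
    """Placeholder metric; replace with a task-specific one."""
    return float(example.answer.lower() in pred.answer.lower())

# Constructor arguments differ across dspy versions; treat these as sketches.
mipro = dspy.MIPROv2(metric=toy_metric, auto="light")                         # best accuracy, needs data
gepa = dspy.GEPA(metric=toy_metric, reflection_lm=dspy.LM("openai/gpt-4o"))   # agentic systems, expensive
simba = dspy.SIMBA(metric=toy_metric, max_steps=8, bsize=32)                  # custom feedback, budget-friendly
bootstrap = dspy.BootstrapFewShot(metric=toy_metric)                          # simplest, demo-based
```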
Phase 2: Basic SIMBA Optimization
```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Program to optimize
class QAPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

# Metric (can return just a score, or a score with feedback)
def qa_metric(example, pred, trace=None):
    correct = example.answer.lower() in pred.answer.lower()
    return 1.0 if correct else 0.0

# SIMBA optimizer
optimizer = dspy.SIMBA(
    metric=qa_metric,
    max_steps=10,  # Optimization iterations
    bsize=5,       # Mini-batch size
)

# trainset: a list of dspy.Example objects prepared elsewhere
program = QAPipeline()
compiled = optimizer.compile(program, trainset=trainset)
compiled.save("qa_simba.json")
```
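Once saved, the optimized prompts and demos can be reloaded into a fresh instance of the same module. A minimal sketch, assuming the qa_simba.json file produced above (the sample question is illustrative):
```python
# Reload the optimized prompts/demos into a fresh module instance later.
loaded = QAPipeline()
loaded.load("qa_simba.json")  # path written by compiled.save() above
print(loaded(question="What is the capital of France?").answer)
```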
Phase 3: SIMBA with Feedback Signals
SIMBA works best with rich feedback:
```python
import dspy

def detailed_metric(example, pred, trace=None):
    """Metric with feedback signal."""
    expected = example.answer.lower()
    actual = pred.answer.lower()

    if expected == actual:
        return dspy.Prediction(score=1.0, feedback="Perfect match")
    elif expected in actual:
        return dspy.Prediction(score=0.7, feedback=f"Contains answer but verbose: '{actual}'")
    else:
        overlap = len(set(expected.split()) & set(actual.split()))
        if overlap > 0:
            return dspy.Prediction(score=0.3, feedback=f"Partial overlap: {overlap} words")
        return dspy.Prediction(score=0.0, feedback=f"No match. Expected '{expected}'")

optimizer = dspy.SIMBA(
    metric=detailed_metric,
    max_steps=20,  # Optimization iterations
    bsize=8,       # Mini-batch size
)

compiled = optimizer.compile(program, trainset=trainset)
```
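Before spending optimization budget, it can help to sanity-check the metric on a hand-built example. A small sketch using dspy.Example and dspy.Prediction; the question/answer values are made up:
```python
# Toy example/prediction pair for a quick metric sanity check.
ex = dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question")
pred = dspy.Prediction(answer="It was written by Shakespeare")
result = detailed_metric(ex, pred)
print(result.score, result.feedback)  # 0.7 "Contains answer but verbose: ..."
```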
Phase 4: Production Agent Optimization
```python
import dspy
import logging

logger = logging.getLogger(__name__)

# Define tools as functions
def search(query: str) -> str:
    """Search knowledge base for relevant information."""
    retriever = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
    results = retriever(query, k=3)
    return "\n".join([r['text'] for r in results])

def calculate(expr: str) -> str:
    """Evaluate Python expressions safely."""
    try:
        with dspy.PythonInterpreter() as interp:
            return str(interp.execute(expr))
    except Exception as e:
        return f"Error: {e}"

class ResearchAgent(dspy.Module):
    def __init__(self):
        super().__init__()
        self.agent = dspy.ReAct(
            "question -> answer",
            tools=[search, calculate]
        )

    def forward(self, question):
        return self.agent(question=question)

def agent_metric(example, pred, trace=None):
    """Rich metric for agent optimization."""
    expected = example.answer.lower().strip()
    actual = pred.answer.lower().strip() if pred.answer else ""

    # Exact match
    if expected == actual:
        return dspy.Prediction(score=1.0, feedback="Correct answer")

    # Partial match
    if expected in actual:
        return dspy.Prediction(score=0.7, feedback="Answer contains expected result")

    # Check key terms
    expected_terms = set(expected.split())
    actual_terms = set(actual.split())
    overlap = len(expected_terms & actual_terms)
    if overlap >= len(expected_terms) * 0.5:
        return dspy.Prediction(score=0.5, feedback="50%+ term overlap")

    return dspy.Prediction(score=0.0, feedback=f"Incorrect: expected '{example.answer}'")

def optimize_agent(trainset, devset):
    """Full SIMBA optimization pipeline."""
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    agent = ResearchAgent()

    # Baseline evaluation (metric wrapper returns only the numeric score)
    eval_metric = lambda ex, pred, trace=None: agent_metric(ex, pred, trace).score
    evaluator = dspy.Evaluate(devset=devset, metric=eval_metric, num_threads=4)
    baseline = evaluator(agent)
    logger.info(f"Baseline: {baseline:.2%}")

    # SIMBA optimization
    optimizer = dspy.SIMBA(
        metric=agent_metric,
        max_steps=25,  # Optimization iterations
        bsize=6,       # Mini-batch size
    )
    compiled = optimizer.compile(agent, trainset=trainset)

    # Evaluate the optimized program
    optimized = evaluator(compiled)
    logger.info(f"SIMBA optimized: {optimized:.2%}")

    compiled.save("research_agent_simba.json")
    return compiled
```
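The pipeline above assumes trainset and devset already exist. One way to build them from raw question/answer pairs; the sample data is illustrative:
```python
import dspy

# Illustrative raw data; substitute real question/answer pairs.
raw = [
    {"question": "What is 12 * 8?", "answer": "96"},
    {"question": "Who developed the theory of relativity?", "answer": "Albert Einstein"},
    # ...
]
examples = [
    dspy.Example(question=r["question"], answer=r["answer"]).with_inputs("question")
    for r in raw
]
split = int(0.8 * len(examples))
trainset, devset = examples[:split], examples[split:]

compiled_agent = optimize_agent(trainset, devset)
```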
Configuration
```python
optimizer = dspy.SIMBA(
    metric=metric_fn,
    max_steps=20,                    # Optimization iterations
    bsize=32,                        # Mini-batch size (default: 32)
    num_candidates=6,                # Candidates per iteration (default: 6)
    max_demos=4,                     # Max demos per predictor (default: 4)
    temperature_for_sampling=0.2,    # Sampling temperature (default: 0.2)
    temperature_for_candidates=0.2,  # Candidate selection temperature (default: 0.2)
)
```
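These knobs largely determine the optimization budget. As a rough, assumed approximation (not dspy's exact accounting), each step scores every candidate program on one mini-batch, which gives a back-of-envelope estimate of program executions:
```python
# Back-of-envelope budget estimate (an assumption about SIMBA's accounting,
# not the library's exact behavior).
max_steps, bsize, num_candidates = 20, 32, 6
approx_program_calls = max_steps * bsize * num_candidates
print(approx_program_calls)  # 3840 program executions, each possibly several LM calls
```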
Best Practices
- Use feedback signals - SIMBA benefits from `dspy.Prediction(score=..., feedback=...)` objects
- Balance parameters - Adjust `bsize` (default 32) and `max_steps` (default 8) based on dataset size
- Patience - SIMBA is slower than Bootstrap, faster than GEPA
- Custom metrics - Best for scenarios with nuanced scoring (not binary)
- Tune temperatures - Lower temperatures (0.1-0.3) for exploitation, higher (0.5-1.0) for exploration (see the sketch after this list)
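As a starting point for the temperature advice above, the configurations below contrast an exploitation-leaning and an exploration-leaning setup, reusing the metric_fn placeholder from the Configuration snippet; the specific values are suggestions, not dspy defaults.
```python
import dspy

# Exploitation-leaning: trust the metric, make conservative edits.
exploit = dspy.SIMBA(
    metric=metric_fn,
    max_steps=12,
    bsize=16,
    temperature_for_sampling=0.2,
    temperature_for_candidates=0.2,
)

# Exploration-leaning: sample more diverse trajectories and candidates
# when optimization plateaus early.
explore = dspy.SIMBA(
    metric=metric_fn,
    max_steps=12,
    bsize=16,
    temperature_for_sampling=0.7,
    temperature_for_candidates=0.7,
)
```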
Limitations
- Newer optimizer, less battle-tested than MIPROv2
- Requires thoughtful metric design (garbage in, garbage out)
- Not as thorough as GEPA for agent optimization
- Mini-batch sampling adds variance to results
- No automatic prompt reflection like GEPA
Official Documentation
- DSPy Documentation: https://dspy.ai/
- DSPy GitHub: https://github.com/stanfordnlp/dspy
- SIMBA Optimizer: https://dspy.ai/api/optimizers/SIMBA/
- Optimizers Guide: https://dspy.ai/learn/optimization/optimizers/