Claude-skill-registry dspy-finetune-bootstrap
This skill should be used when the user asks to "fine-tune a DSPy model", "distill a program into weights", "use BootstrapFinetune", "create a student model", "reduce inference costs with fine-tuning", mentions "model distillation", "teacher-student training", or wants to deploy a DSPy program as fine-tuned weights for production efficiency.
Install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/dspy-finetune-bootstrap" ~/.claude/skills/majiayu000-claude-skill-registry-dspy-finetune-bootstrap && rm -rf "$T"
manifest:
skills/data/dspy-finetune-bootstrap/SKILL.md
DSPy BootstrapFinetune Optimizer
Goal
Distill a DSPy program into fine-tuned model weights for efficient production deployment.
When to Use
- You have a working DSPy program with a large model
- Need to reduce inference costs
- Want faster responses (smaller model)
- Deploying to resource-constrained environments
Inputs
| Input | Type | Description |
|---|---|---|
| program (first positional argument to `compile()`) | `dspy.Module` | Teacher program to distill |
| `trainset` | `list[dspy.Example]` | Training examples |
| `metric` | `Callable` | Validation metric (optional) |
| `train_kwargs` | `dict` | Training hyperparameters |
Outputs
| Output | Type | Description |
|---|---|---|
| compiled program (return value of `compile()`) | `dspy.Module` | Program with fine-tuned weights |
| saved model file (via `.save()`) | `str` path | Path to saved model |
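To make the tables concrete, here is a minimal sketch of how the inputs flow into the optimizer and what comes back; the program, examples, and metric are placeholders, and the detailed phases below show the real workflow.

```python
import dspy
from dspy.teleprompt import BootstrapFinetune

dspy.settings.experimental = True  # required, see Phase 2

# Placeholder inputs -- substitute your own program, data, and metric.
program = dspy.ChainOfThought("question -> answer")
trainset = [dspy.Example(question="2 + 2?", answer="4").with_inputs("question")]
metric = lambda gold, pred, trace=None: gold.answer in pred.answer

optimizer = BootstrapFinetune(metric=metric, train_kwargs={"num_train_epochs": 3})
finetuned = optimizer.compile(program, trainset=trainset)  # program with fine-tuned weights
finetuned.save("finetuned_program.json")                   # path to saved model
```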
Workflow
Phase 1: Prepare Teacher Program
```python
import dspy

# Configure with strong teacher model
dspy.configure(lm=dspy.LM("openai/gpt-4o"))

class TeacherQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.cot(question=question)
```
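The remaining phases assume a `trainset` of `dspy.Example` objects. A minimal sketch of building one from raw question/answer pairs (the pairs here are placeholders):

```python
import dspy

# Hypothetical raw data -- replace with your own QA pairs.
raw_data = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Hamlet?", "answer": "William Shakespeare"},
]

# Mark input fields explicitly with .with_inputs(); the remaining fields act as labels.
trainset = [
    dspy.Example(question=row["question"], answer=row["answer"]).with_inputs("question")
    for row in raw_data
]
```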
Phase 2: Enable Experimental Features & Generate Training Traces
BootstrapFinetune is experimental and requires enabling the flag:
```python
import dspy
from dspy.teleprompt import BootstrapFinetune

# Enable experimental features
dspy.settings.experimental = True

optimizer = BootstrapFinetune(
    metric=lambda gold, pred, trace=None: gold.answer.lower() in pred.answer.lower(),
    train_kwargs={
        'learning_rate': 5e-5,
        'num_train_epochs': 3,
        'per_device_train_batch_size': 4,
        'warmup_ratio': 0.1,
    },
)
```
Phase 3: Fine-tune Student Model
```python
finetuned = optimizer.compile(
    TeacherQA(),
    trainset=trainset,
)
```
Phase 4: Deploy
```python
# Save the fine-tuned model (saves state-only by default)
finetuned.save("finetuned_qa_model.json")

# Load and use (must recreate architecture first)
loaded = TeacherQA()
loaded.load("finetuned_qa_model.json")
result = loaded(question="What is machine learning?")
```
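If recreating the module class at load time is inconvenient, recent DSPy versions also support whole-program saving. A hedged sketch, assuming `save_program=True` and `dspy.load` are available in your DSPy version:

```python
# Save architecture and state together (writes a directory rather than a single JSON file)
finetuned.save("finetuned_qa_model/", save_program=True)

# Load without re-instantiating TeacherQA manually
loaded = dspy.load("finetuned_qa_model/")
result = loaded(question="What is machine learning?")
```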
Production Example
```python
import dspy
from dspy.teleprompt import BootstrapFinetune
from dspy.evaluate import Evaluate
import logging
import os

# Enable experimental features
dspy.settings.experimental = True

logger = logging.getLogger(__name__)

class ClassificationSignature(dspy.Signature):
    """Classify text into categories."""
    text: str = dspy.InputField()
    label: str = dspy.OutputField(desc="Category: positive, negative, neutral")

class TextClassifier(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classify = dspy.Predict(ClassificationSignature)

    def forward(self, text):
        return self.classify(text=text)

def classification_metric(gold, pred, trace=None):
    """Exact label match."""
    gold_label = gold.label.lower().strip()
    pred_label = pred.label.lower().strip() if pred.label else ""
    return gold_label == pred_label

def finetune_classifier(trainset, devset, output_dir="./finetuned_model"):
    """Full fine-tuning pipeline."""
    # Configure teacher (strong model)
    dspy.configure(lm=dspy.LM("openai/gpt-4o"))
    teacher = TextClassifier()

    # Evaluate teacher
    evaluator = Evaluate(devset=devset, metric=classification_metric, num_threads=8)
    teacher_score = evaluator(teacher)
    logger.info(f"Teacher score: {teacher_score:.2%}")

    # Fine-tune (train_kwargs passed to constructor)
    optimizer = BootstrapFinetune(
        metric=classification_metric,
        train_kwargs={
            'learning_rate': 2e-5,
            'num_train_epochs': 3,
            'per_device_train_batch_size': 8,
            'gradient_accumulation_steps': 2,
            'warmup_ratio': 0.1,
            'weight_decay': 0.01,
            'logging_steps': 10,
            'save_strategy': 'epoch',
            'output_dir': output_dir,
        },
    )

    finetuned = optimizer.compile(
        teacher,
        trainset=trainset,
    )

    # Evaluate fine-tuned model
    student_score = evaluator(finetuned)
    logger.info(f"Student score: {student_score:.2%}")

    # Save (state-only as JSON)
    finetuned.save(os.path.join(output_dir, "final_model.json"))

    return {
        "teacher_score": teacher_score,
        "student_score": student_score,
        "model_path": os.path.join(output_dir, "final_model.json"),
    }

# For RAG fine-tuning
class RAGClassifier(dspy.Module):
    """RAG pipeline that can be fine-tuned."""
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.classify = dspy.ChainOfThought("context, text -> label")

    def forward(self, text):
        context = self.retrieve(text).passages
        return self.classify(context=context, text=text)

def finetune_rag_classifier(trainset, devset):
    """Fine-tune a RAG-based classifier."""
    # Configure retriever and LM
    colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
    dspy.configure(
        lm=dspy.LM("openai/gpt-4o"),
        rm=colbert,
    )

    rag = RAGClassifier()

    # Fine-tune (train_kwargs in constructor)
    optimizer = BootstrapFinetune(
        metric=classification_metric,
        train_kwargs={
            'learning_rate': 1e-5,
            'num_train_epochs': 5,
        },
    )

    finetuned = optimizer.compile(
        rag,
        trainset=trainset,
    )

    return finetuned
```
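A sketch of driving `finetune_classifier` end to end; the labeled sentences and the 80/20 split are placeholder choices:

```python
import dspy

# Hypothetical labeled data -- replace with your own corpus.
labeled = [
    ("The product exceeded my expectations.", "positive"),
    ("Terrible support, never again.", "negative"),
    ("The package arrived on Tuesday.", "neutral"),
]

examples = [
    dspy.Example(text=text, label=label).with_inputs("text")
    for text, label in labeled
]

split = int(0.8 * len(examples))
trainset, devset = examples[:split], examples[split:]

results = finetune_classifier(trainset, devset, output_dir="./finetuned_model")
print(results["teacher_score"], results["student_score"], results["model_path"])
```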
Training Arguments Reference
| Argument | Description | Typical Value |
|---|---|---|
| `learning_rate` | Learning rate | 1e-5 to 5e-5 |
| `num_train_epochs` | Training epochs | 3-5 |
| `per_device_train_batch_size` | Batch size | 4-16 |
| `gradient_accumulation_steps` | Gradient accumulation | 2-8 |
| `warmup_ratio` | Warmup proportion | 0.1 |
| `weight_decay` | L2 regularization | 0.01 |
| `max_grad_norm` | Gradient clipping | 1.0 |
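Collected into a single `train_kwargs` dict, as a sketch; these are standard Hugging Face `TrainingArguments` names, and which ones are honored depends on the fine-tuning backend your model provider uses:

```python
train_kwargs = {
    "learning_rate": 2e-5,                # 1e-5 to 5e-5
    "num_train_epochs": 3,                # 3-5
    "per_device_train_batch_size": 8,     # 4-16
    "gradient_accumulation_steps": 2,     # 2-8
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "max_grad_norm": 1.0,                 # gradient clipping
}

optimizer = BootstrapFinetune(metric=classification_metric, train_kwargs=train_kwargs)
```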
Best Practices
- Strong teacher - Use GPT-4 or Claude as teacher
- Quality data - Teacher traces are only as good as training examples
- Validate improvement - Compare student to teacher on a held-out set (see the sketch after this list)
- Start with more epochs - Fine-tuning often needs 3-5 epochs
- Monitor overfitting - Track validation loss during training
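A minimal sketch of that held-out comparison, reusing `teacher`, `finetuned`, `devset`, and `classification_metric` from the production example above; the 5-point threshold is an arbitrary placeholder:

```python
from dspy.evaluate import Evaluate

evaluator = Evaluate(devset=devset, metric=classification_metric, num_threads=8)

teacher_score = evaluator(teacher)    # strong-model baseline
student_score = evaluator(finetuned)  # fine-tuned student

# Flag a regression if the student gives up too much quality for the cost savings.
if student_score < teacher_score - 5:
    print("Warning: student underperforms teacher; revisit data quality or epochs.")
```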
Limitations
- Requires access to model weights (not API-only models); see the local-model sketch after this list
- Training requires GPU resources
- Student may not match teacher quality on all inputs
- Fine-tuning takes hours/days depending on data size
- Model size reduction may cause capability loss
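Because of the first limitation, the student is usually an open-weights model you host yourself. A hedged sketch of pointing the program at such a model before compiling, assuming an OpenAI-compatible local server (the URL and model name are placeholders):

```python
import dspy

# Hypothetical local endpoint serving an open-weights model.
student_lm = dspy.LM(
    "openai/meta-llama/Llama-3.1-8B-Instruct",
    api_base="http://localhost:8000/v1",
    api_key="local",
)

program = TeacherQA()
program.set_lm(student_lm)  # fine-tuning then targets this model's weights

finetuned = optimizer.compile(program, trainset=trainset)
```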
Official Documentation
- DSPy Documentation: https://dspy.ai/
- DSPy GitHub: https://github.com/stanfordnlp/dspy
- BootstrapFinetune API: https://dspy.ai/api/optimizers/BootstrapFinetune/
- Fine-tuning Guide: https://dspy.ai/tutorials/classification_finetuning/