# Source repository: https://github.com/vibeforge1111/vibeship-spawner-skills
---
# File: ai/llm-fine-tuning/skill.yaml
id: llm-fine-tuning
name: LLM Fine-Tuning
category: ai
# Folded scalar keeps the description as one logical line at load time.
description: >-
  Use when adapting large language models to specific tasks, domains, or
  behaviors - covers LoRA, QLoRA, PEFT, instruction tuning, and full
  fine-tuning strategies
# Core heuristics for any fine-tuning run; each rule carries its rationale.
patterns:
  golden_rules:
    - rule: "Data quality > data quantity"
      reason: "1000 high-quality examples beats 10000 noisy ones"
    - rule: "Format consistency is critical"
      reason: "Model learns your format, inconsistency confuses it"
    - rule: "Start with LoRA, not full fine-tuning"
      reason: "99% memory reduction, comparable results"
    - rule: "Mix general data (20-30%)"
      reason: "Prevents catastrophic forgetting"
    - rule: "Evaluate on held-out set"
      reason: "Training loss alone is misleading"
    - rule: "Use the right base model"
      reason: "Instruction-tuned base for chat, base model for continued pretraining"
# Method selection guide: each entry lists when to use it and its memory cost.
fine_tuning_methods:
  full_fine_tuning:
    description: "Train all parameters"
    when:
      - "Unlimited compute"
      - "Major capability shift"
      - "Custom architecture"
    memory: "16x model size (7B = 112GB+)"
  lora:
    description: "Low-rank adapters on frozen weights"
    when:
      - "Model fits in GPU"
      - "Want maximum quality"
      - "Fast training"
    memory: "7B model ~28GB"
    params_trained: "0.1-1% of total"
  qlora:
    description: "LoRA + 4-bit quantization"
    when:
      - "Model too large for GPU"
      - "Have 24GB VRAM or less"
      - "Willing to trade speed"
    memory: "7B model ~14GB, 70B model ~48GB"
# LoRA hyperparameter reference. Ranges are quoted strings (e.g. "0.05-0.1"),
# not numbers — they describe intervals, not single values.
lora_config:
  rank:
    description: "Dimension of low-rank matrices"
    values: "8-64 typical, higher for harder tasks"
  alpha:
    description: "Scaling factor"
    rule: "Usually 2x rank"
  target_modules:
    # Flow style is acceptable here: short, atomic leaf lists.
    high_impact: ["q_proj", "v_proj"]
    medium_impact: ["k_proj", "o_proj"]
    low_impact: ["gate_proj", "up_proj", "down_proj"]
  dropout: "0.05-0.1"
# Starting hyperparameters by model size.
# NOTE: learning rates are written with an explicit decimal point (2.0e-4,
# not 2e-4) so they resolve as floats under both YAML 1.1 (PyYAML) and 1.2 —
# a dotless exponent form is silently loaded as a *string* by YAML 1.1 parsers.
hyperparameters_by_size:
  7b:
    r: 16
    lora_alpha: 32
    learning_rate: 2.0e-4
    batch_size: 4
    gradient_accumulation: 4
  13b:
    r: 32
    lora_alpha: 64
    learning_rate: 1.0e-4
    batch_size: 2
    gradient_accumulation: 8
  70b:
    r: 64
    lora_alpha: 128
    learning_rate: 5.0e-5
    batch_size: 1
    gradient_accumulation: 16
# Common failure modes, each paired with the symptom it causes and the fix.
anti_patterns:
  - pattern: "Inconsistent formatting"
    problem: "Model learns noise"
    solution: "Strict format templates"
  - pattern: "Too high learning rate"
    problem: "Divergence, forgetting"
    solution: "1e-4 to 3e-4 for LoRA"
  - pattern: "No eval set"
    problem: "Overfitting undetected"
    solution: "Hold out 10% for validation"
  - pattern: "Only task data"
    problem: "Catastrophic forgetting"
    solution: "Mix 20-30% general data"
  - pattern: "Wrong target modules"
    problem: "Poor adaptation"
    solution: "Include attention layers at minimum"
  - pattern: "Rank too low"
    problem: "Underfitting"
    solution: "Start with r=16-32"
  - pattern: "Rank too high"
    problem: "Overfitting, slow"
    solution: "Diminishing returns above r=64"
# Phase-ordered checklist: items are grouped by when they apply in the run.
implementation_checklist:
  before_training:
    - "Base model selected (instruction-tuned for chat, base for pretraining)"
    - "Data cleaned and formatted consistently"
    - "Train/eval split created"
    - "Hardware requirements calculated"
  lora_configuration:
    - "Rank selected (16-64 typically)"
    - "Alpha = 2 × rank"
    - "Target modules include attention projections"
    - "Dropout 0.05-0.1"
  training:
    - "Learning rate 1e-4 to 3e-4"
    - "Gradient checkpointing enabled"
    - "Mixed precision (bf16) enabled"
    - "Gradient accumulation for effective batch size ≥ 16"
    - "Warmup 3-5% of steps"
  evaluation:
    - "Task-specific metrics tracked"
    - "General capability evaluated (no regression)"
    - "Multiple prompts tested"
    - "Human evaluation for subjective quality"
# Cross-skill routing: which sibling skill to hand off to, and on what trigger.
handoffs:
  - skill: distributed-training
    trigger: "multi-GPU fine-tuning setup"
  - skill: model-optimization
    trigger: "deployment optimization after fine-tuning"
  - skill: transformer-architecture
    trigger: "base model architecture questions"
  - skill: reinforcement-learning
    trigger: "RLHF after fine-tuning"
# Ecosystem pointers: tooling, techniques, and public datasets referenced
# elsewhere in this skill.
ecosystem:
  libraries:
    - "PEFT - HuggingFace parameter-efficient fine-tuning"
    - "TRL - Transformer Reinforcement Learning"
    - "Axolotl - Fine-tuning framework"
    - "LitGPT - Lightning AI fine-tuning"
  techniques:
    - "LoRA - Low-Rank Adaptation"
    - "QLoRA - Quantized LoRA"
    - "DoRA - Weight-Decomposed LoRA"
    - "PiSSA - Principal Singular Values"
  datasets:
    - "Alpaca - Instruction following"
    - "ShareGPT - Conversations"
    - "UltraChat - High-quality dialogues"
    - "OpenOrca - Reasoning"
# Primary references backing this skill's recommendations.
sources:
  papers:
    - "LoRA: Low-Rank Adaptation of Large Language Models (Hu et al. 2021)"
    - "QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al. 2023)"
  tutorials:
    - "HuggingFace PEFT Guide"
    - "Practical Tips for Finetuning LLMs - Sebastian Raschka"