Hacktricks-skills lora-fine-tuning
How to implement LoRA (Low-Rank Adaptation) for efficient fine-tuning of large language models. Use this skill whenever the user wants to fine-tune an LLM, reduce training memory/compute requirements, implement parameter-efficient fine-tuning, or adapt a pre-trained model to a new task without retraining all parameters. Make sure to use this skill when users mention fine-tuning, LoRA, PEFT, parameter-efficient training, or want to train on limited hardware.
git clone https://github.com/abelrguezr/hacktricks-skills
skills/AI/AI-llm-architecture/7.0.-lora-improvements-in-fine-tuning/SKILL.MD

LoRA Fine-Tuning Implementation
This skill helps you implement LoRA (Low-Rank Adaptation) for efficient fine-tuning of large language models. LoRA reduces training costs by only updating small adapter matrices instead of the entire model.
When to Use LoRA
Use LoRA when you need to:
- Fine-tune a large model on limited hardware (less GPU memory)
- Adapt a pre-trained model to a new task without full retraining
- Store multiple task-specific adaptations efficiently
- Reduce training time and computational costs
How LoRA Works
LoRA replaces standard linear layers with a combination of:
- Original frozen weights (preserved from pre-training)
- Two small trainable matrices (A and B) that approximate weight updates
The forward pass becomes:
```
output = original_linear(x) + alpha * (x @ A @ B)
```
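The decomposition above can be checked numerically: because the adapter update is itself a linear map, the LoRA forward pass is equivalent to a single linear layer whose weight has absorbed the low-rank update. A minimal sketch (the dimensions and rank here are arbitrary illustrative choices):

```python
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(6, 3)   # frozen pre-trained layer
A = torch.randn(6, 2)            # rank-2 adapter matrices
B = torch.randn(2, 3)
alpha = 0.5

x = torch.randn(4, 6)
out = linear(x) + alpha * (x @ A @ B)

# Equivalent single layer: nn.Linear computes x @ W.T, so the weight
# delta corresponding to x @ A @ B is (A @ B).T
W_merged = linear.weight + alpha * (A @ B).T
merged_out = x @ W_merged.T + linear.bias
assert torch.allclose(out, merged_out, atol=1e-6)
```

This equivalence is also what makes the weight-merging step at the end of this document possible.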
Key Benefits
- Fewer trainable parameters: Only matrices A and B are updated
- Preserved knowledge: Original model weights stay frozen
- Storage efficiency: Save only small LoRA matrices per task
- Faster training: Less computation per gradient update
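A quick back-of-the-envelope check of the savings (the 4096x4096 layer size is just an illustrative assumption):

```python
# Full fine-tuning of one 4096x4096 linear layer vs. a rank-8 LoRA adapter
in_dim = out_dim = 4096
rank = 8

full_params = in_dim * out_dim            # 16,777,216 weights updated
lora_params = rank * (in_dim + out_dim)   # A: in_dim x r, B: r x out_dim

print(lora_params)                                 # 65536
print(f"{100 * lora_params / full_params:.2f}%")   # 0.39%
```

The same ratio holds per layer across the model, which is why whole-model trainable fractions land well under 1%.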
Implementation
Step 1: Define LoRA Components
Use the `scripts/lora_layers.py` module, which provides:
- `LoRALayer`: the low-rank adapter holding matrices A and B
- `LinearWithLoRA`: wrapper combining the original linear layer with a `LoRALayer`
- `replace_linear_with_lora()`: recursive function that converts all linear layers
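The module itself is not reproduced here; a minimal sketch of the three pieces could look like the following (the initialization scheme is an assumption, not necessarily what `scripts/lora_layers.py` does):

```python
import math
import torch

class LoRALayer(torch.nn.Module):
    """Low-rank adapter: alpha * (x @ A @ B)."""
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        # A gets small random values, B starts at zero, so the adapter
        # is a no-op at initialization (A @ B == 0)
        self.A = torch.nn.Parameter(torch.randn(in_dim, rank) / math.sqrt(rank))
        self.B = torch.nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * (x @ self.A @ self.B)

class LinearWithLoRA(torch.nn.Module):
    """Original (frozen) linear layer plus a trainable LoRA adapter."""
    def __init__(self, linear, rank, alpha):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(linear.in_features, linear.out_features,
                              rank, alpha)

    def forward(self, x):
        return self.linear(x) + self.lora(x)

def replace_linear_with_lora(model, rank, alpha):
    """Recursively swap every nn.Linear for a LinearWithLoRA wrapper."""
    for name, module in model.named_children():
        if isinstance(module, torch.nn.Linear):
            setattr(model, name, LinearWithLoRA(module, rank, alpha))
        else:
            replace_linear_with_lora(module, rank, alpha)
    return model
```

Because B starts at zero, the wrapped model produces exactly the same outputs as the original until training moves the adapter away from zero.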
Step 2: Apply LoRA to Your Model
```python
from scripts.lora_layers import replace_linear_with_lora

# Choose rank and alpha (typical values)
rank = 8    # lower rank = fewer parameters, higher compression
alpha = 16  # scaling factor (often 2x rank)

# Apply LoRA to your model
model = replace_linear_with_lora(model, rank=rank, alpha=alpha)
```
Step 3: Configure Training
Only the LoRA parameters need gradients:
```python
# Freeze original model parameters
for param in model.parameters():
    param.requires_grad = False

# Enable gradients only for LoRA matrices
for name, param in model.named_parameters():
    if 'lora' in name:
        param.requires_grad = True
```
Step 4: Train as Usual
Your training loop remains the same. The optimizer will only update LoRA parameters.
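A sketch of such a loop, with a plain `nn.Linear` on dummy regression data standing in for the LoRA-wrapped model and real dataloader (both are placeholder assumptions):

```python
import torch

torch.manual_seed(0)

# Toy stand-in: in practice `model` is the network returned by
# replace_linear_with_lora, with only the LoRA matrices trainable
model = torch.nn.Linear(8, 1)

# The optimizer only sees parameters that still require gradients
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
loss_fn = torch.nn.MSELoss()

x, y = torch.randn(64, 8), torch.randn(64, 1)  # dummy batch

losses = []
for step in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Nothing in the loop is LoRA-specific; the freezing step before it is what restricts which weights move.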
Parameter Selection Guide
| Rank | Use Case | Trainable Params (approx) |
|---|---|---|
| 4-8 | Small tasks, limited data | ~0.1% of model |
| 8-16 | Standard fine-tuning | ~0.2-0.4% of model |
| 16-32 | Complex tasks, more data | ~0.4-0.8% of model |
Alpha: Typically set to 2x rank (e.g., rank=8, alpha=16)
Example: Fine-Tuning a Transformer
```python
import torch
from transformers import AutoModelForCausalLM
from scripts.lora_layers import replace_linear_with_lora

# Load pre-trained model
model = AutoModelForCausalLM.from_pretrained("your-model")

# Apply LoRA
rank = 8
alpha = 16
model = replace_linear_with_lora(model, rank, alpha)

# Freeze everything, then enable gradients on LoRA matrices only
for param in model.parameters():
    param.requires_grad = False
for name, param in model.named_parameters():
    if 'lora' in name:
        param.requires_grad = True

# Count trainable parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)")

# Train normally
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)
```
Merging LoRA Weights (Optional)
After training, you can merge LoRA weights back into the original model for inference:
```python
import torch

def merge_lora_weights(model):
    for name, module in list(model.named_modules()):
        if isinstance(module, LinearWithLoRA):
            # Forward pass is linear(x) + alpha * (x @ A @ B); since
            # nn.Linear computes x @ W.T, the weight delta is alpha * (A @ B).T
            with torch.no_grad():
                merged_weight = (module.linear.weight
                                 + module.lora.alpha
                                 * (module.lora.A @ module.lora.B).T)
                module.linear.weight = torch.nn.Parameter(merged_weight)
            # Replace the wrapper with the plain linear layer on its parent
            parent_name, _, child_name = name.rpartition('.')
            parent = model.get_submodule(parent_name) if parent_name else model
            setattr(parent, child_name, module.linear)
    return model
```
Common Issues and Solutions
Issue: Out of memory during training
- Solution: Reduce rank (try 4 or 8), use gradient accumulation, or enable mixed precision
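The two memory-saving knobs can be combined in the training loop; a hedged sketch (the toy linear model and batch sizes are placeholders, and on GPU you would use `device_type="cuda"` with float16 and a `GradScaler`):

```python
import torch

model = torch.nn.Linear(8, 1)   # placeholder for the LoRA-wrapped model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
accum_steps = 4                 # effective batch = 4 micro-batches

for step in range(8):
    x, y = torch.randn(4, 8), torch.randn(4, 1)
    # Autocast runs eligible ops in bfloat16, shrinking activation memory
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()   # accumulate grads across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```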
Issue: Poor fine-tuning quality
- Solution: Increase rank, check learning rate, ensure sufficient training data
Issue: LoRA not being applied
- Solution: Verify `replace_linear_with_lora()` was called before training, and check that the LoRA parameters have `requires_grad=True`