Claude-skill-registry funsloth-local
Training manager for local GPU training - validate CUDA, manage GPU selection, monitor progress, handle checkpoints
install
source · Clone the upstream repo

```shell
git clone https://github.com/majiayu000/claude-skill-registry
```

Claude Code · Install into ~/.claude/skills/

```shell
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/data/funsloth-local" ~/.claude/skills/majiayu000-claude-skill-registry-funsloth-local \
  && rm -rf "$T"
```
manifest:
skills/data/funsloth-local/SKILL.md · source content
Local GPU Training Manager
Run Unsloth training on your local GPU.
Prerequisites Check
1. Verify CUDA
```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
```
If CUDA is not available:
- Check NVIDIA drivers: `nvidia-smi`
- Check the CUDA toolkit: `nvcc --version`
- Reinstall PyTorch with CUDA support:

```shell
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
2. Check VRAM
See references/HARDWARE_GUIDE.md for requirements:
| VRAM | Recommended Setup |
|---|---|
| 8GB | 7B, 4-bit, batch=1, LoRA r=8 |
| 12GB | 7B, 4-bit, batch=2, LoRA r=16 |
| 16GB | 7-13B, 4-bit, batch=2, LoRA r=16-32 |
| 24GB | 7-14B, 4-bit, batch=4, LoRA r=32 |
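The table above can be sketched as a small helper that picks a starting configuration from the detected VRAM. This is purely illustrative: the function and field names are assumptions, not part of any Unsloth API, and the tiers simply mirror the table.

```python
def recommend_config(vram_gb: float) -> dict:
    """Map available VRAM (in GB) to a starting training setup, per the table above."""
    if vram_gb >= 24:
        return {"model_size": "7-14B", "quant": "4-bit", "batch_size": 4, "lora_r": 32}
    if vram_gb >= 16:
        return {"model_size": "7-13B", "quant": "4-bit", "batch_size": 2, "lora_r": 32}
    if vram_gb >= 12:
        return {"model_size": "7B", "quant": "4-bit", "batch_size": 2, "lora_r": 16}
    if vram_gb >= 8:
        return {"model_size": "7B", "quant": "4-bit", "batch_size": 1, "lora_r": 8}
    raise ValueError("Under 8 GB VRAM: local 4-bit training of 7B models is not recommended")

print(recommend_config(12.0))  # {'model_size': '7B', 'quant': '4-bit', 'batch_size': 2, 'lora_r': 16}
```

Treat the result as a starting point and tune batch size and rank from there.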
3. Check Dependencies
```shell
pip install unsloth torch transformers trl peft datasets accelerate bitsandbytes
```
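As a quick sanity check that the stack installed correctly, a minimal sketch using only the standard library (the package list simply mirrors the pip command above):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

required = ["unsloth", "torch", "transformers", "trl", "peft",
            "datasets", "accelerate", "bitsandbytes"]
print(missing_packages(required))  # An empty list means everything is importable
```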
Docker Option
Use the official Unsloth Docker image for a pre-configured environment (supports all GPUs including Blackwell/50-series):
```shell
docker run -d \
  -e JUPYTER_PASSWORD="unsloth" \
  -p 8888:8888 \
  -v $(pwd)/work:/workspace/work \
  --gpus all \
  unsloth/unsloth
```
Access Jupyter at http://localhost:8888. Example notebooks are in /workspace/unsloth-notebooks/.
Environment variables:
- `JUPYTER_PASSWORD` - Jupyter auth (default: `unsloth`)
- `JUPYTER_PORT` - Jupyter port (default: `8888`)
- `USER_PASSWORD` - User/sudo password (default: `unsloth`)
Run Training
Option 1: Notebook
```shell
jupyter notebook notebooks/sft_template.ipynb
```
Option 2: Script
```shell
# Edit configuration in the script, then run
python scripts/train_sft.py
```
GPU Selection (Multi-GPU)
```python
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Use first GPU
```
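On a multi-GPU box you may want to pick the card with the most free memory before setting `CUDA_VISIBLE_DEVICES`. A hedged sketch that parses the CSV output of `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits`; the helper name is an illustration, not an existing utility:

```python
def pick_freest_gpu(smi_csv: str) -> str:
    """Return the index (as a string) of the GPU with the most free memory.

    Expects lines like "0, 2048" from:
    nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits
    """
    best_idx, best_free = None, -1
    for line in smi_csv.strip().splitlines():
        idx, free = (part.strip() for part in line.split(","))
        if int(free) > best_free:
            best_idx, best_free = idx, int(free)
    return best_idx

# Example with captured output (in practice, run nvidia-smi via subprocess):
sample = "0, 2048\n1, 10240"
print(pick_freest_gpu(sample))  # 1
```

Set `os.environ["CUDA_VISIBLE_DEVICES"]` to the returned index before importing torch, since CUDA device visibility is fixed at initialization.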
Monitor Training
Terminal
```shell
# Watch GPU usage
watch -n 1 nvidia-smi

# Or use nvitop (more detailed)
pip install nvitop && nvitop
```
WandB (Optional)
```shell
export WANDB_API_KEY="your-key"
# Add report_to="wandb" in TrainingArguments
```
Troubleshooting
OOM Error
Try in order:
- Reduce `batch_size` (down to 1)
- Increase `gradient_accumulation_steps`
- Reduce `max_seq_length`
- Reduce LoRA rank
- Clear cached memory: `torch.cuda.empty_cache()`
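When trading `batch_size` for gradient accumulation (the first two steps above), keep the effective batch size constant so the optimization dynamics do not change. The check is pure arithmetic:

```python
def effective_batch(batch_size: int, grad_accum_steps: int, num_gpus: int = 1) -> int:
    """Effective (per-optimizer-step) batch size."""
    return batch_size * grad_accum_steps * num_gpus

# batch_size=4 with no accumulation equals batch_size=1 with 4 accumulation steps
print(effective_batch(4, 1))  # 4
print(effective_batch(1, 4))  # 4
```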
Loss Not Decreasing
- Check learning rate (try higher or lower)
- Verify chat template matches model
- Inspect data format
Training Too Slow
- Enable bf16 if supported
- Use `packing=True` for short sequences
- Reduce `logging_steps`
See references/TROUBLESHOOTING.md for more solutions.
Resume from Checkpoint
```python
TrainingArguments(
    resume_from_checkpoint=True,  # Auto-find latest
    # Or: resume_from_checkpoint="outputs/checkpoint-500"
)
```
Save Model
Training script automatically saves:
- LoRA weights → `outputs/lora_adapter/`
- Merged model (optional) → `outputs/merged_16bit/`
Test Inference
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("outputs/lora_adapter")
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
Handoff
Offer `funsloth-upload` for Hub upload with a model card.
Tips
- Close other GPU apps before training
- Monitor temperatures - keep under 85°C
- Use a UPS for long runs
- Save frequently with `save_steps`
Bundled Resources
- notebooks/sft_template.ipynb - Notebook template
- scripts/train_sft.py - Script template
- references/HARDWARE_GUIDE.md - VRAM requirements
- references/TROUBLESHOOTING.md - Common issues