Awesome-omni-skill training-hub
Fine-tune LLMs using Red Hat training-hub library with SFT, LoRA, and OSFT algorithms. Use when preparing JSONL datasets, running training jobs, configuring hardware, scaling to clusters, evaluating models, or deploying with vLLM.
```bash
git clone https://github.com/diegosouzapw/awesome-omni-skill

T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/machine-learning/training-hub" ~/.claude/skills/diegosouzapw-awesome-omni-skill-training-hub-c4d6db && rm -rf "$T"
```
skills/machine-learning/training-hub/SKILL.md

Training Hub
Red Hat's unified library for LLM post-training: SFT, LoRA, and OSFT (continual learning).
Quick Reference
| Task | Command |
|---|---|
| Recommend config | `python scripts/recommend_config.py` |
| Estimate memory | `python scripts/estimate_memory.py` |
| Validate dataset | `python scripts/validate_dataset.py` |
| Full fine-tuning | `sft(...)` |
| LoRA training | `lora_sft(...)` |
| OSFT (continual) | `osft(...)` |
Installation
```bash
pip install training-hub                              # Basic
pip install training-hub[lora]                        # LoRA with Unsloth (2x faster)
pip install training-hub[cuda] --no-build-isolation   # CUDA support
```
Get Started Fast
```bash
# Get optimal config for your hardware
python scripts/recommend_config.py \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --hardware rtx-5090
```
Data Format
Training data must be JSONL with message structure:
{"messages": [{"role": "user", "content": "Q"}, {"role": "assistant", "content": "A"}]}
Validate before training:
```bash
python scripts/validate_dataset.py ./training_data.jsonl
```
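For example, a minimal sketch that writes such records from in-memory question/answer pairs (the `pairs` list below is made-up sample data):

```python
import json

# Made-up sample data: (question, answer) pairs to convert.
pairs = [
    ("What is training-hub?", "Red Hat's unified library for LLM post-training."),
    ("Which methods does it support?", "SFT, LoRA, and OSFT."),
]

with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for question, answer in pairs:
        record = {
            "messages": [
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```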
For data preparation details, see DATA-FORMATS.md.
Training Methods
Supervised Fine-Tuning (SFT)
Full-parameter fine-tuning. Requires significant VRAM.
```python
from training_hub import sft

result = sft(
    model_path="Qwen/Qwen2.5-7B-Instruct",
    data_path="./training_data.jsonl",
    ckpt_output_dir="./checkpoints",
    num_epochs=3,
    effective_batch_size=8,
    learning_rate=2e-5,
    max_seq_len=2048,
    max_tokens_per_gpu=45000,
)
```
LoRA Fine-Tuning
Memory-efficient adaptation (up to 2x faster, 70% less VRAM):
```python
from training_hub import lora_sft

result = lora_sft(
    model_path="Qwen/Qwen2.5-7B-Instruct",
    data_path="./training_data.jsonl",
    ckpt_output_dir="./outputs",
    lora_r=16,
    lora_alpha=32,
    num_epochs=3,
    learning_rate=2e-4,
)
```
QLoRA (4-bit): Add `load_in_4bit=True` for large models on limited VRAM.
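For example, a sketch of a 70B QLoRA run (the model name and hyperparameters below are illustrative, not tuned recommendations):

```python
from training_hub import lora_sft

# Illustrative QLoRA sketch: 4-bit quantized base weights plus LoRA adapters.
# Model name and hyperparameters are example values, not tuned defaults.
result = lora_sft(
    model_path="meta-llama/Llama-3.1-70B-Instruct",
    data_path="./training_data.jsonl",
    ckpt_output_dir="./outputs",
    load_in_4bit=True,   # quantize the frozen base model to 4-bit
    lora_r=16,
    lora_alpha=32,
    num_epochs=1,
    learning_rate=2e-4,
)
```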
OSFT (Continual Learning)
Adapt without catastrophic forgetting:
```python
from training_hub import osft

result = osft(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    data_path="./domain_data.jsonl",
    ckpt_output_dir="./checkpoints",
    unfreeze_rank_ratio=0.25,
    effective_batch_size=16,
    learning_rate=2e-5,
)
```
For all parameters, see ALGORITHMS.md.
Hardware Support
| Hardware | VRAM | Best For |
|---|---|---|
| RTX 5090 | 32GB | 8B LoRA, 70B QLoRA |
| DGX Spark | 128GB | 70B SFT |
| H100 | 80GB | 14B SFT, 70B LoRA |
| 8×H100 | 640GB | 70B SFT |
```bash
# Check if your config fits
python scripts/estimate_memory.py \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --method lora \
  --hardware h100 \
  --num-gpus 8
```
For hardware-specific configs, see HARDWARE.md.
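As a rough back-of-envelope sanity check (the script above is the authoritative estimate; this sketch only counts bf16 weights and gradients plus fp32 Adam states, ignoring activations and framework overhead):

```python
def rough_full_sft_vram_gb(params_billion: float) -> float:
    """Very rough full-SFT footprint: bf16 weights + bf16 gradients
    + fp32 Adam states (master weights, momentum, variance)."""
    bytes_per_param = 2 + 2 + 12
    return params_billion * 1e9 * bytes_per_param / 1024**3

# An 8B model already needs on the order of ~120 GB before activations,
# which is why full SFT is usually sharded across several GPUs.
print(f"{rough_full_sft_vram_gb(8):.0f} GB")
```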
Scaling
Multi-GPU:
```python
result = sft(..., nproc_per_node=8)
```
Multi-node:
```python
result = sft(
    ...,
    nnodes=2,
    node_rank=0,
    nproc_per_node=8,
    rdzv_endpoint="0.0.0.0:29500",
)
```
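Every node runs the same call with its own `node_rank` and a shared rendezvous endpoint. A minimal sketch (the `NODE_RANK` and `RDZV_ENDPOINT` environment variables are assumptions about how your launcher passes per-node values):

```python
import os

from training_hub import sft

# Run the same script on every node; only node_rank differs per node.
result = sft(
    model_path="Qwen/Qwen2.5-7B-Instruct",
    data_path="./training_data.jsonl",
    ckpt_output_dir="./checkpoints",
    num_epochs=3,
    nnodes=2,
    node_rank=int(os.environ.get("NODE_RANK", "0")),  # 0 on the head node, 1 on the other
    nproc_per_node=8,                                 # GPUs per node
    rdzv_endpoint=os.environ.get("RDZV_ENDPOINT", "0.0.0.0:29500"),
)
```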
For Slurm, Kubernetes, and datacenter deployments, see SCALE.md.
Algorithm Selection
| Scenario | Method |
|---|---|
| First-time fine-tuning, large dataset | SFT |
| Memory constrained | LoRA |
| Very large model (70B+), limited VRAM | LoRA + QLoRA |
| Preserve existing capabilities | OSFT |
| Domain adaptation, small dataset | OSFT |
Documentation
| Topic | File |
|---|---|
| Hardware profiles & configs | HARDWARE.md |
| All algorithm parameters | ALGORITHMS.md |
| Data formats & conversion | DATA-FORMATS.md |
| Datacenter & cluster setup | SCALE.md |
| Model evaluation | EVALUATION.md |
| vLLM inference & serving | INFERENCE.md |
| Advanced techniques | ADVANCED.md |
| Model-specific configs | MODELS.md |
| Troubleshooting | TROUBLESHOOTING.md |
| Distributed training | DISTRIBUTED.md |
Utility Scripts
| Script | Purpose |
|---|---|
| `scripts/recommend_config.py` | Generate optimal config for model + hardware |
| `scripts/estimate_memory.py` | Estimate GPU memory requirements |
| `scripts/validate_dataset.py` | Validate JSONL dataset format |
| (see DATA-FORMATS.md) | Convert CSV, Alpaca, ShareGPT to JSONL |
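As an illustration of what such a conversion does, here is a standalone sketch for Alpaca-style records (not the bundled script; it assumes the standard instruction/input/output fields and an input file named alpaca.json):

```python
import json

def alpaca_to_messages(record: dict) -> dict:
    """Map one Alpaca-style record (instruction/input/output) to the
    messages structure that training-hub expects."""
    prompt = record["instruction"]
    if record.get("input"):
        prompt += "\n\n" + record["input"]
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": record["output"]},
        ]
    }

with open("alpaca.json", encoding="utf-8") as src, \
        open("training_data.jsonl", "w", encoding="utf-8") as dst:
    for record in json.load(src):
        dst.write(json.dumps(alpaca_to_messages(record), ensure_ascii=False) + "\n")
```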
Troubleshooting
CUDA OOM: Reduce `max_tokens_per_gpu`, use LoRA + QLoRA, or add GPUs.

Dataset errors: Run `python scripts/validate_dataset.py` first.

LoRA multi-GPU: Requires `torchrun --nproc-per-node=N script.py`.

Training diverges: Lower `learning_rate` (try 1e-5 for SFT, 1e-4 for LoRA).
For more, see TROUBLESHOOTING.md.