Skilllibrary fine-tuning

Guides full and parameter-efficient fine-tuning (LoRA, QLoRA) of LLMs using PEFT, TRL SFTTrainer, and BitsAndBytes. Covers adapter config, hyperparameter selection, data formatting, and evaluation. Use when adapting a pretrained model to a specific task or domain.

install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/12-ai-llm-training-architecture-and-research/fine-tuning" ~/.claude/skills/merceralex397-collab-skilllibrary-fine-tuning && rm -rf "$T"
manifest: 12-ai-llm-training-architecture-and-research/fine-tuning/SKILL.md
source content

Purpose

Guide the selection and execution of fine-tuning strategies for LLMs — from full fine-tuning to parameter-efficient methods (LoRA, QLoRA, prefix tuning) — using HuggingFace PEFT, TRL, and Transformers. Includes adapter configuration, quantization setup, data formatting, and evaluation.

When to use this skill

Use this skill when:

  • adapting a pretrained LLM to a specific downstream task or domain
  • choosing between full fine-tuning, LoRA, QLoRA, or prefix tuning
  • configuring PEFT adapters, quantization, or training hyperparameters
  • preparing instruction/input/output datasets for supervised fine-tuning
  • evaluating whether fine-tuning improved performance over the base model

Do not use this skill when

  • the task is prompt engineering or in-context learning without weight updates
  • the goal is instruction-tuning on chat data (use instruction-tuning for SFT on conversations)
  • the task is RLHF or DPO alignment (use preference-optimization)
  • you have fewer than ~100 labeled examples (try few-shot prompting first)

Operating procedure

  1. Decide tuning strategy. Full fine-tuning: updates all parameters, requires multi-GPU, best for large high-quality datasets (>50k examples). LoRA: trains low-rank adapters on the attention projections, typically 100–1000× fewer trainable parameters than full fine-tuning (see the sketch below). QLoRA: loads the base model in 4-bit, trains LoRA adapters in 16-bit (bf16), fits 65B models on a single 48GB GPU.
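    A back-of-envelope check of the adapter savings (a sketch; the 4096-dim, 32-layer numbers are illustrative, Llama-7B-style):
    # LoRA adds r * (d_in + d_out) trainable params per adapted weight matrix.
    d, r, n_layers = 4096, 16, 32        # hidden size, rank, layer count (illustrative)
    adapted_per_layer = 2                # q_proj and v_proj
    lora_params = n_layers * adapted_per_layer * r * (d + d)  # ~8.4M
    print(f"{lora_params:,} trainable vs ~7e9 full -> ~{7e9 / lora_params:.0f}x fewer")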
  2. Configure LoRA adapter. Use PEFT:
    from peft import LoraConfig, get_peft_model

    # base_model: a causal LM already loaded via transformers
    lora_config = LoraConfig(
        r=16, lora_alpha=32,                    # rank and scaling factor
        target_modules=["q_proj", "v_proj"],    # attention projections to adapt
        lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
    )
    model = get_peft_model(base_model, lora_config)
    model.print_trainable_parameters()          # sanity-check the trainable fraction
    
    For QLoRA, quantize the base model at load time with BitsAndBytes:
    BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16)
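    A minimal end-to-end QLoRA setup (a sketch; model_id is a placeholder for your base checkpoint):
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import prepare_model_for_kbit_training, get_peft_model

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # NormalFloat-4 quantization
        bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
    )
    base_model = AutoModelForCausalLM.from_pretrained(
        model_id,                               # placeholder checkpoint name
        quantization_config=bnb_config,
        device_map="auto",
    )
    base_model = prepare_model_for_kbit_training(base_model)  # cast norms, enable input grads
    model = get_peft_model(base_model, lora_config)           # reuse the LoraConfig above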
  3. Prepare data. Format as {"instruction": "...", "input": "...", "output": "..."} or chat-template JSON. Ensure a held-out validation split (10–20%). Clean duplicates and filter low-quality examples.
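    A sketch of loading and splitting an Alpaca-style JSON file (file path and prompt template are illustrative):
    from datasets import load_dataset

    raw = load_dataset("json", data_files="train.json", split="train")  # illustrative path
    splits = raw.train_test_split(test_size=0.1, seed=42)               # 90/10 split

    def to_text(example):
        # Collapse instruction/input/output into one prompt string for SFT.
        prompt = example["instruction"]
        if example.get("input"):
            prompt += "\n\n" + example["input"]
        return {"text": prompt + "\n\n" + example["output"]}

    train_dataset = splits["train"].map(to_text)
    eval_dataset = splits["test"].map(to_text)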
  4. Configure training. Use TRL's SFTTrainer:
    from trl import SFTTrainer, SFTConfig

    training_args = SFTConfig(
        output_dir="./output", num_train_epochs=3,
        per_device_train_batch_size=4, gradient_accumulation_steps=4,
        learning_rate=2e-4, lr_scheduler_type="cosine", warmup_ratio=0.03,
        bf16=True, logging_steps=10,
        eval_strategy="epoch", save_strategy="epoch",  # validate and checkpoint once per epoch
        max_seq_length=2048
    )
    trainer = SFTTrainer(model=model, args=training_args,
                         train_dataset=train_dataset, eval_dataset=eval_dataset)
    
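    Then launch the run and save the adapter:
    trainer.train()
    trainer.save_model("./output")  # writes the LoRA adapter weights and config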
  5. Evaluate. Track validation loss each epoch. Run task-specific metrics (accuracy, F1, ROUGE, pass@k). Compare against base model zero-shot and few-shot baselines. Check for catastrophic forgetting on a held-out general benchmark.
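    One way to run the base-vs-tuned comparison, using the evaluate library's ROUGE metric (a sketch; the prediction and reference lists are assumed to be generated elsewhere):
    import evaluate

    rouge = evaluate.load("rouge")
    # base_preds / tuned_preds / references: lists of strings (assumed)
    for name, preds in [("base", base_preds), ("fine-tuned", tuned_preds)]:
        scores = rouge.compute(predictions=preds, references=references)
        print(name, round(scores["rougeL"], 4))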
  6. Merge and export. For LoRA, call model.merge_and_unload() to produce a standalone model, or ship the adapter weights separately for modularity.
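    A sketch of the merged-export path (output directory illustrative; assumes the matching tokenizer is loaded):
    merged = model.merge_and_unload()          # fold the LoRA deltas into the base weights
    merged.save_pretrained("./merged-model")
    tokenizer.save_pretrained("./merged-model")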

Decision rules

  • If prompt engineering achieves ≥90% of target quality, do not fine-tune — save compute.
  • Use LoRA (r=8–64) as the default starting point; only go to full fine-tuning if LoRA plateaus.
  • Use QLoRA when GPU VRAM is limited (≤24GB) and model is ≥13B parameters.
  • Set learning rate to 2e-4 for LoRA, 1e-5 for full fine-tuning. Use cosine schedule with 3% warmup.
  • Minimum dataset: ~500 examples for LoRA on focused tasks; ~5k for robust generalization.
  • If validation loss increases for 2+ consecutive evaluations, stop training (early stopping).
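
To apply the last rule automatically, attach transformers' EarlyStoppingCallback (a sketch; it requires eval_strategy="epoch", load_best_model_at_end=True, and metric_for_best_model="eval_loss" in the SFTConfig):

    from transformers import EarlyStoppingCallback

    trainer = SFTTrainer(
        model=model, args=training_args,
        train_dataset=train_dataset, eval_dataset=eval_dataset,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],  # stop after 2 evals without improvement
    )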

Output requirements

  1. Adapter config — LoRA rank, alpha, target modules, dropout, quantization config
  2. Training config — learning rate, batch size, epochs, scheduler, gradient accumulation
  3. Data spec — format, size, split ratios, preprocessing steps
  4. Evaluation report — base model vs. fine-tuned on held-out set, with task-specific metrics
  5. Exported artifacts — adapter weights or merged model, tokenizer, training logs

References

Related skills

  • instruction-tuning
  • preference-optimization
  • eval-dataset-design
  • llm-creation

Failure handling

  • If training loss does not decrease after 1 epoch, verify data formatting and check that labels are not masked entirely.
  • If OOM occurs, reduce batch size, enable gradient checkpointing (model.gradient_checkpointing_enable()), or switch to QLoRA; a combined sketch follows below.
  • If fine-tuned model regresses on general tasks, reduce epochs, lower learning rate, or add general-domain data to the mix.
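
A sketch of those OOM mitigations combined, holding the effective batch size constant (values illustrative):

    from trl import SFTConfig

    training_args = SFTConfig(
        output_dir="./output",
        per_device_train_batch_size=1,   # down from 4
        gradient_accumulation_steps=16,  # 1 x 16 keeps the effective batch at 16
        gradient_checkpointing=True,     # recompute activations in the backward pass
        bf16=True, max_seq_length=2048,
    )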