Claude-skill-registry funsloth-runpod

Training manager for RunPod GPU instances - configure pods, launch training, monitor progress, retrieve checkpoints

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/funsloth-runpod" ~/.claude/skills/majiayu000-claude-skill-registry-funsloth-runpod && rm -rf "$T"
manifest: skills/data/funsloth-runpod/SKILL.md
source content

RunPod Training Manager

Run Unsloth training on RunPod GPU instances.

Prerequisites

  1. RunPod API Key:
    echo $RUNPOD_API_KEY
    (get at runpod.io/console/user/settings)
  2. RunPod SDK:
    pip install runpod
  3. Training notebook/script: from funsloth-train

Workflow

1. Select GPU

| GPU       | VRAM | Cost      | Best For     |
|-----------|------|-----------|--------------|
| RTX 3090  | 24GB | ~$0.35/hr | Budget 7-14B |
| RTX 4090  | 24GB | ~$0.55/hr | Fast 7-14B   |
| A100 40GB | 40GB | ~$1.50/hr | 14-34B       |
| A100 80GB | 80GB | ~$2.00/hr | 70B          |
| H100      | 80GB | ~$3.50/hr | Fastest      |

RunPod typically has better prices than HF Jobs.
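As one way to turn the table above into a default choice, here is a small hypothetical helper. The VRAM-per-parameter multipliers are rough assumptions of mine (4-bit QLoRA vs. 16-bit LoRA memory needs), not RunPod or Unsloth guidance; adjust them for your workload.

```python
# Hypothetical GPU picker based on the pricing table above.
# Heuristic: QLoRA (4-bit) needs ~1 GB per billion params including
# adapters/optimizer state; 16-bit LoRA roughly 3x that. These are
# assumptions, not measured numbers.

def pick_gpu(params_b: float, qlora: bool = True) -> str:
    """Return a GPU name from the table for a model of `params_b` billion params."""
    needed_gb = params_b * (1.0 if qlora else 3.0)
    # H100 has the same 80GB as the A100 80GB; pick it only if you want speed.
    for gpu, vram_gb in [("RTX 4090", 24), ("A100 40GB", 40), ("A100 80GB", 80)]:
        if needed_gb <= vram_gb * 0.9:  # leave ~10% headroom
            return gpu
    raise ValueError(f"No single-GPU option fits (~{needed_gb:.0f}GB needed)")
```

For example, a 7B QLoRA run lands on the RTX 4090 tier, while 70B needs the A100 80GB, matching the table.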

2. Choose Deployment

  • Pod (Recommended): Persistent, SSH access, network storage
  • Serverless: Pay per second, complex setup (better for inference)

3. Configure Network Volume (Recommended)

import runpod
volume = runpod.create_network_volume(name="funsloth-training", size_gb=50, region="US")

Allows: resume training, download checkpoints, share between pods.

4. Launch Pod

Use the official Unsloth Docker image for a pre-configured environment:

import runpod

pod = runpod.create_pod(
    name="funsloth-training",
    image_name="unsloth/unsloth",  # Official image, supports all GPUs incl. Blackwell
    gpu_type_id="{gpu_type}",
    volume_in_gb=50,
    network_volume_id="{volume_id}",
    env={
        "HF_TOKEN": "{token}",
        "WANDB_API_KEY": "{key}",
        "JUPYTER_PASSWORD": "unsloth",
    },
    ports="8888/http,22/tcp",
)

The Unsloth image includes Jupyter Lab (port 8888) and example notebooks in /workspace/unsloth-notebooks/.

5. Upload and Run

# SSH into pod
ssh root@{pod_ip}

# Upload script
scp train.py root@{pod_ip}:/workspace/

# Run training (use tmux for persistence)
tmux new -s training
cd /workspace && python train.py
# Ctrl+B, D to detach

6. Monitor

# SSH monitoring
tail -f /workspace/training.log
nvidia-smi -l 1

# Dashboard
https://runpod.io/console/pods/{pod_id}

7. Retrieve Checkpoints

# Save to network volume
cp -r /workspace/outputs /runpod-volume/

# Download via SCP
scp -r root@{pod_ip}:/workspace/outputs ./

# Or push to HF Hub from pod

8. Stop Pod

runpod.stop_pod(pod_id)    # Can resume later
runpod.terminate_pod(pod_id)  # Deletes pod; network volume persists

9. Handoff

Offer funsloth-upload for Hub upload with model card.

Best Practices

  1. Always use network volumes - pod storage is ephemeral
  2. Use spot instances for lower costs (risk of preemption)
  3. Set up SSH keys before creating pods
  4. Stop pods when not training - charges per minute
  5. Save checkpoints frequently with save_steps
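Since pods charge per minute, it is worth estimating run cost up front. A minimal sketch using the approximate hourly rates from the GPU table; the rates change often, so treat this as an illustration only:

```python
# Approximate per-hour rates from the GPU table above (USD). Assumptions;
# check the RunPod console for current pricing.
HOURLY_RATE = {
    "RTX 3090": 0.35,
    "RTX 4090": 0.55,
    "A100 40GB": 1.50,
    "A100 80GB": 2.00,
    "H100": 3.50,
}

def estimate_cost(gpu: str, hours: float) -> float:
    """Estimated pod cost in USD for a run of `hours` hours."""
    return round(HOURLY_RATE[gpu] * hours, 2)
```

For example, a 4-hour run on an RTX 4090 is roughly $2.20 at these rates.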

Error Handling

| Error               | Resolution                               |
|---------------------|------------------------------------------|
| Pod creation failed | Try a different GPU type or region       |
| SSH refused         | Wait 1-2 min, check IP                   |
| Out of disk         | Increase volume or clean up              |
| Volume not mounting | Check it is in the same region as the pod |

Bundled Resources