Babysitter pytorch-trainer

PyTorch model training skill with custom training loops, gradient management, and GPU optimization.

Install

From source: clone the upstream repo

git clone https://github.com/a5c-ai/babysitter

For Claude Code: install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/data-science-ml/skills/pytorch-trainer" ~/.claude/skills/a5c-ai-babysitter-pytorch-trainer && rm -rf "$T"

manifest: library/specializations/data-science-ml/skills/pytorch-trainer/SKILL.md

pytorch-trainer

Overview

PyTorch model training skill with custom training loops, gradient management, GPU optimization, and integration with experiment tracking systems.

Capabilities

Custom training loop execution
Learning rate scheduling (StepLR, CosineAnnealingLR, OneCycleLR, etc.)
Gradient clipping and accumulation
Mixed precision training (AMP)
Checkpoint management and resumption
DataLoader optimization
Multi-GPU training (DataParallel, DistributedDataParallel)
Early stopping with patience
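Early stopping with patience can be shown without a GPU. The following is a minimal plain-Python sketch, not the skill's actual code. The `EarlyStopping` class and its `patience` and `min_delta` parameters are hypothetical, and the sketch assumes that a lower validation loss is better.

```python
import math


class EarlyStopping:
    """Hypothetical helper: signal a stop once validation loss has failed to
    improve by more than `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        if patience < 1:
            raise ValueError("patience must be >= 1")
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf      # lowest validation loss seen so far
        self.best_epoch = -1      # epoch at which `best` was recorded
        self.bad_epochs = 0       # consecutive epochs without improvement

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the patience counter.
            self.best = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=2)
    for epoch, loss in enumerate([1.0, 0.8, 0.85, 0.9, 0.7]):
        if stopper.step(epoch, loss):
            print(f"stopping at epoch {epoch}, best epoch {stopper.best_epoch}")
            break
```

In a real loop, `step` would be called once per epoch after validation. A stop would then presumably be reported through the `early_stopped` status in the output schema.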

Target Processes

Model Training Pipeline with Experiment Tracking
Distributed Training Orchestration
AutoML Pipeline Orchestration

Tools and Libraries

PyTorch
PyTorch Lightning (optional)
torchvision, torchaudio, torchtext
CUDA toolkit

Input Schema

{
  "type": "object",
  "required": ["modelPath", "dataConfig", "trainingConfig"],
  "properties": {
    "modelPath": {
      "type": "string",
      "description": "Path to model definition file"
    },
    "dataConfig": {
      "type": "object",
      "properties": {
        "trainPath": { "type": "string" },
        "valPath": { "type": "string" },
        "batchSize": { "type": "integer" },
        "numWorkers": { "type": "integer" }
      }
    },
    "trainingConfig": {
      "type": "object",
      "properties": {
        "epochs": { "type": "integer" },
        "learningRate": { "type": "number" },
        "optimizer": { "type": "string" },
        "scheduler": { "type": "string" },
        "mixedPrecision": { "type": "boolean" },
        "gradientClipping": { "type": "number" },
        "gradientAccumulation": { "type": "integer" }
      }
    },
    "checkpointConfig": {
      "type": "object",
      "properties": {
        "saveDir": { "type": "string" },
        "saveEvery": { "type": "integer" },
        "resumeFrom": { "type": "string" }
      }
    }
  }
}
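A caller might check a context object against this schema before launching a run. The sketch below is hypothetical and covers only part of the schema: the three required top-level keys and the integer-typed fields. It also rests on two assumptions. First, `gradientAccumulation` defaults to 1. Second, the effective batch size is `batchSize * gradientAccumulation`, which is the usual meaning of gradient accumulation but is not confirmed by the skill. To enforce the whole schema, a full validator such as the `jsonschema` package's `validate` function would be the more thorough choice.

```python
from typing import Any

# Required top-level keys, copied from the Input Schema above.
REQUIRED_KEYS = ("modelPath", "dataConfig", "trainingConfig")


def check_context(ctx: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical pre-flight check; validates only a subset of the schema."""
    missing = [key for key in REQUIRED_KEYS if key not in ctx]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    if not isinstance(ctx["modelPath"], str):
        raise TypeError("modelPath must be a string")

    data, train = ctx["dataConfig"], ctx["trainingConfig"]
    # Fields typed "integer" in the schema; bool is rejected because it
    # subclasses int in Python.
    for section, key in ((data, "batchSize"), (data, "numWorkers"),
                         (train, "epochs"), (train, "gradientAccumulation")):
        value = section.get(key)
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            raise TypeError(f"{key} must be an integer")

    # Assumed default: 1 means an optimizer step after every batch.
    accum = train.get("gradientAccumulation", 1)
    batch = data.get("batchSize")
    return {
        "gradientAccumulation": accum,
        # Samples contributing to each optimizer step (None if batchSize is unset).
        "effectiveBatchSize": batch * accum if batch is not None else None,
    }
```

Under these assumptions, a `dataConfig` with `batchSize: 32` plus `gradientAccumulation: 4` would give 128 samples per optimizer step.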

Output Schema

{
  "type": "object",
  "required": ["status", "metrics", "checkpointPath"],
  "properties": {
    "status": {
      "type": "string",
      "enum": ["success", "error", "early_stopped"]
    },
    "metrics": {
      "type": "object",
      "properties": {
        "trainLoss": { "type": "number" },
        "valLoss": { "type": "number" },
        "trainAccuracy": { "type": "number" },
        "valAccuracy": { "type": "number" },
        "epochsTrained": { "type": "integer" },
        "trainingTime": { "type": "number" }
      }
    },
    "checkpointPath": {
      "type": "string"
    },
    "learningCurve": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "epoch": { "type": "integer" },
          "trainLoss": { "type": "number" },
          "valLoss": { "type": "number" }
        }
      }
    }
  }
}
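One way to assemble a result object shaped like this schema from per-epoch records is sketched below. Several choices here are assumptions rather than documented behavior:

- The function `build_result` and the checkpoint path are hypothetical.
- `metrics` reports the final epoch's losses, not the best epoch's.
- `trainingTime` is taken to be in seconds.
- An empty learning curve is treated as an error.
- Accuracy fields are omitted because the learning-curve items do not carry them.

```python
import json
from typing import Any


def build_result(curve: list[dict[str, Any]], checkpoint_path: str,
                 early_stopped: bool, training_time_s: float) -> dict[str, Any]:
    """Hypothetical: build an output object shaped like the Output Schema."""
    if not curve:
        # Assumption: a run with no completed epochs is reported as an error.
        return {"status": "error", "metrics": {}, "checkpointPath": checkpoint_path}
    last = curve[-1]  # assumption: report the final epoch, not the best one
    return {
        "status": "early_stopped" if early_stopped else "success",
        "metrics": {
            "trainLoss": last["trainLoss"],
            "valLoss": last["valLoss"],
            "epochsTrained": len(curve),
            "trainingTime": training_time_s,  # assumed to be seconds
        },
        "checkpointPath": checkpoint_path,
        "learningCurve": curve,
    }


if __name__ == "__main__":
    curve = [{"epoch": 0, "trainLoss": 1.2, "valLoss": 1.3},
             {"epoch": 1, "trainLoss": 0.9, "valLoss": 1.0}]
    # "checkpoints/last.pt" is a made-up path for illustration.
    print(json.dumps(build_result(curve, "checkpoints/last.pt", False, 42.0), indent=2))
```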

Usage Example

{
  kind: 'skill',
  title: 'Train PyTorch model',
  skill: {
    name: 'pytorch-trainer',
    context: {
      modelPath: 'models/resnet.py',
      dataConfig: {
        trainPath: 'data/train',
        valPath: 'data/val',
        batchSize: 32,
        numWorkers: 4
      },
      trainingConfig: {
        epochs: 100,
        learningRate: 0.001,
        optimizer: 'AdamW',
        scheduler: 'cosine',
        mixedPrecision: true,
        gradientClipping: 1.0
      }
    }
  }
}
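The usage example requests `scheduler: 'cosine'` with `learningRate: 0.001` over 100 epochs. The skill does not state how 'cosine' is mapped. One plausible mapping is PyTorch's `CosineAnnealingLR`, stepped once per epoch, with `T_max` equal to the epoch count and `eta_min = 0`. Under that assumption, the learning rate follows the closed form given in the PyTorch documentation: eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T_max)) / 2. The plain-Python sketch below evaluates that formula; it does not call PyTorch.

```python
import math


def cosine_lr(epoch: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing, as documented for
    torch.optim.lr_scheduler.CosineAnnealingLR, evaluated at `epoch`."""
    if not 0 <= epoch <= t_max:
        raise ValueError("epoch must lie in [0, t_max]")
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


if __name__ == "__main__":
    # base_lr and t_max taken from the usage example's trainingConfig.
    for epoch in (0, 25, 50, 75, 99):
        print(epoch, f"{cosine_lr(epoch, base_lr=0.001, t_max=100):.6f}")
```

With `T_max` equal to the number of epochs, the rate reaches half its initial value at epoch 50. It approaches `eta_min` only at the very end, so the last epoch (index 99) still trains with a small but non-zero rate.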