Skillsbench modal-gpu

Run Python code on cloud GPUs using the Modal serverless platform. Use when you need A100/T4/A10G GPU access for training ML models. Covers Modal app setup, GPU selection, downloading data inside functions, and result handling.

install
source · Clone the upstream repo
git clone https://github.com/benchflow-ai/skillsbench
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/benchflow-ai/skillsbench "$T" && mkdir -p ~/.claude/skills && cp -r "$T/tasks/mhc-layer-impl/environment/skills/modal-gpu" ~/.claude/skills/benchflow-ai-skillsbench-modal-gpu && rm -rf "$T"
manifest: tasks/mhc-layer-impl/environment/skills/modal-gpu/SKILL.md
source content

Modal GPU Training

Overview

Modal is a serverless platform for running Python code on cloud GPUs. It provides:

  • Serverless GPUs: On-demand access to T4, A10G, A100 GPUs
  • Container Images: Define dependencies declaratively with pip
  • Remote Execution: Run functions on cloud infrastructure
  • Result Handling: Return Python objects from remote functions

Two patterns:

  • Single Function: Simple script with an @app.function decorator
  • Multi-Function: Complex workflows with multiple remote calls

Quick Reference

  • Basic Structure · Getting Started
  • GPU Options · GPU Selection
  • Data Handling · Data Download
  • Results & Outputs · Results
  • Troubleshooting · Common Issues

Installation

pip install modal
modal token set --token-id <id> --token-secret <secret>

Minimal Example

import modal

app = modal.App("my-training-app")

image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch",
    "einops",
    "numpy",
)

@app.function(gpu="A100", image=image, timeout=3600)
def train():
    import torch
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")

    # Training code here; move models and tensors to `device`
    return {"loss": 0.5}

@app.local_entrypoint()
def main():
    results = train.remote()
    print(results)

Common Imports

import modal
from modal import Image, App

# Inside remote function
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

When to Use What

  • Quick GPU experiments: gpu="T4" (16GB, cheapest)
  • Medium training jobs: gpu="A10G" (24GB)
  • Large-scale training: gpu="A100" (40/80GB, fastest)
  • Long-running jobs: set timeout=3600 or higher
  • Data from HuggingFace: download inside the function with hf_hub_download
  • Return metrics: return a dict from the function
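The GPU guidance above can be captured in a small local helper. `choose_gpu` is a hypothetical convenience function, not part of Modal's API; the VRAM thresholds simply mirror the list above.

```python
def choose_gpu(vram_gb_needed: float) -> str:
    """Map a rough VRAM requirement to the cheapest suitable Modal GPU string.

    Thresholds follow the guidance above: T4 (16GB), A10G (24GB),
    A100 (40/80GB).
    """
    if vram_gb_needed <= 16:
        return "T4"    # cheapest, quick experiments
    if vram_gb_needed <= 24:
        return "A10G"  # medium training jobs
    return "A100"      # large-scale training

print(choose_gpu(12))  # T4
print(choose_gpu(20))  # A10G
print(choose_gpu(60))  # A100
```

The returned string is what you would pass as the `gpu=` argument of `@app.function`.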

Running

# Run script
modal run train_modal.py

# Run in background
modal run --detach train_modal.py

External Resources