# dstack

Source: [majiayu000/claude-skill-registry](https://github.com/majiayu000/claude-skill-registry), `skills/data/dstack/SKILL.md`. To install the skill locally:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/dstack" ~/.claude/skills/majiayu000-claude-skill-registry-dstack && rm -rf "$T"
```
## Overview

dstack is a tool that provisions and orchestrates GPU workloads across GPU clouds, Kubernetes clusters, and on-prem clusters (SSH fleets).

**When to use this skill:**

- Running/managing GPU workloads (dev environments, tasks for training or other batch jobs, services to run inference or deploy web apps)
- Creating, editing, and running `dstack` configurations
- Managing fleets of compute (instances/clusters)
## How it works

dstack operates through three core components:

- `dstack server` - Can run locally, remotely, or via dstack Sky (managed)
- `dstack` CLI - For applying configurations and managing resources; the CLI can be pointed at a server and a particular default project (via `~/.dstack/config.yml` or the `dstack project` CLI command); other CLI commands use the default project
- `dstack` configuration files - YAML files ending with `.dstack.yml`
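The default project pointer lives in `~/.dstack/config.yml`. A minimal sketch, assuming the usual schema written by the CLI (the URL and token values are placeholders, not real credentials):

```yaml
# ~/.dstack/config.yml (sketch; values are placeholders)
projects:
  - name: main
    url: http://127.0.0.1:3000
    token: <your dstack token>
    default: true
```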
**Typical workflow:**

```shell
# 1. Define configuration in a YAML file (e.g., train.dstack.yml, .dstack.yml, llama-serve.dstack.yml)
# 2. Apply the configuration
dstack apply -f train.dstack.yml
# 3. dstack prepares a plan and, once confirmed, provisions instances (according to created fleets) and runs workloads
# 4. Monitor with `dstack ps`, `dstack logs`, `dstack attach`, etc. (these commands support various options)
```
By default, `dstack apply` requires confirmation, and once the first job within the run is running it "attaches": it establishes an SSH tunnel, forwards ports (if any), and streams logs in real time. If you pass `-d`, it runs in detached mode and exits once the run is submitted.
CRITICAL: Never propose `dstack` CLI commands or YAML syntax that doesn't exist.

- Only use CLI commands and YAML syntax explicitly documented in this skill file or verified via `--help`
- If uncertain about a command or its syntax, check the links or use `--help`
NEVER do the following:

- Invent CLI flags not documented here or shown in `--help`
- Guess YAML property names - verify in configuration reference links
- Run `dstack apply` for runs without `-d` in automated contexts (blocks indefinitely)
- Retry failed commands without addressing the underlying error
- Summarize or reformat tabular CLI output - show it as-is
- Use `echo "y" |` when the `-y` flag is available
- Assume a command succeeded without checking output for errors
## Agent execution guidelines
This section provides critical guidance for AI agents executing dstack commands.
### Output accuracy
- NEVER reformat, summarize, or paraphrase CLI output. Display tables, status output, and error messages exactly as returned.
- When showing command results, use code blocks to preserve formatting.
- If output is truncated due to length, indicate this clearly (e.g., "Output truncated. Full output shows X entries.").
### Verification before execution

- When uncertain about any CLI flag or YAML property, run `dstack <command> --help` first.
- Never guess or invent flags. Example verification commands:

```shell
dstack --help          # List all commands
dstack apply --help    # Flags for apply per configuration type (dev-environment, task, service, fleet, etc.)
dstack fleet --help    # Fleet subcommands
dstack ps --help       # Flags for ps
```

- If a command or flag isn't documented, it doesn't exist.
### Command timing and confirmation handling

**Commands that run indefinitely (agents should avoid these):**

- `dstack attach` - maintains connection until interrupted
- `dstack apply` without `-d` for runs - streams logs after provisioning
- `dstack ps -w` - watch mode, auto-refreshes until interrupted

Instead, use `dstack ps -v` to check status, or `dstack apply -d` for detached mode.
All other commands: Use 10-60s timeout. Most complete within this range. While waiting, monitor the output - it may contain errors, warnings, or prompts requiring attention.
Confirmation handling:
,dstack apply
,dstack stop
require confirmationdstack fleet delete- Use
flag to auto-confirm when user has already approved-y - Use
to previewecho "n" |
plan without executing (avoiddstack apply
, preferecho "y" |
)-y
**Best practices:**

- Prefer modifying configuration files over passing parameters to `dstack apply` (unless it's an exception)
- When the user confirms deletion/stop operations, use the `-y` flag to skip confirmation prompts
- Avoid waiting indefinitely; display essential output once the command is finished (even if by timeout)
## Configuration types

dstack supports five main configuration types, each with specific use cases. Configuration files can be named `<name>.dstack.yml` or simply `.dstack.yml`.
**Common parameters:** All run configurations (dev environments, tasks, services) support many parameters, including:

- Git integration: Clone repos automatically (`repo`), mount existing repos (`repos`), upload local files (`working_dir`)
- Docker support: Use custom Docker images (`image`); if needed, use `docker: true` to run `docker` from inside the container (VM-based backends only)
- Environment & secrets: Set environment variables (`env`), reference secrets
- Storage: Persistent network volumes (`volumes`), specify disk size
- Resources: Define GPU, CPU, memory, and disk requirements

**Best practices:**

- Prefer giving configurations a `name` property for easier management
See configuration reference pages for complete parameter lists.
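To illustrate how these common parameters combine, here is a hypothetical task configuration; the image, env var, volume name, and resource values are illustrative, not prescriptive:

```yaml
type: task
name: prep-data                          # a name makes the run easier to manage
image: nvcr.io/nvidia/pytorch:24.07-py3  # custom Docker image (illustrative)
env:
  - HF_TOKEN                             # passed through from the local environment
commands:
  - python prep.py
volumes:
  - name: my-volume                      # persistent network volume
    path: /data
resources:
  gpu: 24GB
  disk: 200GB
```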
### 1. Dev environments
Use for: Interactive development with IDE integration (VS Code, Cursor, etc.).
```yaml
type: dev-environment
name: cursor
python: "3.12"
ide: vscode
resources:
  gpu: 80GB
  disk: 500GB
```
Concept documentation | Configuration reference
### 2. Tasks
Use for: Batch jobs, training runs, fine-tuning, web applications, any executable workload.
Key features: Distributed training (multi-node), port forwarding for web apps.
```yaml
type: task
name: train
python: "3.12"
env:
  - HUGGING_FACE_HUB_TOKEN
commands:
  - uv pip install -r requirements.txt
  - uv run python train.py
ports:
  - 8501  # Optional: expose ports for web apps
resources:
  gpu: A100:40GB:2  # Two 40GB A100s
  disk: 200GB
```
**Port forwarding:** When you specify `ports`, `dstack apply` automatically forwards them to localhost while attached. Use `dstack attach <run-name>` to reconnect and restore port forwarding. The run name becomes an SSH alias (e.g., `ssh <run-name>`) for direct access.
Examples:
- Single-node training (TRL)
- Single-node training (Axolotl)
- Distributed training (TRL)
- Distributed training (Axolotl)
- Distributed training (Ray+RAGEN)
- NCCL/RCCL tests
Concept documentation | Configuration reference
### 3. Services
Use for: Deploying models or web applications as production endpoints.
Key features: OpenAI-compatible model serving, auto-scaling (RPS/queue), custom gateways with HTTPS.
```yaml
type: service
name: llama31
python: "3.12"
env:
  - HF_TOKEN
commands:
  - uv pip install vllm
  - uv run vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct
port: 8000
model: meta-llama/Meta-Llama-3.1-8B-Instruct
resources:
  gpu: 80GB
  disk: 200GB
```
Once a service is running and its health probes are green, its endpoint becomes available.

**Service endpoints:**

- Without gateway: `<dstack server URL>/proxy/services/<project name>/<run name>/`
- With gateway: `https://<run name>.<gateway domain>/`
Example:

```shell
curl http://localhost:3000/proxy/services/<project name>/<run name>/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <dstack token>' \
  -d '{"model": "meta-llama/Meta-Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
```
**Gateways:** Set up a gateway before running services to enable custom domains, HTTPS, auto-scaling, rate limits, and production-grade endpoint management. Use the `dstack gateway` CLI command to manage gateways.
Concept documentation | Configuration reference
### 4. Fleets
Use for: Pre-provisioning infrastructure for workloads, managing on-premises GPU servers, creating auto-scaling instance pools.
Important: Workloads (dev environments, tasks, services) only run if their resource requirements match at least one configured fleet. Without matching fleets, provisioning will fail.
dstack supports two fleet types:
#### Backend fleets (Cloud/Kubernetes)

Dynamically provision instances from configured backends. Use the `nodes` property for on-demand scaling:

```yaml
type: fleet
name: my-fleet
nodes: 0..2        # Range: creates a template when starting with 0, provisions on demand
resources:
  gpu: 24GB..      # 24GB or more
  disk: 200GB
spot_policy: auto  # auto (default), spot, or on-demand
idle_duration: 5m  # Terminate idle instances after 5 minutes
```
**On-demand provisioning:** When `nodes` is a range (e.g., `0..2`, `1..10`), dstack creates an instance template. Instances are provisioned automatically when workloads need them, scaling between the min and max. Set `idle_duration` to terminate idle instances.
**Additional options:** Fleets support many configuration options, including `placement: cluster` for multi-node distributed workloads requiring inter-node communication (e.g., multi-GPU training), `blocks` for resource isolation, environment variables, and more. See the configuration reference for complete details.
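As a sketch of the `placement: cluster` option, a fixed-size fleet for distributed training might look like this (the node count and GPU spec are illustrative):

```yaml
type: fleet
name: train-cluster
nodes: 2            # fixed-size cluster for multi-node training
placement: cluster  # provision instances with inter-node connectivity
resources:
  gpu: H100:8
```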
#### SSH fleets (on-prem or pre-provisioned clusters)

Use existing GPU servers accessible via SSH:

```yaml
type: fleet
name: on-prem-fleet
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.168.1.10
    - 192.168.1.11
```
Concept documentation | Configuration reference
### 5. Volumes
Use for: Persistent storage for datasets, model checkpoints, training artifacts that persist across runs and can be shared between workloads.
dstack supports two types of volumes:
#### Network Volumes
Backend-specific persistent volumes (AWS EBS, GCP Persistent Disk, etc.) that can be attached to any dev environment, task, or service.
Define a network volume:

```yaml
type: volume
name: my-volume
backend: aws
region: us-east-1
resources:
  disk: 500GB
```

Attach it to workloads via the `volumes` property:

```yaml
type: task
# ... other config
volumes:
  - name: my-volume
    path: /volume_data
```
#### Instance Volumes
Faster local volumes using the instance's root disk. Ideal for ephemeral storage, caching, or maximum I/O performance without persistence across instances.
Attach instance volumes via the `volumes` property, mapping a path on the instance's disk to a path inside the container:

```yaml
type: dev-environment
# ... other config
volumes:
  - instance_path: /mnt/cache  # path on the instance's disk
    path: /cache_data          # mount path inside the container
```
Note: Volumes can be attached to dev environments, tasks, and services using the `volumes` property. Network volumes persist independently, while instance volumes are tied to the instance lifecycle.
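For instance volumes, dstack also accepts a short `instance_path:path` string form; a sketch (the paths are illustrative):

```yaml
type: task
# ... other config
volumes:
  - /mnt/cache:/cache_data  # instance volume, short syntax
```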
Concept documentation | Configuration reference
## Essential CLI commands

### Apply configurations
**Important behavior:**

- `dstack apply` shows a plan with estimated costs and may ask for confirmation (respond with `y` or use the `-y` flag to skip)
- Once confirmed, it provisions infrastructure and streams real-time output to the terminal
- In attached mode (default), the terminal blocks and shows output - use a timeout or Ctrl+C to interrupt if you need to continue with other commands
- In detached mode (`-d`), it runs in the background without blocking the terminal
**Workflow for applying configurations:**

Critical for agents: Always show the plan first, wait for user confirmation, THEN execute. Never auto-execute without user approval.

**Step-by-step for run configurations (dev-environment, task, service):**

1. Show plan: `echo "n" | dstack apply -f config.dstack.yml`
   Display the FULL output including the offers table and cost estimate. Do NOT summarize or reformat.
2. Wait for user confirmation. Do NOT proceed if:
   - Output shows "No offers found" or similar errors
   - Output shows validation errors
   - User has not explicitly confirmed
3. Execute (only after user confirms): `dstack apply -f config.dstack.yml -y -d`
4. Verify apply status: `dstack ps -v`
   Show the run status. Look for the run name and status column.
**Step-by-step for infrastructure (fleet, volume, gateway):**

1. Show plan: `echo "n" | dstack apply -f fleet.dstack.yml`
   Display the FULL output. Do NOT summarize or reformat.
2. Wait for user confirmation.
3. Execute: `dstack apply -f fleet.dstack.yml -y`
4. Verify: Use `dstack fleet`, `dstack volume`, or `dstack gateway` respectively.
**Common apply patterns:**

```shell
# Apply and attach (interactive, blocks terminal with port forwarding)
dstack apply -f train.dstack.yml

# Apply with automatic confirmation
dstack apply -f train.dstack.yml -y

# Apply detached (background, no attachment)
dstack apply -f serve.dstack.yml -d

# Force rerun (recreates even if a run with the same name exists)
dstack apply -f finetune.dstack.yml --force

# Override defaults (prefer modifying the config file instead, unless it's an exception)
dstack apply -f .dstack.yml --max-price 2.5
```
### Fleet management

```shell
# Create/update fleet
dstack apply -f fleet.dstack.yml

# List fleets
dstack fleet

# Get fleet details
dstack fleet get my-fleet

# Get fleet details as JSON (for troubleshooting)
dstack fleet get my-fleet --json

# Delete entire fleet (use -y when user already confirmed)
dstack fleet delete my-fleet -y

# IMPORTANT: When asked to delete an instance, always use -i <instance num> -
# do NOT delete the entire fleet (use -y when user already confirmed)
dstack fleet delete my-fleet -i <instance num> -y
```
### Monitor runs

```shell
# List all runs
dstack ps

# JSON output (for troubleshooting/scripting)
dstack ps --json

# Verbose output with full details
dstack ps -v

# Get specific run details as JSON
dstack run get my-run-name --json
```
### Attach to runs

**What is attaching?** Attaching connects to an existing run to restore port forwarding (for tasks/services with ports) and enable SSH access. The run name becomes an SSH alias (e.g., `ssh my-run-name`) configured in `~/.dstack/ssh/config` (included in `~/.ssh/config`).

Note: `dstack apply` automatically attaches when the run completes provisioning. Use `dstack attach` to reconnect after detaching or to access detached runs.
```shell
# Attach and replay logs from start (preferred, unless asked otherwise)
dstack attach my-run-name --logs

# Attach without replaying logs (restores port forwarding + SSH only)
dstack attach my-run-name
```
### View logs

```shell
# Stream logs (tail mode)
dstack logs my-run-name

# Debug mode (includes additional runner logs)
dstack logs my-run-name -d

# Fetch logs from a specific replica (multi-node runs)
dstack logs my-run-name --replica 1

# Fetch logs from a specific job
dstack logs my-run-name --job 0
```
### Stop runs

```shell
# Stop specific run
dstack stop my-run-name

# Stop with confirmation skipped (use when user already confirmed)
dstack stop my-run-name -y

# Abort (force stop)
dstack stop my-run-name --abort
```
### Check available resources

Use `dstack offer` to verify GPU availability before provisioning:

```shell
# List all available offers across backends
dstack offer

# Filter by specific backend
dstack offer --backend aws

# Filter by GPU type
dstack offer --gpu A100

# Filter by GPU memory
dstack offer --gpu 24GB..80GB

# JSON output for detailed inspection
dstack offer --json

# Combine filters
dstack offer --backend aws --gpu A100:80GB
```
Note: `dstack offer` shows all available GPU instances from configured backends, not just those matching configured fleets. Use it to check backend availability, but remember: an offer appearing here doesn't guarantee a fleet will provision it - fleets have their own resource constraints.
## Expected Output Formats
Agents should display these tables as-is, preserving column alignment.
## Troubleshooting
When diagnosing issues with dstack workloads or infrastructure:
1. Use JSON output for detailed inspection:

   ```shell
   dstack fleet get my-fleet --json | jq .
   dstack run get my-run --json | jq .
   dstack ps -n 10 --json | jq .
   ```

2. Check verbose run status:

   ```shell
   dstack ps -v  # Shows provisioning state, instance details, errors
   ```

3. Examine logs with debug output:

   ```shell
   dstack logs my-run -d  # Includes additional runner logs
   ```

4. Attach with log replay:

   ```shell
   dstack attach my-run --logs  # See full output from start
   ```

5. Verify resource availability:

   ```shell
   dstack offer --backend aws --gpu A100 --spot-auto --json  # Check if resources exist
   ```
**Common issues:**

- No offers: Check `dstack offer` and ensure that at least one fleet matches requirements
- No fleet: Ensure at least one fleet is created
- Configuration errors: Validate YAML syntax; check `dstack apply` output for specific errors
- Provisioning timeouts: Use `dstack ps -v` to see provisioning status; consider spot vs on-demand
- Connection issues: Verify server status, check authentication, ensure network access to backends
**When errors occur:**
- Display the full error message unchanged
- Do NOT retry the same command without addressing the error
- Refer to the Troubleshooting guide for guidance
## Additional Resources
Core documentation:
Additional concepts:
- Secrets - Manage sensitive credentials
- Projects - Projects isolate the resources of different teams
- Metrics - Track GPU utilization
- Events - Monitor system events
Guides:
- Server deployment (for server administration)
- Pro tips
Accelerator-specific examples:
Full documentation: https://dstack.ai/llms-full.txt