# rocm_vllm_deployment

Production-ready vLLM deployment on AMD ROCm GPUs. Combines environment auto-check, model parameter detection, Docker Compose deployment, health verification, and functional testing with comprehensive logging and security best practices.

Install from https://github.com/openclaw/skills:

```bash
# Clone the repository
git clone https://github.com/openclaw/skills

# Install into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/alexhegit/rocm-vllm-deployment" ~/.claude/skills/openclaw-skills-rocm-vllm-deployment && rm -rf "$T"

# Or install into ~/.openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/alexhegit/rocm-vllm-deployment" ~/.openclaw/skills/openclaw-skills-rocm-vllm-deployment && rm -rf "$T"
```

`skills/alexhegit/rocm-vllm-deployment/SKILL.md`

# ROCm vLLM Deployment Skill
Production-ready automation for deploying vLLM inference services on AMD ROCm GPUs using Docker Compose.
## Features

- **Environment Auto-Check** - Detects and repairs missing dependencies
- **Model Parameter Detection** - Auto-reads config.json for optimal settings
- **VRAM Estimation** - Calculates memory requirements before deployment
- **Secure Token Handling** - Never writes tokens to compose files
- **Structured Output** - All logs and test results saved per-model
- **Deployment Reports** - Human-readable summary for each deployment
- **Health Verification** - Automated health checks and functional tests
- **Troubleshooting Guide** - Common issues and solutions
## Environment Prerequisites

**Recommended (for production):** Add to `~/.bash_profile`:

```bash
# HuggingFace authentication token (required for gated models)
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Model cache directory (optional)
export HF_HOME="$HOME/models"

# Apply changes
source ~/.bash_profile
```

**Not required for testing:** The skill will proceed without these set:

- `HF_TOKEN`: Optional — public models work without it; gated models fail at download with a clear error
- `HF_HOME`: Optional — defaults to `/root/.cache/huggingface/hub`
## Environment Variable Detection

**Priority Order:**

1. **Explicit parameter** (highest) — Provided in the task/request (e.g., `hf_token: "xxx"`)
2. **Environment variable** — Already set in the shell or inherited from the parent process
3. **`~/.bash_profile`** — Sourced to load variables
4. **Default value** (lowest) — `HF_HOME` defaults to `/root/.cache/huggingface/hub`

| Variable | Required | If Missing |
|---|---|---|
| `HF_TOKEN` | Conditional | Continue without token (public models work; gated models fail at download with a clear error) |
| `HF_HOME` | No | Warning + default — use `/root/.cache/huggingface/hub` |

**Philosophy:** Fail fast for configuration errors; fail at download time for authentication errors.
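The priority order above can be sketched as a small resolver function; `resolve_hf_home` is a hypothetical helper for illustration, not one of the skill's scripts:

```bash
# Hypothetical sketch of the HF_HOME resolution cascade
resolve_hf_home() {
  local explicit="${1:-}"
  if [ -n "$explicit" ]; then            # 1. explicit parameter (highest)
    echo "$explicit"; return
  fi
  if [ -n "${HF_HOME:-}" ]; then         # 2. already set in the environment
    echo "$HF_HOME"; return
  fi
  [ -f ~/.bash_profile ] && . ~/.bash_profile       # 3. load shell config
  echo "${HF_HOME:-/root/.cache/huggingface/hub}"   # 4. default (lowest)
}

resolve_hf_home "/data/models"   # explicit parameter wins regardless of environment
```

The same cascade applies to `HF_TOKEN`, except there is no default value: a gated model simply fails at download time.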
## Helper Scripts

Location: `<skill-dir>/scripts/`

### check-env.sh

Validates and loads environment variables before deployment.

**Usage:**

```bash
# Basic check (HF_TOKEN optional, HF_HOME optional with default)
./scripts/check-env.sh

# Strict mode (HF_HOME required, fails if not set)
./scripts/check-env.sh --strict

# Quiet mode (minimal output, for automation)
./scripts/check-env.sh --quiet

# Test with environment variables
HF_TOKEN="hf_xxx" HF_HOME="/models" ./scripts/check-env.sh
```
**Exit Codes:**

| Code | Meaning |
|---|---|
| 0 | Environment check completed (variables loaded or defaulted) |
| 2 | Critical error (e.g., cannot source `~/.bash_profile`) |

**Note:** This script is optional; you can also run `source ~/.bash_profile` directly.
### generate-report.sh

Generates a human-readable deployment report after a successful deployment.

**Usage:**

```bash
./scripts/generate-report.sh <model-id> <container-name> <port> <status> [model-load-time] [memory-used]

# Example:
./scripts/generate-report.sh \
  "Qwen-Qwen3-0.6B" \
  "vllm-qwen3-0-6b" \
  "8001" \
  "✅ Success" \
  "3.6" \
  "1.2"
```
**Parameters:**

| Parameter | Required | Description |
|---|---|---|
| `model-id` | Yes | Model ID (with `/` replaced by `-`) |
| `container-name` | Yes | Docker container name |
| `port` | Yes | Host port for API endpoint |
| `status` | Yes | Deployment status (e.g., "✅ Success") |
| `model-load-time` | No | Model loading time in seconds |
| `memory-used` | No | Memory consumption in GiB |
**Output:** `$HOME/vllm-compose/<model-id>/DEPLOYMENT_REPORT.md`

**Exit Codes:**

| Code | Meaning |
|---|---|
| 0 | Report generated successfully |
| 1 | Missing required parameters |
| 2 | Output directory not found |

**Integration:** This script is called automatically in Phase 7 of the deployment workflow.
## Input Schema

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model_id` | String | Yes | - | HuggingFace model ID |
| `docker_image` | String | No | `rocm/vllm-dev:nightly` | vLLM Docker image |
| `tensor_parallel_size` | Integer | No | 1 | Number of GPUs |
| `port` | Integer | No | 9999 | API server port |
| `hf_home` | String | No | `$HF_HOME` or `/root/.cache/huggingface/hub` | Model cache directory |
| `hf_token` | Secret | Conditional | - | HuggingFace token (optional for public models, required for gated models) |
| `max_model_len` | Integer | No | Auto-detect | Maximum sequence length |
| `gpu_memory_utilization` | Float | No | 0.85 | GPU memory utilization |
| `auto_install` | Boolean | No | true | Auto-install dependencies |
| `log_level` | String | No | INFO | Logging verbosity |
## Output Structure

All deployment artifacts MUST be saved to:

```
$HOME/vllm-compose/<model-id-slash-to-dash>/
```

Convert the model ID to a directory name by replacing `/` with `-`:

- `openai/gpt-oss-20b` → `$HOME/vllm-compose/openai-gpt-oss-20b/`
- `Qwen/Qwen3-Coder-Next-FP8` → `$HOME/vllm-compose/Qwen-Qwen3-Coder-Next-FP8/`
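The conversion is a single substitution; a minimal sketch in bash:

```bash
# Map a HuggingFace model ID to its per-model output directory
model_id="openai/gpt-oss-20b"
model_dir="${model_id//\//-}"   # replace every "/" with "-"
echo "$HOME/vllm-compose/$model_dir/"
```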
**Per-model directory structure:**

```
$HOME/vllm-compose/<model-id>/
├── deployment.log          # Full deployment logs (stdout + stderr)
├── test-results.json       # Functional test results (JSON format)
├── docker-compose.yml      # Generated Docker Compose file
├── .env                    # HF_TOKEN environment (chmod 600, optional)
└── DEPLOYMENT_REPORT.md    # Human-readable deployment summary
```

**File requirements:**

- `deployment.log` — Capture ALL container logs during deployment
- `test-results.json` — Save the API response from the functional test request
- `DEPLOYMENT_REPORT.md` — Generated in Phase 7
- All three files MUST exist before marking the deployment as complete
## Execution Workflow

### Phase 0: Environment Check & Auto-Repair

**Step 0.1: Load Environment Variables**

```bash
# Source ~/.bash_profile to load HF_HOME and HF_TOKEN
source ~/.bash_profile
```

If `HF_HOME` is not defined in `~/.bash_profile`, it defaults to `/root/.cache/huggingface/hub`.
**Step 0.2: Create Output Directory**

- Create: `$HOME/vllm-compose/<model-id>/`

**Step 0.3: Initialize Logging**

- All output → `$HOME/vllm-compose/<model-id>/deployment.log`

**Step 0.4: System Checks**

- Detect OS and package manager
- Check Python, pip, huggingface_hub
- Check Docker, docker compose
- Check ROCm tools (rocm-smi/amd-smi)
- Check GPU access (/dev/kfd, /dev/dri)
- Check disk space (20 GB minimum)
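The checks above might look like the following sketch (command names and the 20 GB threshold come from this section; the skill's actual script may differ):

```bash
# Sketch of the Phase 0 system checks; reports issues instead of exiting
issues=0

for cmd in python3 pip3 docker; do
  command -v "$cmd" >/dev/null || { echo "MISSING: $cmd"; issues=$((issues+1)); }
done

# Either SMI tool satisfies the ROCm check
command -v rocm-smi >/dev/null || command -v amd-smi >/dev/null \
  || { echo "MISSING: rocm-smi/amd-smi"; issues=$((issues+1)); }

# GPU device nodes needed for container passthrough
for dev in /dev/kfd /dev/dri; do
  [ -e "$dev" ] || { echo "MISSING: $dev"; issues=$((issues+1)); }
done

# Disk space: at least 20 GB free under $HOME (df -Pk reports KiB)
avail_kib=$(df -Pk "$HOME" | awk 'NR==2 {print $4}')
[ "$avail_kib" -ge $((20 * 1024 * 1024)) ] \
  || { echo "LOW DISK: ${avail_kib} KiB free"; issues=$((issues+1)); }

echo "system checks finished with ${issues} issue(s)"
```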
### Phase 1: Model Download

Use `HF_HOME` from Phase 0 (environment variable or default):

```bash
# Download model to HF_HOME
huggingface-cli download <model_id> --local-dir "$HF_HOME/hub/models--<org>--<model>"

# Or use snapshot_download via Python:
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='<model_id>', cache_dir='$HF_HOME')"
```
**Authentication Handling:**
| Scenario | Behavior |
|---|---|
| Public model + no token | ✅ Download succeeds |
| Public model + token provided | ✅ Download succeeds |
| Gated model + no token | ❌ Download fails with "authentication required" error |
| Gated model + invalid token | ❌ Download fails with "invalid token" error |
| Gated model + valid token | ✅ Download succeeds |
**On Authentication Failure:**

```bash
echo "ERROR: Model download failed - authentication required"
echo "This model requires a valid HF_TOKEN."
echo ""
echo "Please add to ~/.bash_profile:"
echo "  export HF_TOKEN=\"hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\""
echo "Then run: source ~/.bash_profile"
exit 1
```

- Locate the model path in the HF cache: `$HF_HOME/hub/models--<org>--<model-name>/`
- Log download progress to `deployment.log`
### Phase 2: Model Parameter Detection

- Read `config.json` from the model
- Auto-detect: `max_model_len`, `hidden_size`, `num_attention_heads`, `num_hidden_layers`, `vocab_size`, `dtype`
- Validate that the TP size evenly divides the number of attention heads
- Estimate the VRAM requirement
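As an illustration of the VRAM estimate, here is a rough sketch using a synthetic `config.json`. The formula weights ≈ (12·L·h² + V·h) × bytes-per-param is a common approximation for transformer weight memory, not necessarily the skill's exact method, and it excludes the KV cache and runtime overhead:

```bash
# Rough weight-memory estimate from config.json fields (illustration only)
cat > /tmp/config.json <<'EOF'
{"hidden_size": 1024, "num_hidden_layers": 28, "num_attention_heads": 16,
 "vocab_size": 151936, "torch_dtype": "bfloat16"}
EOF

estimate=$(python3 - <<'EOF'
import json
cfg = json.load(open("/tmp/config.json"))
h, L, V = cfg["hidden_size"], cfg["num_hidden_layers"], cfg["vocab_size"]
bytes_per = 2 if cfg["torch_dtype"] in ("bfloat16", "float16") else 4
params = 12 * L * h * h + V * h          # transformer blocks + embeddings
weights_gib = params * bytes_per / 2**30
print(f"~{params/1e9:.2f}B params, ~{weights_gib:.1f} GiB weights")
EOF
)
echo "$estimate"
```

The real requirement also includes the KV cache (which scales with `max_model_len`) plus activation and framework overhead, which is why `gpu_memory_utilization` defaults to 0.85 rather than 1.0.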
### Phase 3: Docker Compose Configuration

Generate files in the output directory:

- `docker-compose.yml` → `$HOME/vllm-compose/<model-id>/docker-compose.yml`
  - Mount `HF_HOME` as a volume (read-only for models)
  - NO hardcoded tokens in the compose file
- `.env` → `$HOME/vllm-compose/<model-id>/.env` (optional)
  - Contains: `HF_TOKEN=<value>`
  - Permissions: `chmod 600`
  - Only created if the user explicitly requests persistent token storage

**Volume mount example:**

```yaml
volumes:
  - ${HF_HOME}:/root/.cache/huggingface/hub:ro
  - /dev/kfd:/dev/kfd
  - /dev/dri:/dev/dri
```

**Important:** Docker Compose reads `${HF_HOME}` from the host environment at runtime. Before running `docker compose`, run `source ~/.bash_profile`.
### Phase 4: Container Launch

**Important:** Before deploying, pull the latest image to pick up updates:

```bash
docker pull rocm/vllm-dev:nightly
```

**Note:** The default port is 9999. Before running `docker compose`, check that the port is available with `ss -tlnp | grep :<port>`. If the port is in use, specify a different port in `docker-compose.yml`.
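The port check from the note above can be wrapped in a small helper with a fallback for hosts without `ss`; the `port_free` name is an illustration:

```bash
# Returns success when nothing is listening on the given TCP port
port_free() {
  local port="$1"
  if command -v ss >/dev/null; then
    ! ss -tln | grep -q ":${port} "
  else
    # Fallback: try to bind the port briefly (assumes python3 is available)
    python3 -c "import socket; s = socket.socket(); s.bind(('', ${port})); s.close()" 2>/dev/null
  fi
}

port_free 9999 && echo "port 9999 is free" || echo "port 9999 is in use"
```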
- Pass `HF_TOKEN` at runtime: `HF_TOKEN=$HF_TOKEN docker compose up -d`
- Wait for container initialization
### Phase 5: Health Verification

- Check container status
- Test the `/health` endpoint
- Test the `/v1/models` endpoint
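A minimal sketch of that verification loop; the `wait_for_health` helper and retry counts are assumptions, not the skill's exact script:

```bash
# Poll /health until the server responds; then the API can be exercised
wait_for_health() {
  local port="${1:-9999}" retries="${2:-30}"
  local i
  for i in $(seq 1 "$retries"); do
    if curl -sf "http://localhost:${port}/health" >/dev/null; then
      echo "healthy after ${i} attempt(s)"; return 0
    fi
    sleep 2
  done
  echo "health check failed after ${retries} attempts" >&2
  return 1
}

# Once healthy, confirm the model is registered:
# curl -s http://localhost:<port>/v1/models
```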
### Phase 6: Functional Testing

- Run a completion test via the `/v1/chat/completions` API
- Save the response to `$HOME/vllm-compose/<model-id>/test-results.json`
- Verify the response contains a valid completion
- Log deployment complete → append to `deployment.log`
- The deployment is complete only when both `deployment.log` and `test-results.json` exist
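A hedged sketch of what the functional test request might look like; the model ID, prompt, and port below are placeholders, and the skill's actual payload may differ:

```bash
# Send one chat completion and save the raw response as test-results.json
port=9999
model_id="Qwen/Qwen3-0.6B"                       # example model ID
out_dir="$HOME/vllm-compose/${model_id//\//-}"   # slash-to-dash directory
mkdir -p "$out_dir"

request='{"model":"'"$model_id"'","messages":[{"role":"user","content":"Say hello"}],"max_tokens":32}'

curl -s "http://localhost:${port}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$request" \
  -o "$out_dir/test-results.json" \
  || echo "request failed - is the container running on port ${port}?"
```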
### Phase 7: Deployment Report

Generate a human-readable deployment report using the helper script.

**Step 7.1: Extract Deployment Metrics**

```bash
# Parse deployment.log for metrics
MODEL_LOAD_TIME=$(grep -o "model loading took [0-9.]* seconds" deployment.log | grep -o '[0-9.]*' || echo "N/A")
MEMORY_USED=$(grep -o "took [0-9.]* GiB memory" deployment.log | grep -o '[0-9.]*' || echo "N/A")
```
**Step 7.2: Generate Report**

```bash
# Execute the report generation script
<skill-dir>/scripts/generate-report.sh \
  "<model-id>" \
  "<container-name>" \
  "<port>" \
  "<status>" \
  "$MODEL_LOAD_TIME" \
  "$MEMORY_USED"

# Example:
./scripts/generate-report.sh \
  "Qwen-Qwen3-0.6B" \
  "vllm-qwen3-0-6b" \
  "8001" \
  "✅ Success" \
  "3.6" \
  "1.2"
```
**Output:** `$HOME/vllm-compose/<model-id>/DEPLOYMENT_REPORT.md`

**Report Contents:**

- Output structure verification (file checklist)
- Deployment summary table (health, test, metrics)
- Test results (request/response preview)
- Environment configuration
- Quick commands for operations

**Completion Criteria:**

- `DEPLOYMENT_REPORT.md` exists in the output directory
- The report contains all required sections
- All file checks show ✅
## Security Best Practices

- **Never commit tokens to version control** — Add `.env` to `.gitignore`
- **Use `.env` files with `chmod 600`** — Restrict access to the owner only
- **Mask tokens in logs** — Show only the first 10 characters: `${TOKEN:0:10}...`
- **Pass tokens at runtime** — `HF_TOKEN=$HF_TOKEN docker compose up -d`
- **Store tokens in `~/.bash_profile`** — For production environments, set `HF_TOKEN` in the user's shell config
- **Set the token for gated models** — `HF_TOKEN` is validated at download time; set it in `~/.bash_profile` for production
## Troubleshooting

### Environment Variables

| Issue | Solution |
|---|---|
| `HF_TOKEN` not set | Add `export HF_TOKEN="hf_..."` to `~/.bash_profile`, then `source ~/.bash_profile`. Or provide it via the `hf_token` parameter. |
| `HF_HOME` not set | `HF_HOME` defaults to `/root/.cache/huggingface/hub`. For production, add `export HF_HOME=...` to `~/.bash_profile`. |
| `~/.bash_profile` does not exist | Create `~/.bash_profile` and add the environment variables. |
| Variables not loaded | Run `source ~/.bash_profile` or restart the terminal. |
| Authentication error despite token | The token may be invalid or lack access to the model. Verify the token at https://huggingface.co/settings/tokens |
### Model Download

| Issue | Solution |
|---|---|
| Authentication required (gated model) | Set `HF_TOKEN` in `~/.bash_profile` or provide it via the `hf_token` parameter. Ensure the token has access to the model. |
| Model not found | Verify the model ID is correct (case-sensitive). Check that the model exists on HuggingFace. |
| Download slow or stalled | Check the network connection. Large models may take time. |
### Deployment

| Issue | Solution |
|---|---|
| hf CLI not found | Install it with `pip install -U "huggingface_hub[cli]"` |
| Docker Compose fails | Use `docker compose` (no hyphen) |
| GPU access fails | Add the user to the `video` and `render` groups: `sudo usermod -aG video,render $USER` |
| Port in use | Change the `port` parameter |
| OOM | Reduce `gpu_memory_utilization` or `max_model_len` |
### Cleanup

```bash
cd $HOME/vllm-compose/<model-id>
docker compose down
```
### Status Check

Check deployment status and logs:

```bash
# View deployment directory
ls -la $HOME/vllm-compose/<model-id>/

# View live logs
tail -f $HOME/vllm-compose/<model-id>/deployment.log

# View test results
cat $HOME/vllm-compose/<model-id>/test-results.json

# Check container status
docker ps | grep <model-id>

# Verify environment variables
echo "HF_TOKEN: ${HF_TOKEN:0:10}..."
echo "HF_HOME: $HF_HOME"
```
## Quick Start (Production)

**Step 1: Add environment variables to `~/.bash_profile`**

```bash
# Required: HuggingFace token
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Recommended: Custom model storage path (production)
export HF_HOME="/data/models/huggingface"

# Apply changes
source ~/.bash_profile
```

**Step 2: Verify the environment is ready**

```bash
# Load the variables, then validate with the helper script
source ~/.bash_profile
./scripts/check-env.sh

# Expected output:
# === Environment Ready ===
# Summary:
#   HF_TOKEN: hf_xxxxxx...
#   HF_HOME: /data/models/huggingface
```

**Step 3: Run the deployment**

The skill will automatically:

1. Source `~/.bash_profile` to load `HF_HOME` and `HF_TOKEN`
2. Use `HF_TOKEN` and `HF_HOME` from the environment (or `~/.bash_profile`, or defaults)
3. Proceed without a token for public models
4. Fail at download time with a clear error if a gated model requires a token
## Version History
| Version | Changes |
|---|---|
| 1.0.0 | Initial release |