# flox-cuda

CUDA and GPU development with Flox. Use for NVIDIA CUDA setup, GPU computing, deep learning frameworks, cuDNN, and cross-platform GPU/CPU development.

## Installation

```bash
git clone https://github.com/majiayu000/claude-skill-registry
# Or install just this skill:
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/flox-cuda" ~/.claude/skills/majiayu000-claude-skill-registry-flox-cuda && rm -rf "$T"
```

Source: `skills/data/flox-cuda/SKILL.md`

# Flox CUDA Development Guide
## Prerequisites & Authentication

- Sign up for early access at https://flox.dev
- Authenticate with `flox auth login`
- Linux-only: CUDA packages only work on `["aarch64-linux", "x86_64-linux"]`
- All CUDA packages are prefixed with `flox-cuda/` in the catalog
- No macOS support: use Metal alternatives on Darwin
## Core Commands

```bash
# Search for CUDA packages
flox search cudatoolkit --all | grep flox-cuda
flox search nvcc --all | grep 12_8

# Show available versions
flox show flox-cuda/cudaPackages.cudatoolkit

# Install CUDA packages
flox install flox-cuda/cudaPackages_12_8.cuda_nvcc
flox install flox-cuda/cudaPackages.cuda_cudart

# Verify installation
nvcc --version
nvidia-smi
```
## Package Discovery

```bash
# Search for the CUDA toolkit
flox search cudatoolkit --all | grep flox-cuda

# Search for specific versions
flox search nvcc --all | grep 12_8

# Show all available versions
flox show flox-cuda/cudaPackages.cudatoolkit

# Search for CUDA libraries
flox search libcublas --all | grep flox-cuda
flox search cudnn --all | grep flox-cuda
```
## Essential CUDA Packages

| Package Pattern | Purpose | Example |
|---|---|---|
| `cudatoolkit` | Main CUDA toolkit | `flox-cuda/cudaPackages_12_8.cudatoolkit` |
| `cuda_nvcc` | NVIDIA C++ compiler | `flox-cuda/cudaPackages_12_8.cuda_nvcc` |
| `cuda_cudart` | CUDA runtime API | `flox-cuda/cudaPackages.cuda_cudart` |
| `libcublas` | Linear algebra | `flox-cuda/cudaPackages_12_8.libcublas` |
| `libcufft` | Fast Fourier transform | `flox-cuda/cudaPackages_12_8.libcufft` |
| `libcurand` | Random number generation | `flox-cuda/cudaPackages_12_8.libcurand` |
| `cudnn` | Deep neural networks | `flox-cuda/cudaPackages_12_8.cudnn_9_11` |
| `nccl` | Multi-GPU communication | `flox-cuda/cudaPackages_12_8.nccl` |
## Critical: Conflict Resolution

CUDA packages have LICENSE file conflicts requiring explicit priorities:

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]
cuda_nvcc.priority = 1  # Highest priority

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]
cuda_cudart.priority = 2

cudatoolkit.pkg-path = "flox-cuda/cudaPackages_12_8.cudatoolkit"
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]
cudatoolkit.priority = 3  # Lower for LICENSE conflicts

gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped"  # For libstdc++
gcc-unwrapped.priority = 5
```
## CUDA Version Selection

### CUDA 12.x (Current)

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cudatoolkit.pkg-path = "flox-cuda/cudaPackages_12_8.cudatoolkit"
cudatoolkit.priority = 3
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]
```

### CUDA 11.x (Legacy Support)

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_11_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cudatoolkit.pkg-path = "flox-cuda/cudaPackages_11_8.cudatoolkit"
cudatoolkit.priority = 3
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]
```
## Cross-Platform GPU Development

Dual CUDA/CPU packages for portability (Linux gets CUDA, macOS gets a CPU fallback):

```toml
[install]
# CUDA packages (Linux only)
cuda-pytorch.pkg-path = "flox-cuda/python3Packages.torch"
cuda-pytorch.systems = ["x86_64-linux", "aarch64-linux"]
cuda-pytorch.priority = 1

# Non-CUDA fallback packages (macOS)
pytorch.pkg-path = "python313Packages.pytorch"
pytorch.systems = ["x86_64-darwin", "aarch64-darwin"]
pytorch.priority = 6  # Lower priority
```
## GPU Detection Pattern

Dynamic CPU/GPU package installation in hooks:

```bash
setup_gpu_packages() {
  venv="$FLOX_ENV_CACHE/venv"
  if [ ! -f "$FLOX_ENV_CACHE/.deps_installed" ]; then
    if lspci 2>/dev/null | grep -E 'NVIDIA|AMD' > /dev/null; then
      echo "GPU detected, installing CUDA packages"
      uv pip install --python "$venv/bin/python" \
        torch torchvision --index-url https://download.pytorch.org/whl/cu129
    else
      echo "No GPU detected, installing CPU packages"
      uv pip install --python "$venv/bin/python" \
        torch torchvision --index-url https://download.pytorch.org/whl/cpu
    fi
    touch "$FLOX_ENV_CACHE/.deps_installed"
  fi
}
```
## Complete CUDA Environment Examples

### Basic CUDA Development

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5

[vars]
CUDA_VERSION = "12.8"
CUDA_HOME = "$FLOX_ENV"

[hook]
on-activate = '''
  echo "CUDA $CUDA_VERSION environment ready"
  echo "nvcc: $(nvcc --version | grep release)"
'''
```
### Deep Learning with PyTorch

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]

cudnn.pkg-path = "flox-cuda/cudaPackages_12_8.cudnn_9_11"
cudnn.priority = 2
cudnn.systems = ["aarch64-linux", "x86_64-linux"]

python313Full.pkg-path = "python313Full"
uv.pkg-path = "uv"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5

[vars]
CUDA_VERSION = "12.8"
PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128"

[hook]
on-activate = '''
  setup_pytorch_cuda() {
    venv="$FLOX_ENV_CACHE/venv"
    if [ ! -d "$venv" ]; then
      uv venv "$venv" --python python3
    fi
    if [ -f "$venv/bin/activate" ]; then
      source "$venv/bin/activate"
    fi
    if [ ! -f "$FLOX_ENV_CACHE/.deps_installed" ]; then
      uv pip install --python "$venv/bin/python" \
        torch torchvision torchaudio \
        --index-url https://download.pytorch.org/whl/cu129
      touch "$FLOX_ENV_CACHE/.deps_installed"
    fi
  }
  setup_pytorch_cuda
'''
```
### TensorFlow with CUDA

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

cudnn.pkg-path = "flox-cuda/cudaPackages_12_8.cudnn_9_11"
cudnn.priority = 2
cudnn.systems = ["aarch64-linux", "x86_64-linux"]

python313Full.pkg-path = "python313Full"
uv.pkg-path = "uv"

[hook]
on-activate = '''
  setup_tensorflow() {
    venv="$FLOX_ENV_CACHE/venv"
    [ ! -d "$venv" ] && uv venv "$venv" --python python3
    [ -f "$venv/bin/activate" ] && source "$venv/bin/activate"
    if [ ! -f "$FLOX_ENV_CACHE/.tf_installed" ]; then
      uv pip install --python "$venv/bin/python" "tensorflow[and-cuda]"
      touch "$FLOX_ENV_CACHE/.tf_installed"
    fi
  }
  setup_tensorflow
'''
```
### Multi-GPU Development

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

nccl.pkg-path = "flox-cuda/cudaPackages_12_8.nccl"
nccl.priority = 2
nccl.systems = ["aarch64-linux", "x86_64-linux"]

libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]

[vars]
CUDA_VISIBLE_DEVICES = "0,1,2,3"  # All GPUs
NCCL_DEBUG = "INFO"
```
## Modular CUDA Environments

### Base CUDA Environment

```toml
# team/cuda-base
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5

[vars]
CUDA_VERSION = "12.8"
CUDA_HOME = "$FLOX_ENV"
```
### CUDA Math Libraries

```toml
# team/cuda-math
[include]
environments = [{ remote = "team/cuda-base" }]

[install]
libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]

libcufft.pkg-path = "flox-cuda/cudaPackages_12_8.libcufft"
libcufft.priority = 2
libcufft.systems = ["aarch64-linux", "x86_64-linux"]

libcurand.pkg-path = "flox-cuda/cudaPackages_12_8.libcurand"
libcurand.priority = 2
libcurand.systems = ["aarch64-linux", "x86_64-linux"]
```
### CUDA Debugging Tools

```toml
# team/cuda-debug
[install]
cuda-gdb.pkg-path = "flox-cuda/cudaPackages_12_8.cuda-gdb"
cuda-gdb.systems = ["aarch64-linux", "x86_64-linux"]

nsight-systems.pkg-path = "flox-cuda/cudaPackages_12_8.nsight-systems"
nsight-systems.systems = ["aarch64-linux", "x86_64-linux"]

[vars]
CUDA_LAUNCH_BLOCKING = "1"  # Synchronous kernel launches for debugging
```
### Layering for Development

```bash
# Base CUDA environment
flox activate -r team/cuda-base

# Add debugging tools when needed
flox activate -r team/cuda-base -- flox activate -r team/cuda-debug
```
## Testing CUDA Installation

### Verify the CUDA Compiler

```bash
nvcc --version
```

### Check GPU Availability

```bash
nvidia-smi
```
### Compile a Test Program

```bash
cat > hello_cuda.cu << 'EOF'
#include <stdio.h>

__global__ void hello() {
    printf("Hello from GPU!\n");
}

int main() {
    hello<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
EOF
nvcc hello_cuda.cu -o hello_cuda
./hello_cuda
```
### Test PyTorch CUDA

```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
```
## Best Practices

### Always Use Priority Values

CUDA packages have predictable conflicts; assign explicit priorities to every CUDA package.
### Version Consistency

Use specific versions (e.g., `_12_8`) for reproducibility. Don't mix CUDA versions.
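One way to catch accidental version mixing is to scan the manifest's pkg-paths for their version suffixes before committing a change. The helper below is an illustrative sketch (the function names are made up), assuming the `cudaPackages_X_Y` naming used throughout this guide:

```python
import re

def cuda_suffixes(pkg_paths):
    """Collect the cudaPackages version suffixes (e.g. '12_8') from pkg-paths."""
    return {m.group(1) for p in pkg_paths
            if (m := re.search(r"cudaPackages_(\d+_\d+)\.", p))}

def check_version_consistency(pkg_paths):
    """Return True if every versioned CUDA pkg-path uses the same _X_Y suffix."""
    return len(cuda_suffixes(pkg_paths)) <= 1

paths = [
    "flox-cuda/cudaPackages_12_8.cuda_nvcc",
    "flox-cuda/cudaPackages_12_8.cudatoolkit",
    "flox-cuda/cudaPackages.cuda_cudart",  # unversioned alias: ignored
]
print(check_version_consistency(paths))  # True: only 12_8 appears
```

Unversioned aliases like `cudaPackages.cuda_cudart` are skipped, since they carry no explicit suffix to compare.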
### Modular Design

Split base CUDA, math libraries, and debugging tools into separate environments for flexibility.
### Test Compilation

Verify that `nvcc hello.cu -o hello` works after setup.
### Platform Constraints

Always include `systems = ["aarch64-linux", "x86_64-linux"]` on CUDA packages.
### Memory Management

Set appropriate CUDA memory allocator configs:

```toml
[vars]
PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128"
CUDA_LAUNCH_BLOCKING = "0"  # Async by default
```
## Common CUDA Gotchas

### CUDA Toolkit ≠ Complete Toolkit

The `cudatoolkit` package doesn't include all libraries. Add what you need:

- `libcublas` for linear algebra
- `libcufft` for FFT
- `cudnn` for deep learning
### License Conflicts

Every CUDA package may need an explicit priority due to LICENSE file conflicts.
### No macOS Support

CUDA is Linux-only. Use Metal-accelerated packages on Darwin when available.
### Version Mixing

Don't mix CUDA versions. Use consistent `_X_Y` suffixes across all CUDA packages.
### Python Virtual Environments

CUDA Python packages (PyTorch, TensorFlow) should be installed in a virtual environment, using builds that match your CUDA version.
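PyTorch's wheel index encodes the CUDA version as `cu` + major + minor (e.g. `cu129` for CUDA 12.9, as in the hooks above). A small sketch of that mapping (the helper name is hypothetical; confirm the tag actually exists at download.pytorch.org before pinning it):

```python
def pytorch_index_url(cuda_version: str) -> str:
    """Map a CUDA version like '12.9' to the matching PyTorch wheel index URL.

    Assumes PyTorch's 'cu<major><minor>' tag convention; not every
    CUDA version has a published index, so verify before use.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"

print(pytorch_index_url("12.9"))  # https://download.pytorch.org/whl/cu129
```

Keeping this mapping in one place makes it harder for the venv's wheels and the environment's CUDA packages to drift apart.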
### Driver Requirements

Ensure your NVIDIA driver supports your CUDA version. Check with `nvidia-smi`.
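`nvidia-smi`'s banner line reports the maximum CUDA version the installed driver supports. A sketch of checking a toolkit version against that banner (helper names and the sample banner text are illustrative, not taken from any real machine):

```python
import re

def smi_cuda_version(smi_header: str):
    """Extract the 'CUDA Version' field from nvidia-smi's banner, or None."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_header)
    return (int(m.group(1)), int(m.group(2))) if m else None

def driver_supports(smi_header: str, toolkit_version: str) -> bool:
    """True if the driver-reported CUDA version >= the requested toolkit version."""
    supported = smi_cuda_version(smi_header)
    major, minor = (int(x) for x in toolkit_version.split(".")[:2])
    return supported is not None and supported >= (major, minor)

banner = "| NVIDIA-SMI 550.54.14  Driver Version: 550.54.14  CUDA Version: 12.4 |"
print(driver_supports(banner, "12.8"))  # False: driver only supports up to 12.4
```

In practice you would feed in `subprocess.check_output(["nvidia-smi"], text=True)`; parsing a captured string keeps the check testable on machines without a GPU.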
## Troubleshooting

### CUDA Not Found

```bash
# Check CUDA_HOME
echo $CUDA_HOME

# Check nvcc
which nvcc
nvcc --version

# Check library paths
echo $LD_LIBRARY_PATH
```
### PyTorch Not Using the GPU

```python
import torch

print(torch.cuda.is_available())  # Should be True
print(torch.version.cuda)         # Should match your CUDA version

# If False, reinstall with the correct CUDA version:
# uv pip install torch --index-url https://download.pytorch.org/whl/cu129
```
### Compilation Errors

```bash
# Check gcc/g++ versions
gcc --version
g++ --version

# Ensure gcc-unwrapped is installed
flox list | grep gcc-unwrapped

# Check include paths
echo $CPATH
echo $LIBRARY_PATH
```
### Runtime Errors

```bash
# Check GPU visibility
echo $CUDA_VISIBLE_DEVICES

# Check for a GPU
nvidia-smi

# Run with debug output
CUDA_LAUNCH_BLOCKING=1 python my_script.py
```
## Related Skills

- `flox-environments` - Setting up development environments
- `flox-sharing` - Composing a CUDA base with project environments
- `flox-containers` - Containerizing CUDA environments for deployment
- `flox-services` - Running CUDA workloads as services