Claude-skills gsplat-optimizer
Optimize 3D Gaussian Splat scenes for real-time rendering on iOS, macOS, and visionOS. Use when working with .ply or .splat files, targeting mobile/Apple GPU performance, or needing LOD, pruning, or compression strategies for 3DGS scenes.
git clone https://github.com/ckorhonen/claude-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/ckorhonen/claude-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/gsplat-optimizer" ~/.claude/skills/ckorhonen-claude-skills-gsplat-optimizer && rm -rf "$T"
skills/gsplat-optimizer/SKILL.mdGaussian Splat Optimizer
Optimize 3D Gaussian Splatting scenes for real-time rendering on Apple platforms (iOS, macOS, visionOS) using Metal.
When to Use
- Optimizing
or.ply
files for mobile/Apple GPU targets.splat - Reducing gaussian count for performance (pruning strategies)
- Implementing Level-of-Detail (LOD) for large scenes
- Compressing splat data for bandwidth/storage constraints
- Profiling and optimizing Metal rendering performance
- Targeting specific FPS goals on Apple hardware
Quick Start
Input: Provide a
.ply/.splat file path, target device class, and FPS target.
# Analyze a splat file python ~/.claude/skills/gsplat-optimizer/scripts/analyze_splat.py scene.ply --device iphone --fps 60
Output: The skill provides:
- Point/gaussian pruning plan (opacity, size, error thresholds)
- LOD scheme suggestion (distance bins, gaussian subsets)
- Compression recommendation (if bandwidth/storage bound)
- Metal profiling checklist with shader/compute tips
Optimization Workflow
Step 1: Analyze the Scene
First, understand your scene characteristics:
- Gaussian count: Total number of splats
- Opacity distribution: Histogram of opacity values
- Size distribution: Gaussian scale statistics
- Memory footprint: Estimated GPU memory usage
Step 2: Determine Target Device
| Device Class | GPU Budget | Max Gaussians (60fps) | Storage Mode |
|---|---|---|---|
| iPhone (A15+) | 4-6GB unified | ~2-4M | Shared |
| iPad Pro (M1+) | 8-16GB unified | ~6-8M | Shared |
| Mac (M1-M3) | 8-24GB unified | ~8-12M | Shared/Managed |
| Vision Pro | 16GB unified | ~4-6M (stereo) | Shared |
| Mac (discrete GPU) | 8-24GB VRAM | ~10-15M | Private |
Step 3: Apply Pruning
If gaussian count exceeds device budget:
- Opacity threshold: Remove gaussians with opacity < 0.01-0.05
- Size culling: Remove sub-pixel gaussians (< 1px at target resolution)
- Importance pruning: Use LODGE algorithm for error-proxy selection
- Foveated rendering: For Vision Pro, reduce density in peripheral view
See references/pruning-strategies.md for details.
Step 4: Implement LOD (Large Scenes)
For scenes exceeding single-frame budget:
- Distance bins: Near (0-10m), Mid (10-50m), Far (50m+)
- Hierarchical structure: Octree or LoD tree for spatial queries
- Chunk streaming: Load/unload based on camera position
- Smooth transitions: Opacity blending at chunk boundaries
See references/lod-schemes.md for details.
Step 5: Apply Compression (If Needed)
For bandwidth/storage constraints:
| Method | Compression | Use Case |
|---|---|---|
| SOGS | 20x | Web delivery, moderate quality |
| SOG | 24x | Web delivery, better quality |
| CodecGS | 30x+ | Maximum compression |
| C3DGS | 31x | Fast rendering priority |
See references/compression.md for details.
Step 6: Profile and Optimize Metal
- Choose storage mode: Private for static data, Shared for dynamic
- Optimize shaders: Function constants, thread occupancy
- Profile with Xcode: GPU Frame Capture, Metal System Trace
- Iterate: Measure, optimize, repeat
See references/metal-profiling.md for details.
Common Pitfalls
1. Point Cloud Density Mismatch
Problem: Gaussian count doesn't match your scene complexity, causing either visual artifacts or wasted GPU resources.
- Too sparse (undersampling): Visible gaps, blockiness, loss of fine details
- Too dense (oversampling): Exceeds device budget, causes frame drops, GPU thrashing
Debugging:
# Analyze gaussian distribution python ~/.claude/skills/gsplat-optimizer/scripts/analyze_splat.py scene.ply --histogram # Check against device budget # Compare total_gaussians vs. device_max in the output table
Strategy:
- Start with device budget from Step 2 (e.g., 4M for iPhone)
- If scene exceeds budget by >20%, apply pruning before training
- If visual quality drops too much after pruning, consider LOD or chunking
- Use importance-weighted sampling (LODGE) to remove low-contribution gaussians, not just opaque ones
2. Training Instability (Gradient Explosions, Divergence)
Problem: During optimization (if fine-tuning on device), gaussian parameters diverge, causing:
- Loss suddenly jumps to NaN
- Gaussians disappear or explode in scale
- Model becomes unrecoverable mid-session
Debugging:
# Monitor loss during training tail -f training.log | grep -E "loss|nan|inf" # Check gradient magnitudes python -c " import numpy as np from plyfile import PlyData ply_data = PlyData.read('scene.ply') scales = ply_data['vertex']['scale_0'].data print(f'Scale range: {scales.min():.6f} to {scales.max():.6f}') print(f'Any NaN: {np.isnan(scales).any()}') "
Strategy:
- Gradient clipping: Cap gradient updates to ±0.1 scale per step
- Learning rate decay: Start at 1e-4, decay by 0.95 every epoch
- Loss regularization: Add L2 penalty on scale magnitudes to prevent explosions
- Checkpoint early: Save state every 10 iterations; rollback if loss spikes
- Freeze covariance: If converged, stop updating scale/rotation after 80% of training
- For device training: Reduce batch size or resolution if instability persists
3. Memory Limitations (OOM Errors on Large Scenes)
Problem: Scene exceeds available unified memory, causing allocation failures or GPU stalls.
- iPhone: 4–6GB shared between app + GPU
- iPad Pro: 8–16GB shared
- Vision Pro: 16GB (but stereo doubles gaussian count)
Debugging:
# Estimate memory footprint python << 'EOF' num_gaussians = 5_000_000 # Your count bytes_per_gaussian = 56 # pos (12) + scale (12) + rot quaternion (16) + opacity (4) + SH DC (12) total_mb = (num_gaussians * bytes_per_gaussian) / (1024 ** 2) print(f"Est. memory: {total_mb:.1f} MB") print(f"Safe for iPhone A15: {total_mb < 2000}") # Leave headroom for app EOF # Monitor live memory in Xcode # Memory graph + Allocations instrument during scene load
Strategy:
- Chunking for large scenes: Break into 1–4M gaussian chunks, stream based on camera distance
- Quantization: Store gaussians in FP16 instead of FP32 (2x memory reduction)
- Pruning first: Remove <0.01 opacity or sub-pixel gaussians before transfer to device
- Lazy loading: Keep only active LOD level in memory; unload far chunks
- Vision Pro consideration: Dual-eye rendering = 2x gaussian count; cap at 4M per eye
4. Quality/Speed Trade-Offs (Over-Optimization for One Metric)
Problem: Optimizing heavily for one metric breaks another:
- Maximize FPS → visual artifacts: Over-pruning removes important geometry
- Maximize quality → frame drops: Too many gaussians for target device
- Minimize memory → banding/posterization: Excessive quantization or LOD culling
Debugging:
# Profile before/after each change python << 'EOF' metrics = { "original": {"fps": 60, "gaussians": 5_000_000, "artifacts": "none"}, "after_pruning": {"fps": 58, "gaussians": 3_500_000, "artifacts": "block edges visible"}, } for label, m in metrics.items(): print(f"{label}: {m['fps']}fps, {m['gaussians']/1e6:.1f}M, {m['artifacts']}") EOF
Strategy:
- Define priority: Is this device speed-critical (AR, real-time) or quality-focused (preview)?
- Measure baseline: Profile original unoptimized scene first
- Iterate incrementally: Apply one optimization (pruning OR compression OR LOD), measure, decide
- Preserve quality metrics: Keep PSNR/SSIM scores; stop pruning if quality drops >1dB
- Target range: Aim for 50–60fps headroom (don't max out at exactly 60fps; device will throttle)
5. Real-Time Rendering Failures (Frame Drops, Shader Compilation)
Problem: Rendering pipeline stalls despite low gaussian count:
- First frame (cold start): 2–5s delay while shaders compile
- Mid-scene: Frame drops spike when new LOD levels load
- Smooth playback → stuttering after 30–60s
Debugging:
# Capture Metal frame statistics # In Xcode: Product > Scheme > Edit > Run > Diagnostics # Enable: Metal API Validation, GPU Frame Capture # Check shader compilation time python ~/.claude/skills/gsplat-optimizer/scripts/metal_profile.py \ --capture-shader-compile \ --target iphone14 # Monitor frame time distribution tail -f xcode.log | grep -E "frame_time|stutter"
Strategy:
- Pre-warm shader cache: Compile all function variants on first load (avoid runtime jank)
- Limit LOD transitions: If using multiple LOD levels, cap transitions to 2 per frame
- Asynchronous streaming: Load new geometry chunks on background thread, upload in-between frames
- Device-specific tuning:
- iPhone: Keep draw calls < 50, geometry per call < 500K gaussians
- Mac: More generous; aim for < 2M gaussians per draw call
- Vision Pro: Account for stereo; effective capacity is half the budget
- Profile regimen: Run Metal System Trace before and after each optimization; track:
- GPU utilization (target 70–85%)
- Shader time (target <10ms)
- Memory bandwidth (target <50GB/s)
Key Metrics
| Metric | Target | How to Measure |
|---|---|---|
| Frame time | 16.6ms (60fps) | Metal System Trace |
| GPU memory | < device budget | Xcode Memory Graph |
| Bandwidth | < 50GB/s | GPU Counters |
| Shader time | < 10ms | GPU Frame Capture |
Reference Implementation
MetalSplatter is the primary reference for Swift/Metal gaussian splatting:
- Repository: https://github.com/scier/MetalSplatter
- Supports iOS, macOS, visionOS
- ~8M splat capacity with v1.1 optimizations
- Stereo rendering for Vision Pro
Getting Started with MetalSplatter
git clone https://github.com/scier/MetalSplatter.git cd MetalSplatter open SampleApp/MetalSplatter_SampleApp.xcodeproj # Set to Release scheme for best performance
Resources
Reference Documentation
- Pruning Strategies - Gaussian reduction techniques
- LOD Schemes - Level-of-detail approaches
- Compression - Bandwidth/storage optimization
- Metal Profiling - Apple GPU optimization
Research Papers
- LODGE - LOD for large-scale scenes
- FLoD - Flexible LOD for variable hardware
- Voyager - City-scale mobile rendering
- 3DGS Compression Survey