Awesome-omni-skill troubleshooting
Common ComfyUI errors and fixes — OOM, missing nodes, dtype mismatches, black images, and debugging strategies
```shell
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/troubleshooting" ~/.claude/skills/diegosouzapw-awesome-omni-skill-troubleshooting && rm -rf "$T"
```
skills/development/troubleshooting/SKILL.md

ComfyUI Troubleshooting Guide
Error Diagnosis Strategy
When a workflow fails, follow this systematic approach:
- Get the error: Use `get_history` to retrieve the execution result with full traceback
- Check logs: Use `get_logs` with keyword filters like `"error"`, `"warning"`, `"traceback"`
- Identify the failing node: The history response includes the `node_id` and `node_type` that failed
- Cross-reference inputs: Use `get_node_info` to verify the failing node's expected input schema
- Check models: Use `list_local_models` to verify all referenced model files exist
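The first two steps can be sketched in code. The response shape below (a `status` block whose `messages` list carries an `execution_error` event with `node_id`, `node_type`, and `exception_message`) is an assumption modeled on ComfyUI's history payload; verify field names against your server's actual response.

```python
def find_failing_node(history_entry):
    """Pull the failing node_id/node_type out of a history entry.

    Assumes history_entry["status"]["messages"] is a list of
    [event_name, payload] pairs, with an "execution_error" event
    carrying node_id, node_type, and exception_message.
    """
    status = history_entry.get("status", {})
    if status.get("status_str") != "error":
        return None  # execution succeeded; nothing to diagnose
    for event, payload in status.get("messages", []):
        if event == "execution_error":
            return {
                "node_id": payload.get("node_id"),
                "node_type": payload.get("node_type"),
                "message": payload.get("exception_message"),
            }
    return None
```

Feed the result straight into step 4: call `get_node_info` on the returned `node_type` to compare the failing node's inputs against its schema.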
Out of Memory (OOM)
Error Pattern
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate X MiB. GPU 0 has a total capacity of 24.00 GiB of which X MiB is free.
Or:
RuntimeError: CUDA error: out of memory
Root Cause
The GPU does not have enough VRAM to hold the model weights, intermediate tensors, and latent images simultaneously. Common triggers:
- High resolution images (2048x2048+)
- Multiple models loaded simultaneously
- FP32 precision models on limited VRAM
- Video generation (LTXV, AnimateDiff) with many frames
- Large batch sizes
Fixes (in order of preference)
- Reduce resolution: Drop to the model's native resolution (512 for SD 1.5, 1024 for SDXL/Flux)
- Use FP8/FP16 quantized models: FP8 Flux models use ~8GB vs ~24GB for FP16
  - Search for FP8 variants: `search_models("flux fp8")` or `search_models("sdxl fp8")`
- Use the `--lowvram` flag: a ComfyUI CLI flag that offloads model parts to CPU during inference
- Free VRAM between generations: ComfyUI should auto-manage, but restarting clears leaked memory
- Use tiled VAE decoding: For high-resolution images, tile the VAE decode step
  - Node: `VAEDecodeTiled` instead of `VAEDecode`
  - Breaks the image into tiles, decodes each separately, and stitches them together
- Reduce batch size: Set `batch_size` to 1 in `EmptyLatentImage`
- Avoid multiple models: Don't load two full checkpoints simultaneously; use one checkpoint and LoRAs instead
- For LTXV/video: Always use FP8 quantized video models on 24GB cards
VRAM Estimates
| Model | FP32 | FP16 | FP8 |
|---|---|---|---|
| SD 1.5 | ~4GB | ~2GB | ~1GB |
| SDXL | ~12GB | ~6GB | ~3GB |
| Flux Dev | ~48GB | ~24GB | ~12GB |
| Flux Schnell | ~48GB | ~24GB | ~12GB |
| LTXV | ~20GB+ | ~10GB+ | ~6GB |
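The table values follow from simple arithmetic: model weights take roughly parameter count times bytes per parameter, before activations and latents. A sketch of that estimate (the parameter counts in the comment are approximate public figures, not exact):

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}

def weight_vram_gb(num_params: float, dtype: str) -> float:
    """Approximate VRAM for model weights alone (no activations/latents)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

# Rough parameter counts: SD 1.5 UNet ~0.86B, SDXL UNet ~2.6B, Flux ~12B
for name, params in [("SD 1.5", 0.86e9), ("SDXL", 2.6e9), ("Flux", 12e9)]:
    print(name, {d: round(weight_vram_gb(params, d), 1) for d in BYTES_PER_PARAM})
```

This is why the table's numbers roughly halve at each precision step, and why real usage exceeds the estimate: sampling adds intermediate tensors on top of the weights.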
Device Mismatch
Error Pattern
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Root Cause
A tensor on the CPU is being combined with a tensor on the GPU. This usually happens when:
- A custom node doesn't properly move tensors to the correct device
- Model offloading placed parts of the model on CPU
- A node produces CPU tensors while downstream expects GPU tensors
Fixes
- Check if the error occurs with a specific custom node; update or replace that node
- If using `--lowvram` or `--cpu`, some nodes may not support CPU offloading
- Restart ComfyUI to reset device state
- Check if a custom node has a newer version that fixes device handling
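The usual in-node fix is to move every incoming tensor onto one device before combining them (`t.to(device)` in PyTorch). A torch-free sketch of that pattern, using a minimal stand-in class so it runs anywhere (in a real custom node these would be torch tensors and `.to()` would copy data between CPU and GPU):

```python
class FakeTensor:
    """Minimal stand-in for a torch tensor: just tracks a device string."""
    def __init__(self, device="cpu"):
        self.device = device
    def to(self, device):
        return FakeTensor(device)

def align_devices(*tensors, device=None):
    """Move all tensors to one device (the first tensor's, by default)."""
    if device is None:
        device = tensors[0].device
    return [t if t.device == device else t.to(device) for t in tensors]
```

If a custom node you can't patch keeps mixing devices, this is the shape of the one-line fix to suggest upstream.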
Missing Nodes
Error Pattern
Cannot find node class 'NodeClassName'
Or in the execution response:
"error": {"type": "node_not_found", "message": "Cannot find node class 'X'"}
Root Cause
The workflow references a node type that is not installed. This happens when:
- A custom node pack is not installed
- A custom node pack is installed but failed to load (import error)
- The node was renamed or removed in a pack update
Fixes
- Search for the node pack: `search_custom_nodes("NodeClassName")`
- Install via ComfyUI Manager or the registry
- Check logs for import errors: `get_logs(keyword="import")` or `get_logs(keyword="error")`; import errors often reveal missing Python dependencies
- Install missing Python dependencies: If the custom node requires a pip package: `pip install missing-package`
- Restart ComfyUI after installing any custom node; nodes are loaded at startup
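Steps 3 and 4 can be partially automated: scan the retrieved log text for `ModuleNotFoundError` lines and extract the missing package names. A sketch, assuming the logs contain standard Python tracebacks:

```python
import re

def missing_modules(log_text: str) -> list[str]:
    """Extract module names from ModuleNotFoundError lines in a log."""
    pattern = r"ModuleNotFoundError: No module named '([^']+)'"
    seen = []  # deduplicate while preserving first-seen order
    for name in re.findall(pattern, log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Note the caveat from step 4: the pip package name is not always the module name (e.g. the module `cv2` comes from the package `opencv-python`), so treat the output as a starting point.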
NaN Tensor Errors
Error Pattern
RuntimeError: Input contains NaN
Or images come out as solid gray/noise with NaN warnings in logs.
Root Cause
Numerical instability during the diffusion process. Common triggers:
- CFG scale too high: Values above 15-20 can cause numerical overflow
- Corrupted model weights: Damaged download or incompatible merge
- FP16 overflow: Some operations overflow at half precision
- Incompatible LoRA: A LoRA trained for a different base model
Fixes
- Lower CFG: Try CFG 7.0 for SD 1.5/SDXL, 1.0 for Flux
- Use an FP32 VAE: Some VAEs produce NaN in FP16. Switch to `vae-ft-mse-840000-ema-pruned.safetensors` (FP32)
- Remove LoRAs: Test without LoRAs to isolate the cause
- Re-download the model: Hash verification can detect corrupted files
- Check LoRA compatibility: Ensure the LoRA matches the base model family
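The FP16-overflow trigger is easy to demonstrate without a GPU: IEEE half precision maxes out around 65504, and any intermediate value beyond that cannot be represented, which is how inf and then NaN enter a half-precision pipeline. Python's `struct` module packs half floats directly (format character `"e"`):

```python
import struct

def fits_in_fp16(x: float) -> bool:
    """True if x survives a round-trip through IEEE half precision."""
    try:
        packed = struct.pack("<e", x)   # "e" = 16-bit half float
    except OverflowError:
        return False                    # magnitude beyond ~65504
    return struct.unpack("<e", packed)[0] not in (float("inf"), float("-inf"))

print(fits_in_fp16(60000.0))   # within half-precision range
print(fits_in_fp16(70000.0))   # overflows half precision
```

High CFG scales push intermediate activations toward exactly this ceiling, which is why lowering CFG is the first fix listed.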
Dtype Mismatches
Error Pattern
RuntimeError: expected scalar type Float but found Half
Or:
RuntimeError: expected scalar type Half but found Float
Or:
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
Root Cause
A model component expects one precision (FP32/FP16) but receives another. Most common with:
- VAE precision mismatch (FP16 model + FP32 VAE or vice versa)
- Mixed-precision LoRAs
- Custom nodes that force a specific dtype
Fixes
- Use a separate VAE: Load an explicit FP32 VAE instead of the checkpoint's built-in VAE
  - Node: `VAELoader` with `vae-ft-mse-840000-ema-pruned.safetensors`
- Match precision: If the model is FP16, use FP16-compatible nodes throughout
- Force FP32 VAE decode: Some node packs offer `VAEDecodeFP32` nodes
- Check ComfyUI settings: the `--force-fp32` flag forces everything to FP32 (uses more VRAM)
CLIP Token Overflow
Error Pattern
No explicit error — the prompt is silently truncated at 77 tokens, and details mentioned late in the prompt are ignored.
Symptoms
- Later parts of long prompts have no effect on the image
- Adding more descriptive text doesn't change the output
- Removing early tokens suddenly makes later tokens work
Fixes
- Use the BREAK token: Split the prompt at natural boundaries: `subject description, pose, clothing, setting BREAK lighting, style, quality, camera angle`
- Use `CLIPTextEncodeSDXL`: SDXL's dual-CLIP setup processes two 77-token chunks
- Prioritize important tokens: Put the most important descriptors first
- Use fewer filler words: Remove articles and prepositions where possible
- Use embeddings: Condense complex concepts into single tokens with textual inversions
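Since there is no error to catch, a pre-flight check helps. CLIP's BPE tokenizer is not in the standard library, so this sketch uses a crude heuristic (roughly 1.3 tokens per word, which is an assumption, not the real tokenizer) just to flag prompts likely to blow past the 77-token window:

```python
def likely_truncated(prompt: str, limit: int = 77,
                     tokens_per_word: float = 1.3) -> bool:
    """Heuristic check: does this prompt probably exceed CLIP's window?

    The real count requires CLIP's BPE tokenizer; this only estimates.
    Commas and brackets tokenize separately, so count them too.
    """
    words = prompt.split()
    punct = sum(prompt.count(c) for c in ",()[]")
    estimate = int(len(words) * tokens_per_word) + punct + 2  # +2 start/end
    return estimate > limit
```

For an exact count, run the prompt through an actual CLIP tokenizer; this heuristic only tells you when to bother.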
Black Images
Error Pattern
No error in the execution — the workflow "succeeds" but produces completely black or near-black images.
Root Causes and Fixes
| Cause | Diagnosis | Fix |
|---|---|---|
| Denoise set to 0 | Check KSampler inputs | Set denoise to 1.0 for txt2img, 0.5-0.8 for img2img |
| CFG set to 0 | Check KSampler inputs | Set CFG to 7.0 (SD 1.5), 1.0 (Flux) |
| Steps set to 0 | Check KSampler inputs | Set steps to 20+ (standard) or 4+ (turbo) |
| Wrong VAE | VAE doesn't match model | Use the correct VAE for the model family |
| Empty prompt | CLIPTextEncode has empty text | Add a text prompt |
| Wrong scheduler | Incompatible scheduler/sampler combo | Try the `normal` scheduler with the `euler` sampler |
| Seed collision | Extremely rare | Change the seed value |
| FP16 VAE overflow | VAE decode produces black | Use FP32 VAE or VAEDecodeTiled |
Quick Diagnostic Checklist
- Check `denoise` > 0 (should be 1.0 for txt2img)
- Check `cfg` > 0 (should be 7.0 for SD 1.5, 1.0 for Flux)
- Check `steps` > 0 (should be 20 for standard, 4 for turbo)
- Verify the positive prompt is not empty
- Try a different seed
- Try a known-working sampler/scheduler combo: `euler` + `normal`
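The checklist can be run mechanically against a workflow in ComfyUI's API prompt format (nodes keyed by id, each with `class_type` and `inputs`). A sketch; it assumes the sampler settings are plain widget values rather than connections:

```python
def black_image_checks(prompt: dict) -> list[str]:
    """Return checklist violations for a ComfyUI API-format prompt dict."""
    problems = []
    for node_id, node in prompt.items():
        inputs = node.get("inputs", {})
        if node.get("class_type") == "KSampler":
            if inputs.get("denoise", 1.0) <= 0:
                problems.append(f"node {node_id}: denoise is 0")
            if inputs.get("cfg", 7.0) <= 0:
                problems.append(f"node {node_id}: cfg is 0")
            if inputs.get("steps", 20) <= 0:
                problems.append(f"node {node_id}: steps is 0")
        if node.get("class_type") == "CLIPTextEncode":
            text = inputs.get("text", "")
            if isinstance(text, str) and not text.strip():
                problems.append(f"node {node_id}: empty prompt text")
    return problems
```

Running this before queueing a workflow catches the "succeeds but black" cases that produce no traceback at all.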
Connection Type Errors
Error Pattern
Output type 'IMAGE' doesn't match input type 'LATENT'
Or:
Required input 'model' of type 'MODEL' but got connection of type 'CLIP'
Root Cause
Connecting the wrong output slot of a node to an incompatible input. Often caused by using the wrong output index.
Fixes
- Check output indices: Use `get_node_info` to verify the exact output order
  - `CheckpointLoaderSimple` outputs: 0=MODEL, 1=CLIP, 2=VAE
  - Getting the index wrong: `["1", 0]` gives MODEL, `["1", 1]` gives CLIP
- Verify connection format: `["nodeId", outputIndex]`; the node ID is a string, the index is an integer
- Check data type flow: Ensure the pipeline follows the correct type chain:
```
MODEL → KSampler
CLIP → CLIPTextEncode → CONDITIONING → KSampler
LATENT → KSampler → LATENT → VAEDecode → IMAGE
VAE → VAEDecode, VAEEncode
```
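These connections can be type-checked offline. Given a map of each node type's output types (the map below is a hand-written subset for illustration; `get_node_info` is the real source of this schema), a mismatch like wiring CLIP where MODEL is expected is easy to catch:

```python
# Hand-written subset of output schemas; the real source is get_node_info.
OUTPUT_TYPES = {
    "CheckpointLoaderSimple": ["MODEL", "CLIP", "VAE"],
    "KSampler": ["LATENT"],
    "VAEDecode": ["IMAGE"],
    "EmptyLatentImage": ["LATENT"],
}

def check_connection(prompt, src_ref, expected_type):
    """Verify that connection ["nodeId", index] yields expected_type."""
    node_id, out_index = src_ref
    src_class = prompt[node_id]["class_type"]
    actual = OUTPUT_TYPES[src_class][out_index]
    if actual != expected_type:
        return f"got {actual} from {src_class}[{out_index}], expected {expected_type}"
    return None  # types match
```

The same check exposes the off-by-one index mistakes described above, since `["1", 1]` resolves to CLIP rather than MODEL.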
Model Loading Errors
Error Pattern
FileNotFoundError: [Errno 2] No such file or directory: 'models/checkpoints/model.safetensors'
Or:
SafetensorError: Error reading file: invalid header
Or:
RuntimeError: PytorchStreamReader failed reading zip archive
Root Causes
- File not found: Model file doesn't exist at the referenced path
- Corrupted download: Incomplete or damaged file
- Wrong format: File is not a valid safetensors/pickle/checkpoint format
Fixes
- Verify the model exists: `list_local_models(model_type="checkpoints")`
- Check the exact filename: Model names in workflows must match the filename exactly (case-sensitive)
- Re-download: If hash mismatch or corruption: `download_model(url="...", target_subfolder="checkpoints")`
- Check file size: A 1KB safetensors file is clearly corrupted; re-download
- Verify subfolder: Models must be in the correct subfolder (`checkpoints/`, `loras/`, `vae/`, etc.)
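The corruption check can go beyond file size: a safetensors file begins with an 8-byte little-endian header length followed by that many bytes of JSON. A quick stdlib sanity check (a supplementary check, not one of this skill's tools) catches truncated or wrong-format downloads without loading the model:

```python
import json
import struct
from pathlib import Path

def check_safetensors(path):
    """Return an error description, or None if the header looks sane."""
    p = Path(path)
    if not p.exists():
        return "file not found"
    size = p.stat().st_size
    if size < 8:
        return f"file is only {size} bytes; clearly truncated"
    with open(p, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        if header_len > size - 8:
            return "header length exceeds file size; corrupted or wrong format"
        try:
            json.loads(f.read(header_len))
        except ValueError:
            return "header is not valid JSON; corrupted or not safetensors"
    return None
```

This is exactly the check behind the `SafetensorError: invalid header` message above: the loader read those first bytes and found garbage.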
Torch / CUDA Version Errors
Error Pattern
RuntimeError: CUDA error: no kernel image is available for execution on the device
Or:
ImportError: cannot import name 'xxx' from 'torch'
Or:
AssertionError: Torch not compiled with CUDA enabled
Root Cause
PyTorch and CUDA version incompatibility, usually after:
- Updating PyTorch without matching CUDA toolkit
- Installing a custom node that downgrades/changes PyTorch
- Using pip install that pulls a CPU-only PyTorch
Fixes
- Check current versions: `get_system_stats()` shows the PyTorch and CUDA versions
- Verify CUDA availability: In Python: `torch.cuda.is_available()`
- Reinstall PyTorch with CUDA: Visit pytorch.org for the correct install command matching your CUDA version
- Pin the PyTorch version: After fixing, avoid running `pip install` commands that might change PyTorch
- Use ComfyUI's bundled venv: ComfyUI Desktop ships with a pre-configured Python environment
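The PyTorch version string reported by `get_system_stats()` encodes the build as a local version suffix (`2.1.0+cu121` for a CUDA 12.1 wheel, `2.2.0+cpu` for a CPU-only wheel), so the "pip pulled a CPU-only PyTorch" failure mode can be flagged from the string alone:

```python
def diagnose_torch_build(version: str) -> str:
    """Classify a torch version string like '2.1.0+cu121' or '2.2.0+cpu'."""
    if "+cpu" in version:
        return "CPU-only PyTorch wheel; reinstall a CUDA build from pytorch.org"
    if "+cu" in version:
        cuda = version.split("+cu", 1)[1]
        # cu121 -> 12.1, cu118 -> 11.8
        return f"CUDA build (toolkit {cuda[:-1]}.{cuda[-1]})"
    return "no build suffix; check torch.version.cuda manually"
```

A CPU-only build is what produces `AssertionError: Torch not compiled with CUDA enabled` above, while a toolkit/driver mismatch produces the `no kernel image` error.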
ComfyUI Desktop vs CLI Differences
Key Differences
| Aspect | ComfyUI Desktop | ComfyUI CLI |
|---|---|---|
| Default port | 8000 | 8188 |
| Python | Embedded (bundled) | System/venv Python |
| Install location | OS-specific application directory | Wherever you cloned it |
| Custom nodes | Managed under the app's data directory | `custom_nodes/` in repo |
| Models | Managed under the app's data directory | `models/` in repo |
| Config | `extra_model_paths.yaml` for shared paths | Same |
| Updates | Auto-updater in the app | `git pull` |
Common Issues
- Wrong port: MCP tools default to 8188; if using Desktop, configure them for port 8000
- Path confusion: Desktop separates user data from application files
- Custom node pip installs: Desktop's embedded Python may not be on PATH; install packages inside that environment
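The port confusion is easy to resolve by probing both defaults. A stdlib sketch (assumes a local install; adjust the host for remote servers):

```python
import socket

def find_comfyui_port(host="127.0.0.1", ports=(8188, 8000), timeout=0.5):
    """Return the first port accepting TCP connections, or None.

    8188 is the CLI default, 8000 the Desktop default.
    """
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue  # refused or timed out; try the next candidate
    return None
```

A successful TCP connect only proves something is listening; confirm it is actually ComfyUI with a `get_system_stats()` call against that port.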
Error-Specific Debugging Commands
Workflow Failed — Get Details
```python
get_history()                     # Most recent execution
get_history(prompt_id="abc-123")  # Specific execution
```
The response includes:
- `status.status_str`: "success" or "error"
- `status.messages`: Timestamped execution messages
- `outputs`: Node outputs (images, etc.)
- Error traceback for failed nodes
Check Server Health
```python
get_system_stats()                       # GPU info, VRAM, Python/PyTorch versions
get_queue()                              # Running and pending jobs
get_logs(max_lines=50, keyword="error")  # Recent error logs
```
Verify Node Availability
```python
get_node_info(node_type="KSampler")         # Check a specific node
get_node_info(node_type="ControlNetApply")  # Verify custom nodes loaded
```
Verify Models
```python
list_local_models(model_type="checkpoints")  # Installed checkpoints
list_local_models(model_type="loras")        # Installed LoRAs
list_local_models(model_type="controlnet")   # Installed ControlNets
```
Quick Reference: Error to Fix
| Error Message (partial) | Most Likely Fix |
|---|---|
| `CUDA out of memory` | Reduce resolution, use FP8 model, `--lowvram` |
| `Expected all tensors to be on the same device` | Update custom node, restart ComfyUI |
| `Cannot find node class` | Install the node pack, restart ComfyUI |
| `Input contains NaN` | Lower CFG, use FP32 VAE, remove LoRAs |
| `expected scalar type Float but found Half` | Use FP32 VAE, or `--force-fp32` |
| `FileNotFoundError` (model) | Check filename, re-download model |
| `invalid header` (safetensors) | Re-download; file is corrupted |
| `no kernel image is available` | Reinstall PyTorch with matching CUDA version |
| Black images, no error | Check denoise > 0, cfg > 0, steps > 0, prompt not empty |
| Image looks garbled/noisy | Wrong model+VAE combo, wrong sampler settings |
| Connection refused on port 8188 | ComfyUI not running, or using Desktop (port 8000) |
| Validation error before execution | Node inputs don't match schema; check `get_node_info` |
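The table maps directly onto a substring lookup, which makes a handy first-pass triage function. The fix strings come from this guide; the match list is ordered and the first hit wins:

```python
ERROR_FIXES = [
    ("CUDA out of memory", "Reduce resolution, use an FP8 model, or --lowvram"),
    ("same device", "Update the custom node, restart ComfyUI"),
    ("Cannot find node class", "Install the node pack, restart ComfyUI"),
    ("contains NaN", "Lower CFG, use an FP32 VAE, remove LoRAs"),
    ("expected scalar type", "Use an FP32 VAE, or --force-fp32"),
    ("No such file or directory", "Check the filename, re-download the model"),
    ("invalid header", "Re-download; the file is corrupted"),
    ("no kernel image", "Reinstall PyTorch with a matching CUDA version"),
]

def suggest_fix(error_text: str):
    """Return the first matching fix hint from the table, or None."""
    for needle, fix in ERROR_FIXES:
        if needle in error_text:
            return fix
    return None
```

Feeding it the traceback pulled from `get_history` turns the whole guide into a one-call triage step; a `None` result means fall back to the full diagnosis strategy at the top.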