Awesome-omni-skill troubleshooting

Common ComfyUI errors and fixes — OOM, missing nodes, dtype mismatches, black images, and debugging strategies

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/troubleshooting" ~/.claude/skills/diegosouzapw-awesome-omni-skill-troubleshooting && rm -rf "$T"
manifest: skills/development/troubleshooting/SKILL.md
source content

ComfyUI Troubleshooting Guide

Error Diagnosis Strategy

When a workflow fails, follow this systematic approach:

  1. Get the error: Use
    get_history
    to retrieve the execution result with full traceback
  2. Check logs: Use
    get_logs
    with keyword filters like
    "error"
    ,
    "warning"
    ,
    "traceback"
  3. Identify the failing node: The history response includes the
    node_id
    and
    node_type
    that failed
  4. Cross-reference inputs: Use
    get_node_info
    to verify the failing node's expected input schema
  5. Check models: Use
    list_local_models
    to verify all referenced model files exist

Out of Memory (OOM)

Error Pattern

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate X MiB.
GPU 0 has a total capacity of 24.00 GiB of which X MiB is free.

Or:

RuntimeError: CUDA error: out of memory

Root Cause

The GPU does not have enough VRAM to hold the model weights, intermediate tensors, and latent images simultaneously. Common triggers:

  • High resolution images (2048x2048+)
  • Multiple models loaded simultaneously
  • FP32 precision models on limited VRAM
  • Video generation (LTXV, AnimateDiff) with many frames
  • Large batch sizes

Fixes (in order of preference)

  1. Reduce resolution: Drop to the model's native resolution (512 for SD 1.5, 1024 for SDXL/Flux)
  2. Use FP8/FP16 quantized models: FP8 Flux models use ~8GB vs ~24GB for FP16
    • Search for FP8 variants:
      search_models("flux fp8")
      or
      search_models("sdxl fp8")
  3. Use
    --lowvram
    flag
    : ComfyUI CLI flag that offloads model parts to CPU during inference
  4. Free VRAM between generations: ComfyUI should auto-manage, but restarting clears leaked memory
  5. Use tiled VAE decoding: For high-resolution images, tile the VAE decode step
    • Node:
      VAEDecodeTiled
      instead of
      VAEDecode
    • Breaks the image into tiles, decodes each separately, and stitches them together
  6. Reduce batch size: Set batch_size to 1 in
    EmptyLatentImage
  7. Avoid multiple models: Don't load two full checkpoints simultaneously — use one checkpoint and LoRAs instead
  8. For LTXV/video: Always use FP8 quantized video models on 24GB cards

VRAM Estimates

ModelFP32FP16FP8
SD 1.5~4GB~2GB~1GB
SDXL~12GB~6GB~3GB
Flux Dev~48GB~24GB~12GB
Flux Schnell~48GB~24GB~12GB
LTXV~20GB+~10GB+~6GB

Device Mismatch

Error Pattern

RuntimeError: Expected all tensors to be on the same device, but found at least
two devices, cuda:0 and cpu!

Root Cause

A tensor on the CPU is being combined with a tensor on the GPU. This usually happens when:

  • A custom node doesn't properly move tensors to the correct device
  • Model offloading placed parts of the model on CPU
  • A node produces CPU tensors while downstream expects GPU tensors

Fixes

  1. Check if the error occurs with a specific custom node — update or replace that node
  2. If using
    --lowvram
    or
    --cpu
    , some nodes may not support CPU offloading
  3. Restart ComfyUI to reset device state
  4. Check if a custom node has a newer version that fixes device handling

Missing Nodes

Error Pattern

Cannot find node class 'NodeClassName'

Or in the execution response:

"error": {"type": "node_not_found", "message": "Cannot find node class 'X'"}

Root Cause

The workflow references a node type that is not installed. This happens when:

  • A custom node pack is not installed
  • A custom node pack is installed but failed to load (import error)
  • The node was renamed or removed in a pack update

Fixes

  1. Search for the node pack:
    search_custom_nodes("NodeClassName")
    
  2. Install via ComfyUI Manager or the registry
  3. Check logs for import errors:
    get_logs(keyword="import")
    get_logs(keyword="error")
    
    Import errors often reveal missing Python dependencies
  4. Install missing Python dependencies: If the custom node requires a pip package:
    pip install missing-package
    
  5. Restart ComfyUI after installing any custom node — nodes are loaded at startup

NaN Tensor Errors

Error Pattern

RuntimeError: Input contains NaN

Or images come out as solid gray/noise with NaN warnings in logs.

Root Cause

Numerical instability during the diffusion process. Common triggers:

  • CFG scale too high: Values above 15-20 can cause numerical overflow
  • Corrupted model weights: Damaged download or incompatible merge
  • FP16 overflow: Some operations overflow at half precision
  • Incompatible LoRA: A LoRA trained for a different base model

Fixes

  1. Lower CFG: Try CFG 7.0 for SD 1.5/SDXL, 1.0 for Flux
  2. Use FP32 VAE: Some VAEs produce NaN in FP16. Switch to
    vae-ft-mse-840000-ema-pruned.safetensors
    (FP32)
  3. Remove LoRAs: Test without LoRAs to isolate the cause
  4. Re-download the model: Hash verification can detect corrupted files
  5. Check LoRA compatibility: Ensure the LoRA matches the base model family

Dtype Mismatches

Error Pattern

RuntimeError: expected scalar type Float but found Half

Or:

RuntimeError: expected scalar type Half but found Float

Or:

RuntimeError: Input type (float) and bias type (c10::Half) should be the same

Root Cause

A model component expects one precision (FP32/FP16) but receives another. Most common with:

  • VAE precision mismatch (FP16 model + FP32 VAE or vice versa)
  • Mixed-precision LoRAs
  • Custom nodes that force a specific dtype

Fixes

  1. Use a separate VAE: Load an explicit FP32 VAE instead of the checkpoint's built-in VAE
    • Node:
      VAELoader
      with
      vae-ft-mse-840000-ema-pruned.safetensors
  2. Match precision: If the model is FP16, use FP16-compatible nodes throughout
  3. Force FP32 VAE decode: Some node packs offer
    VAEDecodeFP32
    nodes
  4. Check ComfyUI settings:
    --force-fp32
    flag forces everything to FP32 (uses more VRAM)

CLIP Token Overflow

Error Pattern

No explicit error — the prompt is silently truncated at 77 tokens, and details mentioned late in the prompt are ignored.

Symptoms

  • Later parts of long prompts have no effect on the image
  • Adding more descriptive text doesn't change the output
  • Removing early tokens suddenly makes later tokens work

Fixes

  1. Use BREAK token: Split the prompt at natural boundaries:
    subject description, pose, clothing, setting
    BREAK
    lighting, style, quality, camera angle
    
  2. Use CLIPTextEncodeSDXL: SDXL's dual-CLIP processes two 77-token chunks
  3. Prioritize important tokens: Put the most important descriptors first
  4. Use fewer filler words: Remove articles and prepositions where possible
  5. Use embeddings: Condense complex concepts into single tokens with textual inversions

Black Images

Error Pattern

No error in the execution — the workflow "succeeds" but produces completely black or near-black images.

Root Causes and Fixes

CauseDiagnosisFix
denoise = 0
Check KSampler inputsSet denoise to 1.0 for txt2img, 0.5-0.8 for img2img
cfg = 0
Check KSampler inputsSet CFG to 7.0 (SD 1.5), 1.0 (Flux)
steps = 0
Check KSampler inputsSet steps to 20+ (standard) or 4+ (turbo)
Wrong VAEVAE doesn't match modelUse the correct VAE for the model family
Empty promptCLIPTextEncode has empty textAdd a text prompt
Wrong schedulerIncompatible scheduler/sampler comboTry
"normal"
scheduler with
"euler"
sampler
Seed collisionExtremely rareChange the seed value
FP16 VAE overflowVAE decode produces blackUse FP32 VAE or VAEDecodeTiled

Quick Diagnostic Checklist

  1. Check
    denoise
    > 0 (should be 1.0 for txt2img)
  2. Check
    cfg
    > 0 (should be 7.0 for SD 1.5, 1.0 for Flux)
  3. Check
    steps
    > 0 (should be 20 for standard, 4 for turbo)
  4. Verify the positive prompt is not empty
  5. Try a different seed
  6. Try a known-working sampler/scheduler combo:
    euler
    +
    normal

Connection Type Errors

Error Pattern

Output type 'IMAGE' doesn't match input type 'LATENT'

Or:

Required input 'model' of type 'MODEL' but got connection of type 'CLIP'

Root Cause

Connecting the wrong output slot of a node to an incompatible input. Often caused by using the wrong output index.

Fixes

  1. Check output indices: Use
    get_node_info
    to verify the exact output order
    • CheckpointLoaderSimple
      outputs: 0=MODEL, 1=CLIP, 2=VAE
    • Getting index wrong:
      ["1", 0]
      gives MODEL,
      ["1", 1]
      gives CLIP
  2. Verify connection format:
    ["nodeId", outputIndex]
    — node ID is a string, index is an integer
  3. Check data type flow: Ensure the pipeline follows the correct type chain:
    MODEL → KSampler
    CLIP → CLIPTextEncode → CONDITIONING → KSampler
    LATENT → KSampler → LATENT → VAEDecode → IMAGE
    VAE → VAEDecode, VAEEncode
    

Model Loading Errors

Error Pattern

FileNotFoundError: [Errno 2] No such file or directory: 'models/checkpoints/model.safetensors'

Or:

SafetensorError: Error reading file: invalid header

Or:

RuntimeError: PytorchStreamReader failed reading zip archive

Root Causes

  • File not found: Model file doesn't exist at the referenced path
  • Corrupted download: Incomplete or damaged file
  • Wrong format: File is not a valid safetensors/pickle/checkpoint format

Fixes

  1. Verify the model exists:
    list_local_models(model_type="checkpoints")
  2. Check the exact filename: Model names in workflows must match the filename exactly (case-sensitive)
  3. Re-download: If hash mismatch or corruption:
    download_model(url="...", target_subfolder="checkpoints")
    
  4. Check file size: A 1KB safetensors file is clearly corrupted — re-download
  5. Verify subfolder: Models must be in the correct subfolder (
    checkpoints/
    ,
    loras/
    ,
    vae/
    , etc.)

Torch / CUDA Version Errors

Error Pattern

RuntimeError: CUDA error: no kernel image is available for execution on the device

Or:

ImportError: cannot import name 'xxx' from 'torch'

Or:

AssertionError: Torch not compiled with CUDA enabled

Root Cause

PyTorch and CUDA version incompatibility, usually after:

  • Updating PyTorch without matching CUDA toolkit
  • Installing a custom node that downgrades/changes PyTorch
  • Using pip install that pulls a CPU-only PyTorch

Fixes

  1. Check current versions:
    get_system_stats()  # Shows PyTorch version and CUDA version
    
  2. Verify CUDA availability: In Python:
    torch.cuda.is_available()
  3. Reinstall PyTorch with CUDA: Visit pytorch.org for the correct install command matching your CUDA version
  4. Pin PyTorch version: After fixing, avoid running
    pip install
    commands that might change PyTorch
  5. Use ComfyUI's bundled venv: ComfyUI Desktop ships with a pre-configured Python environment

ComfyUI Desktop vs CLI Differences

Key Differences

AspectComfyUI DesktopComfyUI CLI
Default port80008188
PythonEmbedded (bundled)System/venv Python
Install location
AppData/Local/Programs/ComfyUI/
Wherever you cloned it
Custom nodes
Documents/ComfyUI/custom_nodes/
./custom_nodes/
in repo
Models
Documents/ComfyUI/models/
./models/
in repo
Config
extra_model_paths.yaml
for shared paths
Same
UpdatesAuto-updater in the app
git pull

Common Issues

  • Wrong port: MCP tools default to 8188 — if using Desktop, configure for port 8000
  • Path confusion: Desktop separates user data from application files
  • Custom node pip installs: Desktop's embedded Python may not be on PATH — install within the venv

Error-Specific Debugging Commands

Workflow Failed — Get Details

get_history()                           # Most recent execution
get_history(prompt_id="abc-123")        # Specific execution

The response includes:

  • status.status_str
    : "success" or "error"
  • status.messages
    : Timestamped execution messages
  • outputs
    : Node outputs (images, etc.)
  • Error traceback for failed nodes

Check Server Health

get_system_stats()    # GPU info, VRAM, Python/PyTorch versions
get_queue()           # Running and pending jobs
get_logs(max_lines=50, keyword="error")  # Recent error logs

Verify Node Availability

get_node_info(node_type="KSampler")              # Check specific node
get_node_info(node_type="ControlNetApply")        # Verify custom nodes loaded

Verify Models

list_local_models(model_type="checkpoints")       # Installed checkpoints
list_local_models(model_type="loras")             # Installed LoRAs
list_local_models(model_type="controlnet")        # Installed ControlNets

Quick Reference: Error to Fix

Error Message (partial)Most Likely Fix
CUDA out of memory
Reduce resolution, use FP8 model,
--lowvram
Expected all tensors on same device
Update custom node, restart ComfyUI
Cannot find node class
Install the node pack, restart ComfyUI
Input contains NaN
Lower CFG, use FP32 VAE, remove LoRAs
expected scalar type Float but found Half
Use FP32 VAE, or
--force-fp32
No such file or directory
(model)
Check filename, re-download model
invalid header
(safetensors)
Re-download — file is corrupted
CUDA error: no kernel image
Reinstall PyTorch with matching CUDA version
Black images, no errorCheck denoise > 0, cfg > 0, steps > 0, prompt not empty
Image looks garbled/noisyWrong model+VAE combo, wrong sampler settings
Connection refused
on port 8188
ComfyUI not running, or using Desktop (port 8000)
Prompt outputs failed validation
Node inputs don't match schema — check
get_node_info