Comfy-agent curated_qwenvl_video_to_text
install
source · Clone the upstream repo
git clone https://github.com/steliosot/comfy-agent
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/steliosot/comfy-agent "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/workflows/video_t2v_i2v_avatar/curated_qwenvl_video_to_text" ~/.claude/skills/steliosot-comfy-agent-curated-qwenvl-video-to-text-b7958b && rm -rf "$T"
manifest: skills/workflows/video_t2v_i2v_avatar/curated_qwenvl_video_to_text/SKILL.md
source content
curated_qwenvl_video_to_text
Curated workflow skill generated from QwenVL Video to Text.json.
Capability Family
video_t2v_i2v_avatar
Inputs
- Optional runtime overrides supported by run(...): prompt, negative_prompt, width, height, seed, steps, cfg, sampler_name, scheduler, denoise, server, headers, api_prefix
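The override names listed above can be collected into keyword arguments before the skill is invoked. The following is a minimal sketch only: build_overrides is a hypothetical helper, not part of comfy-agent, and this page does not show how run is imported or what it does with unknown names. The sketch also assumes that omitted overrides fall back to the workflow's defaults.

```python
# Override names exactly as listed in the Inputs section of this skill.
SUPPORTED_OVERRIDES = frozenset({
    "prompt", "negative_prompt", "width", "height", "seed", "steps", "cfg",
    "sampler_name", "scheduler", "denoise", "server", "headers", "api_prefix",
})


def build_overrides(**overrides):
    """Return a kwargs dict for run(...), rejecting undocumented names.

    Hypothetical helper: comfy-agent may validate overrides differently.
    """
    unknown = sorted(set(overrides) - SUPPORTED_OVERRIDES)
    if unknown:
        raise ValueError(f"unsupported overrides: {', '.join(unknown)}")
    # Drop None values so unset options are simply not passed
    # (assumption: run then uses the workflow's own defaults).
    return {key: value for key, value in overrides.items() if value is not None}


kwargs = build_overrides(prompt="Describe the video", seed=42, server=None)
# kwargs could then be passed as run(**kwargs), assuming run accepts keyword overrides.
```

Rejecting unknown names early keeps a typo such as sampler instead of sampler_name from being silently ignored.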
Outputs
- Returns JSON with: status, prompt_id, output_images (includes image/video entries reported by Comfy history)
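A caller will typically want to check status and list the reported outputs. The sketch below assumes only the three top-level keys named above. The shape of each output_images entry (here, a dict with a filename key) and the sample payload are guesses for illustration, and summarize_result is a hypothetical helper.

```python
import json


def summarize_result(raw):
    """Parse the skill's JSON result and list its reported outputs.

    Assumes the top-level keys status, prompt_id and output_images.
    The per-entry "filename" key is an assumption, not confirmed by the skill.
    """
    result = json.loads(raw) if isinstance(raw, str) else raw
    entries = result.get("output_images") or []
    names = [
        entry.get("filename", "<unknown>") if isinstance(entry, dict) else str(entry)
        for entry in entries
    ]
    return {
        "status": result.get("status"),
        "prompt_id": result.get("prompt_id"),
        "outputs": names,
    }


# Made-up sample payload; real status values and entry fields may differ.
sample = '{"status": "ok", "prompt_id": "abc123", "output_images": [{"filename": "clip.mp4"}]}'
summary = summarize_result(sample)
```

Using .get throughout keeps the helper tolerant of a missing or empty output_images list, which is plausible for a workflow whose main product is text.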
Model Requirements
- None detected from loader nodes.
Custom Node Requirements
- comfyui-qwenvl
- comfyui-videohelpersuite
Links Extracted From Workflow Notes
- https://discord.com/invite/gggpkVgBf3
- https://github.com/1038lab/ComfyUI-QwenVL
- https://www.youtube.com/@pixaroma
Source
- Original: comfy-data/workflows/QwenVL Video to Text.json
Routing Metadata
- Family: video_t2v_i2v_avatar
- Input modalities: video
- Output modalities: application/json
- Model families: qwen, wan
- Node count: 5
- Complexity score: 5
- Resource profile: medium
- Estimated runtime: moderate (about 30-120s depending on server)
- Max latent resolution hint: None x None
- Max sampler steps hint: None
Detected Models
- None detected.
Detected Custom Nodes
- comfyui-qwenvl
- comfyui-videohelpersuite
Runtime Warnings
- Uses custom nodes; missing nodes can cause validation/runtime failures.
- Video workflow: usually slower and more VRAM-intensive than still-image workflows.