Comfy-agent curated_qwenvl_video_to_text

install

source · Clone the upstream repo

```shell
git clone https://github.com/steliosot/comfy-agent
```

Claude Code · Install into ~/.claude/skills/

```shell
T=$(mktemp -d) && \
  git clone --depth=1 https://github.com/steliosot/comfy-agent "$T" && \
  mkdir -p ~/.claude/skills && \
  cp -r "$T/skills/workflows/video_t2v_i2v_avatar/curated_qwenvl_video_to_text" \
    ~/.claude/skills/steliosot-comfy-agent-curated-qwenvl-video-to-text-b7958b && \
  rm -rf "$T"
```

manifest: skills/workflows/video_t2v_i2v_avatar/curated_qwenvl_video_to_text/SKILL.md

source content

curated_qwenvl_video_to_text

Curated workflow skill generated from `QwenVL Video to Text.json`.

Capability Family

`video_t2v_i2v_avatar`

Inputs

Optional runtime overrides supported by `run(...)`:

- `prompt`
- `negative_prompt`
- `width`, `height`
- `seed`, `steps`, `cfg`
- `sampler_name`, `scheduler`, `denoise`
- `server`, `headers`, `api_prefix`
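The following is a minimal sketch of assembling overrides before calling `run(...)`. It assumes the overrides are passed as keyword arguments. The helper `build_overrides` and the `ALLOWED_OVERRIDES` set are hypothetical illustrations, not part of comfy-agent; only the key names come from the list above. Dropping `None` values so the workflow keeps its own defaults is also an assumption about how `run(...)` treats missing keys.

```python
from __future__ import annotations

from typing import Any

# Override names documented for run(...); the set itself is a hypothetical helper.
ALLOWED_OVERRIDES = frozenset({
    "prompt", "negative_prompt",
    "width", "height",
    "seed", "steps", "cfg",
    "sampler_name", "scheduler", "denoise",
    "server", "headers", "api_prefix",
})


def build_overrides(**kwargs: Any) -> dict[str, Any]:
    """Return only the supplied, non-None overrides; reject unknown names early."""
    unknown = set(kwargs) - ALLOWED_OVERRIDES
    if unknown:
        raise ValueError(f"unsupported override(s): {sorted(unknown)}")
    # Drop None values so the workflow keeps its built-in defaults (assumption).
    return {key: value for key, value in kwargs.items() if value is not None}


if __name__ == "__main__":
    overrides = build_overrides(
        prompt="Describe what happens in this clip.",
        server="http://127.0.0.1:8188",  # ComfyUI's default port; address is illustrative
        seed=None,
    )
    print(overrides)
```

A call would then look like `run(**overrides)`, assuming `run` accepts the overrides as keyword arguments. Check the skill's own source for the exact signature.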

Outputs

Returns JSON with:

- `status`
- `prompt_id`
- `output_images` (includes the image and video entries reported by ComfyUI history)
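The sketch below turns a result into download links. It assumes each `output_images` entry keeps the `filename`, `subfolder` and `type` fields that ComfyUI's history reports for saved outputs. It also assumes the files are served by ComfyUI's `/view` endpoint under `server` plus the optional `api_prefix`. The function name `view_urls` and the sample payload are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode


def view_urls(result: dict, server: str, api_prefix: str = "") -> list[str]:
    """Build ComfyUI /view URLs for each entry in result["output_images"].

    Assumes entries look like ComfyUI history outputs:
    {"filename": ..., "subfolder": ..., "type": ...}.
    """
    # Normalise the prefix so "api", "/api" and "/api/" all become "/api".
    prefix = api_prefix.strip("/")
    base = server.rstrip("/") + (f"/{prefix}" if prefix else "")
    urls = []
    for entry in result.get("output_images", []):
        query = urlencode({
            "filename": entry["filename"],
            "subfolder": entry.get("subfolder", ""),
            "type": entry.get("type", "output"),
        })
        urls.append(f"{base}/view?{query}")
    return urls
```

For a hypothetical entry `{"filename": "clip_00001.mp4", "subfolder": "", "type": "output"}` and `server="http://127.0.0.1:8188"`, this yields `http://127.0.0.1:8188/view?filename=clip_00001.mp4&subfolder=&type=output`.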

Model Requirements

None detected from loader nodes.

Custom Node Requirements

- `comfyui-qwenvl`
- `comfyui-videohelpersuite`
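Here is a quick pre-flight check for the two custom node packs. It assumes a standard ComfyUI layout, where each pack is a folder under `ComfyUI/custom_nodes/` whose name matches the package name case-insensitively (for example `ComfyUI-VideoHelperSuite`). The function `missing_custom_nodes` is a hypothetical helper, and a pack installed under a differently named folder would be reported as missing.

```python
from __future__ import annotations

from pathlib import Path

# Package names from the requirements list above.
REQUIRED_NODES = ("comfyui-qwenvl", "comfyui-videohelpersuite")


def missing_custom_nodes(custom_nodes_dir: str | Path,
                         required: tuple[str, ...] = REQUIRED_NODES) -> list[str]:
    """Return the required packs that have no matching folder in custom_nodes_dir."""
    root = Path(custom_nodes_dir)
    # Compare folder names case-insensitively; a missing directory means nothing is installed.
    installed = {p.name.lower() for p in root.iterdir() if p.is_dir()} if root.is_dir() else set()
    return [name for name in required if name.lower() not in installed]
```

Calling `missing_custom_nodes("ComfyUI/custom_nodes")` before queuing the workflow surfaces a missing pack early, instead of as a validation error at run time.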

Links Extracted From Workflow Notes

Source

Original: `comfy-data/workflows/QwenVL Video to Text.json`

Routing Metadata

- Family: `video_t2v_i2v_avatar`
- Input modalities: `video`
- Output modalities: `application/json`
- Model families: `qwen, wan`
- Node count: `5`
- Complexity score: `5`
- Resource profile: `medium`
- Estimated runtime: moderate (about 30-120 s, depending on the server)
- Max latent resolution hint: `None` x `None`
- Max sampler steps hint: `None`
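To illustrate how this routing metadata might be consumed, the sketch below stores the skill's fields in a small record. It then keeps only the skills that accept the requested input modality and fit a resource budget. The `SkillRoute` type, the `route` function and the ordering of resource profiles (`low` < `medium` < `high`) are hypothetical; comfy-agent's actual router may work differently.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical ordering of resource profiles, cheapest first.
PROFILE_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class SkillRoute:
    name: str
    family: str
    input_modalities: frozenset[str]
    output_modalities: frozenset[str]
    resource_profile: str


# Values copied from the routing metadata above.
QWENVL_VIDEO_TO_TEXT = SkillRoute(
    name="curated_qwenvl_video_to_text",
    family="video_t2v_i2v_avatar",
    input_modalities=frozenset({"video"}),
    output_modalities=frozenset({"application/json"}),
    resource_profile="medium",
)


def route(skills, input_modality: str, max_profile: str = "high") -> list[str]:
    """Names of skills that accept input_modality within the resource budget."""
    budget = PROFILE_RANK[max_profile]
    return [
        s.name for s in skills
        if input_modality in s.input_modalities
        # Unknown profiles rank above every known one, so they are filtered out.
        and PROFILE_RANK.get(s.resource_profile, len(PROFILE_RANK)) <= budget
    ]
```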

Detected Models

None detected.

Detected Custom Nodes

- `comfyui-qwenvl`
- `comfyui-videohelpersuite`

Runtime Warnings

- Uses custom nodes; missing nodes can cause validation or runtime failures.
- Video workflow: usually slower and more VRAM-intensive than still-image workflows.
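Because video runs can take a minute or more, a caller that queues the workflow itself may want to poll with a timeout. This sketch polls ComfyUI's `GET /history/{prompt_id}` endpoint, which returns an empty JSON object until the prompt has finished and been recorded in history. Several pieces are assumptions rather than comfy-agent behavior:

- the `wait_for_history` and `http_fetch` names;
- the 180 s default timeout, chosen to sit above the estimated 30-120 s runtime;
- the injectable `fetch` and `sleep` callables, used here so the loop can be tested without a server.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Callable


def http_fetch(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def wait_for_history(prompt_id: str, server: str = "http://127.0.0.1:8188",
                     fetch: Callable[[str], dict] = http_fetch,
                     timeout: float = 180.0, interval: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll /history/{prompt_id} until the entry appears or the timeout elapses."""
    url = f"{server.rstrip('/')}/history/{prompt_id}"
    deadline = time.monotonic() + timeout
    while True:
        history = fetch(url)
        # The history entry is keyed by prompt_id once execution has finished.
        if prompt_id in history:
            return history[prompt_id]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prompt {prompt_id} not finished after {timeout}s")
        sleep(interval)
```

Injecting `fetch` also makes it easy to add the `headers` override (for example, an auth header) by passing a custom fetch function that builds a `urllib.request.Request`.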