OpenMontage seedance-2-0

install
source · Clone the upstream repo
git clone https://github.com/calesthio/OpenMontage
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/calesthio/OpenMontage "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/seedance-2-0" ~/.claude/skills/calesthio-openmontage-seedance-2-0 && rm -rf "$T"
manifest: .agents/skills/seedance-2-0/SKILL.md
source content

Seedance 2.0 (ByteDance)

Seedance 2.0 is the ByteDance Seed team's unified multimodal video+audio model (released Feb 2026, globally available via partner APIs April 2026). It is the preferred premium default for cinematic, trailer, teaser, and motion-led work inside OpenMontage whenever any supporting gateway is configured. OpenMontage wraps four gateways directly (`seedance_video` → fal.ai, `seedance_replicate` → Replicate, `runway_video` with `model="seedance_2.0"` → Runway, `higgsfield_video` with `model="seedance_2.0"` → Higgsfield); BytePlus / Freepik / HeyGen-Video-Agent wrappers are on the roadmap. The scoring engine deduplicates by `provider="seedance"`, so whichever gateway the user has configured wins automatically — agents should pass `preferred_provider="seedance"` to `video_selector` (or let the scorer pick) rather than routing to a specific gateway by name.
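The dedup rule above can be sketched roughly as follows. This is illustrative, not the scorer's actual code: the priority order and the exact Higgsfield secret variable name are assumptions.

```python
import os

# Gateway tools that all advertise provider="seedance", in an assumed
# priority order; whichever one is actually configured wins.
SEEDANCE_GATEWAYS = [
    ("seedance_video", ["FAL_KEY"]),                  # fal.ai
    ("seedance_replicate", ["REPLICATE_API_TOKEN"]),  # Replicate
    ("runway_video", ["RUNWAY_API_KEY"]),             # Runway, model="seedance_2.0"
    ("higgsfield_video", ["HIGGSFIELD_API_KEY", "HIGGSFIELD_SECRET"]),  # secret var name assumed
]

def pick_seedance_gateway(env=None):
    """Return the first gateway tool whose required env vars are all set."""
    env = os.environ if env is None else env
    for tool, required in SEEDANCE_GATEWAYS:
        if all(env.get(var) for var in required):
            return tool
    return None
```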

Why it is the OpenMontage premium default

| Capability | Seedance 2.0 | Notes |
|---|---|---|
| Single-pass native synced audio | Yes | Speech + SFX + ambience generated jointly, not post-sync |
| Multi-shot inside one generation | Yes | Multiple cuts/shots in a single prompt |
| Director-level camera control | Yes | Camera language (dolly, tilt, arc, crane, handheld) honored |
| Lip-sync from quoted dialogue | Yes | `Character says: "..."` matches mouth shapes |
| Reference conditioning | Up to 9 images + 3 video clips + 3 audio clips | 12-asset multimodal |
| Character identity consistency | Yes | Face/subject stable across shots |
| Max shot duration | 15 s | auto / 4–15 s |
| Resolution ceiling | 1080p on some endpoints (720p default on fal.ai) | Provider-dependent |
| Elo (Artificial Analysis) | 1269 (#1 as of Feb 2026) | Beat Veo 3, Sora 2, Runway Gen-4.5 |

Switch away only for a specific reason: strict budget (use the `fast` variant or LTX), user-preferred provider (VEO/Sora/Kling), or a stylistic fit that favors another model.

Provider surfaces

| Surface | Env | OpenMontage tool | Status | Notes |
|---|---|---|---|---|
| fal.ai (primary) | `FAL_KEY` | `seedance_video` | ✅ wrapped | Model IDs below. Supports T2V, I2V, reference-to-video; `standard` and `fast` variants. Default in OpenMontage. |
| Replicate | `REPLICATE_API_TOKEN` | `seedance_replicate` | ✅ wrapped | `bytedance/seedance-2.0` + `bytedance/seedance-2.0-fast`. Standard Replicate prediction API. |
| Runway | `RUNWAY_API_KEY` | `runway_video` (model: `seedance_2.0`) | ✅ wrapped | Third-party Seedance 2.0 model inside Runway. Unlimited/Enterprise plans, non-US only. Selected via `model` param. |
| Higgsfield | `HIGGSFIELD_API_KEY` + `_SECRET` | `higgsfield_video` (model: `seedance_2.0`) | ✅ wrapped | Seedance 2.0 is the default model on this tool. Emphasis on character identity + long-form chaining. |
| HeyGen | `HEYGEN_API_KEY` | `heygen_video` (1.x only) + TODO | ⚠️ 1.x only | The `seedance_pro` / `seedance_lite` workflow provider strings on HeyGen map to Seedance 1.x. 2.0 access flows through Video Agent / Avatar Shots endpoints — a separate `seedance_heygen` tool is on the roadmap. |
| BytePlus ModelArk / Volcengine | BytePlus token | not wrapped | 🔜 roadmap | Direct from ByteDance. Pro ~$0.15 / 5 s, Lite ~$0.010 / s. Token-based. |
| Freepik | Freepik token | not wrapped | 🔜 roadmap | `POST /v1/ai/image-to-video/seedance-pro-1080p` for 1080p I2V |
| Pollo / PiAPI / Atlas Cloud / AIMLAPI | various | not wrapped | 🔜 roadmap | Aggregators resell fal.ai or ByteDance endpoints |

fal.ai model IDs (used by `seedance_video`)

bytedance/seedance-2.0/text-to-video
bytedance/seedance-2.0/image-to-video
bytedance/seedance-2.0/reference-to-video        # 9 img + 3 vid + 3 audio
bytedance/seedance-2.0/fast/text-to-video
bytedance/seedance-2.0/fast/image-to-video
bytedance/seedance-2.0/fast/reference-to-video
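Assuming the ID scheme above is uniform, an endpoint can be derived mechanically from the operation and variant. This is a convenience sketch; `seedance_video` resolves the model ID itself.

```python
FAL_BASE = "bytedance/seedance-2.0"
OPERATIONS = {"text_to_video", "image_to_video", "reference_to_video"}

def fal_model_id(operation, variant="standard"):
    """Build the fal.ai endpoint ID for an operation + variant pair."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    endpoint = operation.replace("_", "-")  # e.g. text_to_video -> text-to-video
    if variant == "fast":
        return f"{FAL_BASE}/fast/{endpoint}"
    return f"{FAL_BASE}/{endpoint}"
```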

Pricing (fal.ai, 720p): standard $0.3034 / s (T2V), $0.3024 / s (I2V). Fast $0.2419 / s across endpoints. The `fast` variant trades some camera/motion fidelity for latency and cost — do not route slow-mo, multi-shot, or dolly-heavy prompts to `fast` on the first try.
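A quick per-clip cost estimator from the quoted 720p rates (reference-to-video pricing is not quoted above, so it is omitted here):

```python
# USD per second at 720p on fal.ai, from the rates quoted above.
RATES = {
    ("standard", "text_to_video"): 0.3034,
    ("standard", "image_to_video"): 0.3024,
    ("fast", "text_to_video"): 0.2419,
    ("fast", "image_to_video"): 0.2419,
}

def clip_cost(duration_s, variant="standard", operation="text_to_video"):
    """Estimated cost in USD for one clip, rounded to the cent."""
    return round(RATES[(variant, operation)] * duration_s, 2)
```

For example, a 10 s standard T2V clip comes to about $3.03, and a 5 s fast preview to about $1.21 — the same figures used in the cost-discipline note below.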

Calling Seedance 2.0 inside OpenMontage

Always go through `video_selector` with `preferred_provider="seedance"` (or let the scoring engine pick it):

from tools.tool_registry import registry
registry.ensure_discovered()
selector = registry.get("video_selector")
result = selector.execute({
    "prompt": PROMPT,
    "preferred_provider": "seedance",
    "operation": "text_to_video",       # or image_to_video / reference_to_video
    "aspect_ratio": "21:9",             # 21:9 / 16:9 / 9:16 / 4:3 / 1:1 / 3:4
    "duration": "10",                   # auto / 4..15
    "resolution": "720p",               # 480p / 720p
    "output_path": "projects/<proj>/assets/video/clip_01.mp4",
})

Direct call to the provider tool (only when you must bypass the selector):

seedance = registry.get("seedance_video")
seedance.execute({
    "prompt": PROMPT,
    "model_variant": "standard",   # "standard" or "fast"
    "operation": "text_to_video",
    "aspect_ratio": "21:9",
    "duration": "10",
    "resolution": "720p",
    "generate_audio": True,
    "seed": 12345,                 # optional, for reproducible variations
    "output_path": "...",
})

Prompt structure

Seedance 2.0 is unusually literal about camera language, multi-shot cuts, and quoted dialogue. Use this 8-part template:

[Shot / framing] + [Camera movement] +
[Subject description — physical detail that must persist across shots] +
[Action beat 1] → [optional cut] → [Action beat 2] +
[Setting / environment] + [Lighting / palette] +
[Style / grade / era] + [Audio — ambient, diegetic, music, dialogue]
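One way to keep prompts in that eight-part shape is a small record type. The field names here are ours, not an OpenMontage API:

```python
from dataclasses import dataclass

@dataclass
class ShotPrompt:
    framing: str   # shot / framing
    camera: str    # camera movement
    subject: str   # physical detail that must persist across shots
    action: str    # action beats, with optional cuts
    setting: str   # environment
    lighting: str  # lighting / palette
    style: str     # style / grade / era
    audio: str     # ambient, diegetic, music, dialogue

    def render(self):
        """Flatten the template into a single prompt string."""
        return (f"{self.framing}, {self.camera}. {self.subject} {self.action}. "
                f"{self.setting}. {self.lighting}. {self.style}. Audio: {self.audio}.")
```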

Multi-shot inside one generation

Seedance honors explicit shot lists inside a prompt. Format each shot:

Shot 1 (wide establishing, slow aerial push-in): ...
Shot 2 (medium close-up, handheld): ...
Shot 3 (extreme close-up, rack focus): ...

Keep subject description consistent across shots for identity stability.

Lip-sync from quoted dialogue

Aang stands on the cliff edge, staff raised, wind in his cloak.
Aang says: "I won't run anymore."
Sokka, half a step behind, replies: "Then we fight."

Use `Character says: "..."` / `Character replies: "..."` exactly — mouth shapes key off quoted strings. Keep each line under ~6 words; longer lines risk drift on fast clips.
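The word budget is cheap to check before spending a generation. This sketch assumes dialogue always uses the `says:` / `replies:` forms shown above:

```python
import re

# Capture the quoted string after `says:` or `replies:`.
DIALOGUE_RE = re.compile(r'\b(?:says|replies):\s*"([^"]+)"')

def long_dialogue_lines(prompt, max_words=6):
    """Return any quoted dialogue lines that exceed the word budget."""
    return [q for q in DIALOGUE_RE.findall(prompt)
            if len(q.split()) > max_words]
```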

Audio cues that work

  • Ambient: `distant thunder rolling over mountains`, `wind through reeds`, `crackling campfire`
  • Diegetic: `boots crunching snow`, `staff planting on stone`, `wingbeats overhead`
  • Music direction (light touch only): `low orchestral swell building`, `taiko drums entering on Shot 3`

Do not request complex multi-instrument scores — keep music language textural.

Reference-to-video

When you have character / product / wardrobe references, use the reference-to-video endpoint and name each asset in the prompt:

Reference 1: hero character (Aang) — bald, blue arrow tattoo, orange robes.
Reference 2: environment plate — snowy Air Temple courtyard at dawn.
Shot 1: Aang (from reference 1) walks across the courtyard (reference 2),
wind lifting his robes. Low-angle tracking shot, slow push-in.
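The 12-asset ceiling (9 images + 3 video clips + 3 audio clips) is worth validating before submission. The `reference_*` key names below are assumptions — check the actual `seedance_video` tool schema:

```python
# Per-type caps from the reference-conditioning limits above.
LIMITS = {"reference_images": 9, "reference_videos": 3, "reference_audio": 3}

def validate_references(request):
    """Return a list of limit violations (empty means the request is OK)."""
    errors = []
    for key, cap in LIMITS.items():
        n = len(request.get(key, []))
        if n > cap:
            errors.append(f"{key}: {n} > max {cap}")
    return errors
```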

Parameter guidance

| Parameter | Guidance |
|---|---|
| `duration` | `5`–`8` for hero shots, `10`–`12` for full scenes with multi-shot cuts, `4` for quick inserts. `auto` when unsure. |
| `aspect_ratio` | `21:9` for cinematic trailers, `16:9` for broadcast / YouTube, `9:16` for Reels/Shorts/TikTok |
| `resolution` | `720p` default. Drop to `480p` for cost-capped batch previews, not for finals |
| `generate_audio` | Keep on unless you have a specific reason to mute — Seedance's moat is synced audio. Strip audio downstream in compose if needed. |
| `model_variant` | `standard` for hero/cinematic shots; `fast` only for b-roll, previews, or when latency is the hard constraint |
| `seed` | Set a seed before iterating variants of a chosen shot, everything else held constant |

What to avoid

| Don't | Why |
|---|---|
| Cram four-plus simultaneous character actions into one shot | Motion coherence breaks; split into multi-shot |
| Request readable text / logos inside the clip | Text rendering is unreliable — handle text in Remotion overlay |
| Mix conflicting lighting ("bright noon" + "neon night") | Model picks one and ignores the other |
| Write dialogue longer than ~6 words on fast-cut shots | Lip-sync drift |
| Use the `fast` variant for slow-mo, multi-shot, or complex camera moves | Routinely misses on first try — route to `standard` |
| Generate music through Seedance audio | Texture-only is fine; for real scoring use `music` / `pixabay_music` / `elevenlabs` and mix in compose |
| Bypass `video_selector` without a reason | Loses cost/availability/fallback handling and scoring context |
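Two of these failure modes (rendered text, conflicting lighting) are cheap to lint before spending a generation. The keyword lists are illustrative only, not an OpenMontage feature:

```python
# Naive keyword lint for prompts headed to Seedance.
TEXT_WORDS = ("logo", "caption", "subtitle", "title card", "readable text")
LIGHT_PAIRS = [("noon", "night"), ("sunrise", "midnight")]

def lint_prompt(prompt):
    """Return human-readable warnings; empty list means no obvious issue."""
    p = prompt.lower()
    warnings = []
    if any(w in p for w in TEXT_WORDS):
        warnings.append("prompt asks for rendered text; overlay it in Remotion instead")
    for a, b in LIGHT_PAIRS:
        if a in p and b in p:
            warnings.append(f"conflicting lighting: '{a}' vs '{b}'")
    return warnings
```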

Iteration strategy

  1. Block out the shape with a single `duration=5` `fast` T2V pass at the intended framing. Confirm the composition works.
  2. Lock the seed once the composition reads.
  3. Upgrade to `standard` with the same seed; tighten camera and lighting language.
  4. Extend and add shots — move to multi-shot or longer duration only after a single-shot version is clean.
  5. Keep a per-clip README with prompt + seed + variant for every shot that makes the cut, so the compose stage can re-render consistent retakes.
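Steps 1–3 can be scripted as a fast-then-standard pass around a generate callable. Illustrative only: in practice `generate` would wrap `seedance_video.execute`, and the review between passes is human.

```python
def iterate_shot(generate, prompt, seed=12345):
    """Fast preview first, then a standard render with the same locked seed."""
    preview = generate(prompt=prompt, model_variant="fast",
                       duration="5", seed=seed)
    # ...human review of `preview` goes here; re-prompt until composition reads...
    final = generate(prompt=prompt, model_variant="standard",
                     duration="10", seed=seed)
    return {"prompt": prompt, "seed": seed,
            "preview": preview, "final": final}
```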

Integration notes for OpenMontage pipelines

  • Cinematic pipeline: Seedance 2.0 is the default video model. Use 21:9 for hero, multi-shot for montage beats, reference-to-video when the brief has a visual bible.
  • Animated explainer: Use Seedance 2.0 for the establishing / mood clips only; most shots should stay in Remotion. Don't replace Remotion motion graphics with Seedance — different tool, different job.
  • Screen demo / podcast / clip factory: Seedance is not the right default — these are footage-led. Only use for stylized cold-opens.
  • Cost discipline: `standard` at 10 s ≈ $3.03 per clip — budget accordingly in the proposal stage. `fast` at 5 s ≈ $1.21 for previews.

Verification checklist for every Seedance shot

  • Motion reads coherently at the chosen shot length
  • Audio is actually synced (check dialogue + foot/impact hits)
  • Character identity matches reference / prior shots
  • Camera direction matches the prompt (no auto-dolly when you asked for static)
  • No readable text the model tried to render
  • Grade matches the approved style playbook
  • Output duration matches what you requested (some endpoints round)
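The last item can be automated. This sketch assumes ffmpeg's `ffprobe` is on PATH; the 0.5 s tolerance is our choice, not a documented spec:

```python
import subprocess

def clip_duration_s(path):
    """Probe a clip's duration in seconds with ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())

def duration_ok(actual_s, requested_s, tol_s=0.5):
    """True if the rendered duration is within tolerance of the request."""
    return abs(actual_s - requested_s) <= tol_s
```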
