OpenMontage grok-media

xAI Grok image and video generation guide covering authentication, endpoints, prompt structure, image editing, reference-image video, and async polling.

install
source · Clone the upstream repo
git clone https://github.com/calesthio/OpenMontage
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/calesthio/OpenMontage "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/grok-media" ~/.claude/skills/calesthio-openmontage-grok-media && rm -rf "$T"
manifest: .agents/skills/grok-media/SKILL.md
source content

Grok Media

Use this skill when working with xAI media models in OpenMontage.

Models

  • grok-imagine-image
    for image generation and image editing
  • grok-imagine-video
    for text-to-video, image-to-video, and reference-image video

Authentication

  • Env var:
    XAI_API_KEY
  • Base URL:
    https://api.x.ai/v1
  • Header:
    Authorization: Bearer $XAI_API_KEY

Image API

Text-to-image

  • Endpoint:
    POST /images/generations
  • Core fields:
    • model
    • prompt
    • n
    • aspect_ratio
    • resolution

Image edit

  • Endpoint:
    POST /images/edits
  • Use
    image
    for one source image
  • Use
    images
    for multi-image compositing
  • Each source image can be:
    • a public HTTPS URL
    • a base64 data URI

Image prompting

  • Grok responds well to direct natural language
  • For edits, describe only the intended change and preserve everything else implicitly
  • For multi-image merges, explicitly name how each source contributes
  • Prefer one strong scene description over long style-stacking

Video API

Generation

  • Endpoint:
    POST /videos/generations
  • Polling endpoint:
    GET /videos/{request_id}
  • Success state:
    status == "done"
  • Failure states to handle explicitly:
    failed
    ,
    expired

Modes

  • Text-to-video:
    • prompt-only generation
  • Image-to-video:
    • use
      image: {"url": ...}
    • this anchors the starting frame
  • Reference-to-video:
    • use
      reference_images: [{"url": ...}, ...]
    • this influences who/what appears in the video without locking the first frame
    • prompts can reference inputs with placeholders like
      <IMAGE_1>
      ,
      <IMAGE_2>

Video constraints

  • Grok video is best treated as short-form generation
  • Current output resolutions are
    480p
    and
    720p
  • Reference-image video supports multiple images and is useful for product placement, wardrobe transfer, and identity consistency
  • Download outputs promptly; provider URLs may be temporary

Pricing

  • grok-imagine-image
    :
    $0.02
    per generated image
  • grok-imagine-image
    edits/composites: add
    $0.002
    per input image
  • grok-imagine-video
    :
    • 480p
      :
      $0.05
      per second
    • 720p
      :
      $0.07
      per second
  • grok-imagine-video
    image-conditioned requests: add
    $0.002
    per input image

Grok-Specific Prompt Guidance

Images

  • Start with subject, action, setting
  • Add one style anchor, not five
  • For edits:
    • describe the desired modification
    • keep the rest of the image stable by omission, not by writing a giant preservation list

Video

  • Keep prompts scene-local: one shot, one main motion idea, one emotional beat
  • For reference-conditioned video, explicitly map source images to roles:
    • person from
      <IMAGE_1>
    • jacket from
      <IMAGE_2>
    • product from
      <IMAGE_3>
  • Camera and pacing language helps:
    • slow push-in
    • handheld follow
    • locked-off medium shot
    • high-energy whip pan transition

Good Fits

  • Image style transfer
  • Image compositing from multiple sources
  • Reference-conditioned short video
  • Product-led motion clips
  • Character-consistent scenes without hard first-frame lock

Weak Fits

  • Long-form clip generation
  • Heavy reliance on deterministic seeds
  • Overloaded prompts with multiple scene changes

Failure Handling

  • If generation submission succeeds but polling expires, surface it as a provider/runtime issue
  • If a request fails, preserve the endpoint, mode, and prompt summary in the error
  • Do not silently substitute a different provider after xAI was selected without user approval