What is Grok Video?
Grok Video is xAI's text-to-video model, now available inside Synclip. It generates 720p video at three aspect ratios, produces a thumbnail automatically alongside the clip, and supports a single reference image to anchor identity or scene style.
The model is designed for short-form cinematic output in three fixed durations: 6 seconds (a punchy loop or teaser), 10 seconds (a product beat), or 15 seconds (a full narrative beat). Unlike flat-rate models, Grok Video bills linearly per second, so the cost scales directly with the duration you choose.
- 720p output with auto-generated thumbnail
- 3:2 landscape, 2:3 portrait, 1:1 square aspect ratios
- Optional single reference image for character or scene consistency
- 6 s / 10 s / 15 s duration options
- Linear pricing: 3 coins / second (18 → 30 → 45 coins)
Aspect Ratios — Pick the Right Frame for Your Platform
Grok Video gives you three ratios, each optimised for a distinct distribution channel:
| Ratio | Format | Best for |
|---|---|---|
| 3:2 | Landscape / Cinematic | YouTube, film reels, desktop viewers |
| 2:3 | Portrait / Short-form | Reels, TikTok, Shorts, mobile-first feeds |
| 1:1 | Square / Social | Instagram posts, product ads, cross-platform reposts |
Pick your ratio before writing your prompt — the composition language of the prompt changes depending on orientation. For portrait, describe vertical movement; for landscape, use horizontal staging.
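As a rough sketch, the table and the composition advice above can be expressed as a small lookup. The dictionary keys and the `pick_ratio` helper are illustrative names only, not part of Synclip; the source gives no composition hint for square output, so none is returned for 1:1.

```python
# Platform-to-ratio mapping taken from the table above.
# Key names are illustrative assumptions, not Synclip identifiers.
RATIO_BY_PLATFORM = {
    "youtube": "3:2",
    "reels": "2:3",
    "tiktok": "2:3",
    "shorts": "2:3",
    "instagram post": "1:1",
    "product ad": "1:1",
}

# Composition language suggested for each orientation.
COMPOSITION_HINT = {
    "3:2": "use horizontal staging",
    "2:3": "describe vertical movement",
}


def pick_ratio(platform: str):
    """Return (ratio, composition hint or None) for a target platform."""
    key = platform.strip().lower()
    if key not in RATIO_BY_PLATFORM:
        raise KeyError(f"unknown platform: {platform!r}")
    ratio = RATIO_BY_PLATFORM[key]
    return ratio, COMPOSITION_HINT.get(ratio)


print(pick_ratio("TikTok"))
```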
Duration & Pricing — Transparent, Linear Billing
Grok Video costs exactly 3 coins per second, with no hidden tiers and no capacity surcharges. The table compares that with Veo 3.1 Fast and Sora 2 at each duration:
| Duration | Grok Video | Veo 3.1 Fast | Sora 2 |
|---|---|---|---|
| 6 s | 18 coins | 18 coins (flat, any length) | 8 coins (10 s clip) |
| 10 s | 30 coins | 18 coins (flat, any length) | 8 coins |
| 15 s | 45 coins | 18 coins (flat, any length) | 12 coins |
Veo 3.1 Fast is a flat-rate model: the coin cost is the same regardless of duration. Against Grok Video it breaks even at 6 s (18 coins each) and is cheaper at 10 s and 15 s, so if coin efficiency on longer clips is your priority, Veo 3.1 Fast wins on raw economics. Grok Video's advantage is cinematic quality at shorter durations and the reference-image workflow.
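The linear billing rule is simple enough to check by hand, but a minimal sketch makes the comparison explicit. The function and constant names are illustrative; the numbers are the ones quoted in this article.

```python
GROK_COINS_PER_SECOND = 3          # linear rate quoted above
GROK_DURATIONS_S = (6, 10, 15)     # the three duration options
VEO_FAST_FLAT_COINS = 18           # flat rate quoted above, any length


def grok_video_cost(duration_s: int) -> int:
    """Coins charged for one Grok Video clip of the given duration."""
    if duration_s not in GROK_DURATIONS_S:
        raise ValueError(f"duration must be one of {GROK_DURATIONS_S}")
    return GROK_COINS_PER_SECOND * duration_s


for seconds in GROK_DURATIONS_S:
    grok = grok_video_cost(seconds)
    extra = grok - VEO_FAST_FLAT_COINS
    print(f"{seconds} s: Grok Video {grok} coins, {extra} more than Veo 3.1 Fast")
```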
Reference Image — One Image, Consistent Results
Upload a single image alongside your prompt and Grok Video will use it to anchor the visual identity of the clip. This is the primary consistency tool for the model: a character's face and outfit, a scene location, a product look, or even a colour palette can be locked with one reference.
- Character consistency across multiple generated clips
- Continuing a scene with the same background or location
- Product shots that must match an existing brand visual
- Locking a colour-grade or lighting style
Tip: Keep the reference image clean and representative. A single well-lit face or product on a neutral background gives the model the clearest signal. Avoid busy compositions with multiple focal points.
Four-Step Workflow
A repeatable sequence that works whether you're generating a one-off clip or building a short-form series.
Step 1 · Select Grok Video in the model picker
Open the Video Creator workspace in Synclip and choose Grok Video from the model dropdown. The interface will show the three aspect ratio options and the duration selector.
Step 2 · Write your prompt
Structure the prompt with five elements: subject, scene, camera move, motion beat, and style constraints. Keep it under 120 words. Avoid asking for readable text in-frame.
- Subject: who or what is in the shot
- Scene: environment and background
- Camera: shot type (close-up / medium / wide) and move (dolly / pan / orbit)
- Motion beat: what changes during the clip
- Style: realistic / cinematic / commercial / etc.
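One way to keep prompts consistent is to assemble them from the five elements and check the word limit before generating. This is only a sketch: `ShotPrompt` and its `render` method are hypothetical helpers, and joining the elements as separate sentences is an assumption about prompt layout, not a Synclip requirement.

```python
from dataclasses import dataclass

MAX_WORDS = 120  # "keep it under 120 words"


@dataclass
class ShotPrompt:
    subject: str
    scene: str
    camera: str
    motion_beat: str
    style: str

    def render(self) -> str:
        """Join the five elements into one prompt and enforce the word limit."""
        parts = [self.subject, self.scene, self.camera, self.motion_beat, self.style]
        if any(not part.strip() for part in parts):
            raise ValueError("all five prompt elements are required")
        text = ". ".join(part.strip().rstrip(".") for part in parts) + "."
        if len(text.split()) >= MAX_WORDS:
            raise ValueError(f"prompt must be under {MAX_WORDS} words")
        return text


prompt = ShotPrompt(
    subject="A ceramic coffee cup",
    scene="on a sunlit wooden counter",
    camera="close-up, slow orbit",
    motion_beat="steam rises as the light shifts",
    style="commercial, realistic",
)
print(prompt.render())
```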
Step 3 · Set ratio, duration, and optional reference image
Pick the aspect ratio for your target platform. Choose 6 s for a loop or teaser, 10 s for a product beat, or 15 s for a full narrative moment. If you need visual consistency, upload one reference image before generating.
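If you script your generation settings, it can help to validate them against the limits described in this guide before spending coins. The sketch below assumes a hypothetical `GenerationSettings` container; it is not a Synclip API, only a local check of the three ratios, three durations, and the single-reference-image limit.

```python
from dataclasses import dataclass, field

ALLOWED_RATIOS = ("3:2", "2:3", "1:1")
ALLOWED_DURATIONS_S = (6, 10, 15)
MAX_REFERENCE_IMAGES = 1


@dataclass
class GenerationSettings:
    """Hypothetical settings holder; field names are illustrative."""
    aspect_ratio: str
    duration_s: int
    reference_images: list = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if any setting falls outside the documented options."""
        if self.aspect_ratio not in ALLOWED_RATIOS:
            raise ValueError(f"aspect ratio must be one of {ALLOWED_RATIOS}")
        if self.duration_s not in ALLOWED_DURATIONS_S:
            raise ValueError(f"duration must be one of {ALLOWED_DURATIONS_S}")
        if len(self.reference_images) > MAX_REFERENCE_IMAGES:
            raise ValueError("Grok Video accepts a single reference image")


settings = GenerationSettings("2:3", 10, ["hero.png"])
settings.validate()  # passes silently when the settings are valid
```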
Step 4 · Generate and iterate
Run the generation. The model returns the video and an auto-generated thumbnail. If the shot direction is right but details need adjustment, tweak the motion beat or camera language and re-run — the reference image stays locked between iterations.
Prompt Templates — Copy, Replace, Generate
Replace the bracketed fields with your project specifics.
A) Landscape cinematic (3:2) — establishing shot
- YouTube intros
- Film-style b-roll
- Travel and destination content
B) Portrait short-form (2:3) — vertical character story
Tip: Pair with a reference image of the character for face consistency across clips.
C) Square social (1:1) — product reveal
- Instagram ads
- E-commerce product videos
- Brand content
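For batch work, bracketed fields can be filled programmatically rather than by hand. The sketch below uses a hypothetical template string purely to illustrate the bracket convention; it is not one of the templates from this guide, and `fill_template` is an illustrative helper.

```python
import re

# Hypothetical example template, shown only to illustrate bracketed fields.
EXAMPLE_TEMPLATE = (
    "Wide establishing shot of [location] at [time of day], "
    "slow [camera move], cinematic style."
)


def fill_template(template: str, fields: dict) -> str:
    """Replace every [field] placeholder; fail loudly if a value is missing."""
    def replace(match):
        name = match.group(1)
        if name not in fields:
            raise KeyError(f"no value supplied for [{name}]")
        return fields[name]

    return re.sub(r"\[([^\[\]]+)\]", replace, template)


print(fill_template(EXAMPLE_TEMPLATE, {
    "location": "a coastal village",
    "time of day": "dawn",
    "camera move": "dolly forward",
}))
```

Raising on a missing field is a deliberate choice here: a prompt that still contains a literal bracket would be sent to the model as-is.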
Model Comparison — Grok Video vs Veo 3.1 Fast vs Sora 2
A quick reference for choosing the right model per use case:
| Feature | Grok Video | Veo 3.1 Fast | Sora 2 |
|---|---|---|---|
| Output resolution | 720p | 720p | 720p |
| Aspect ratios | 3:2 / 2:3 / 1:1 | 16:9 / 9:16 | 16:9 / 9:16 / 1:1 |
| Max duration | 15 s | 25 s | 15 s |
| Reference image | 1 image | Multiple (ingredients) | No |
| First / last frame | No | Yes (Veo 3.1) | No |
| Auto thumbnail | Yes | No | No |
| Pricing (15 s) | 45 coins | 18 coins (flat) | 12 coins |
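The comparison table can also be read as a filter: list your requirements and see which models remain. The sketch below encodes only the rows needed for that, using the values from the table; the `MODELS` structure and `candidates` function are illustrative, and the duration check uses the maximum duration only, not each model's discrete duration options.

```python
# Values copied from the comparison table above.
MODELS = {
    "Grok Video": {
        "ratios": {"3:2", "2:3", "1:1"}, "max_s": 15,
        "reference_images": True, "coins_15s": 45,
    },
    "Veo 3.1 Fast": {
        "ratios": {"16:9", "9:16"}, "max_s": 25,
        "reference_images": True, "coins_15s": 18,
    },
    "Sora 2": {
        "ratios": {"16:9", "9:16", "1:1"}, "max_s": 15,
        "reference_images": False, "coins_15s": 12,
    },
}


def candidates(ratio: str, duration_s: int, needs_reference: bool = False) -> list:
    """Return model names meeting the requirements, cheapest at 15 s first."""
    matches = [
        name for name, spec in MODELS.items()
        if ratio in spec["ratios"]
        and duration_s <= spec["max_s"]
        and (spec["reference_images"] or not needs_reference)
    ]
    return sorted(matches, key=lambda name: MODELS[name]["coins_15s"])


print(candidates("1:1", 15))
print(candidates("1:1", 10, needs_reference=True))
```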
FAQ
What resolution does Grok Video output?
720p. The model also generates a thumbnail image automatically alongside the video clip.
Can I use more than one reference image?
Grok Video currently supports a single reference image per generation. For multi-image reference workflows (ingredients-style), use Veo 3.1 in Synclip.
Why does 15 seconds cost more than Veo 3.1 Fast?
Grok Video uses linear per-second billing (3 coins/s), so longer clips cost proportionally more. Veo 3.1 Fast is flat-rate per generation regardless of duration. If coin efficiency at max duration is your priority, Veo 3.1 Fast is the better pick.
Can I use Grok Video for portrait (vertical) content?
Yes — the 2:3 aspect ratio is designed for vertical short-form platforms like Reels, TikTok, and Shorts.