What is Seedance 2.0?
Seedance 2.0 is a multimodal AI video model that supports joint audio and video generation with strong natural-language understanding. Unlike earlier-generation models that focused narrowly on text-to-video, Seedance 2.0 accepts text, images, video clips, and audio as input, and lets you control all of them from a single prompt. Its core capabilities include:
- Text-to-Video (T2V) — generate from a written description
- Image-to-Video (I2V) — animate a reference image or photo
- Reference-to-Video (R2V) — guide style, character, or scene from multiple images
- Video-to-Video (V2V) — use an existing video as a reference or starting point
- Subtitle, slogan, and speech bubble generation
- Video editing, extension, and track merging
The Core Prompt Formula
Many creators find that their results are inconsistent not because the model is weak, but because their prompts are ambiguous. Once you learn the core structure, generation quality improves dramatically.
Whether you're generating from text, image, or video, every strong Seedance 2.0 prompt starts with three elements:
| Element | What it covers |
|---|---|
| Subject | Who or what is in the scene — person, object, creature, or abstract element. |
| Environment | Where the scene takes place — location, time of day, weather, lighting mood. |
| Action | What the subject is doing — movement, expression, interaction, or state change. |
This is the minimum viable prompt structure; everything else is layered on top. For production-quality output, extend the base formula with camera, style, audio, and reference instructions. Each layer adds precision and reduces ambiguity, which means fewer wasted generations.
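To make the layering concrete, here is a minimal sketch of how the formula can be assembled programmatically. The class and field names are ours for illustration, not part of any Seedance 2.0 API; the output is simply the prompt string you would paste into the generator.

```python
from dataclasses import dataclass, field

@dataclass
class SeedancePrompt:
    # The three required base elements.
    subject: str
    environment: str
    action: str
    # Optional layers that add precision.
    camera: str = ""
    style: str = ""
    audio: str = ""
    text: str = ""  # on-screen text instructions
    references: list = field(default_factory=list)  # e.g. "Reference Image 1 for the character."

    def render(self) -> str:
        # Base formula first, refinement layers after.
        parts = [self.subject, self.environment, self.action,
                 self.camera, self.style, self.audio, self.text, *self.references]
        return " ".join(p.strip() for p in parts if p.strip())

prompt = SeedancePrompt(
    subject="A young woman with long dark hair",
    environment="stands on a rocky shoreline at dawn, side-lit by morning light.",
    action="The wind lifts her hair as she looks toward the horizon.",
    camera="The camera slowly pulls back from a medium shot to a wide establishing shot.",
    style="Warm, cinematic, naturalistic style.",
)
print(prompt.render())
```

Keeping each layer in its own field makes it easy to swap one element (say, the camera move) while holding everything else constant across generations.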
What Works vs. What Doesn't
Write this
- State clearly who the subject is
- Describe the scene with concrete details
- Specify the action step by step
- Name the camera movement explicitly
- Number your reference images (Image 1, Image 2…)
- Assign one reference to one role only
- Specify when and where text should appear
Avoid this
- "Generate a cinematic high-quality video" (no subject or action)
- Stacking multiple vague adjectives
- Mixing character, logo, scene, and action in a single sentence
- Uploading multiple references without labeling them
- "Make it feel futuristic" without visual anchors
Text-to-Video Prompts
In T2V mode, the quality of your description determines everything. Start with subject, environment, and action — then add camera and style to refine the output.
Basic T2V prompt
Clear subject, defined environment, specific action:
A young woman stands on a rocky shoreline at dawn. Her long hair is lifted by the wind. Morning light hits her face from the side. The camera slowly pulls back from a medium shot to a wide establishing shot. Warm, cinematic, naturalistic style.
💡 Avoid "a beautiful video of…" — describe what the viewer actually sees.
Action-driven T2V prompt
When the movement itself is the focus:
A lone figure in a white trench coat walks down a rain-soaked neon-lit alley at night. She slows her pace and turns to look directly at the camera. Camera tracks from behind, then cuts to a close-up of her face. Cyberpunk atmosphere, shallow depth of field.
Product / brand T2V prompt
For advertising and commercial content:
A sleek black glass bottle sits on a dark marble surface. Water droplets form on the exterior. The camera orbits the bottle slowly in a 360-degree arc. Low ambient lighting with a single warm spotlight. Minimalist luxury aesthetic.
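If you are calling the model through an API rather than a web UI, the finished prompt travels as a plain string. The sketch below assumes a hypothetical endpoint and payload shape; the real Seedance URL, auth scheme, and parameter names may differ, so treat every key here as a placeholder and check the official docs.

```python
import requests  # standard HTTP client; pip install requests

# Hypothetical endpoint and payload — consult the official Seedance API
# documentation for the real URL, auth scheme, and parameter names.
API_URL = "https://api.example.com/v1/video/generations"  # placeholder

payload = {
    "model": "seedance-2.0",   # assumed model identifier
    "mode": "t2v",             # assumed: text-to-video
    "prompt": (
        "A sleek black glass bottle sits on a dark marble surface. "
        "Water droplets form on the exterior. The camera orbits the bottle "
        "slowly in a 360-degree arc. Low ambient lighting with a single "
        "warm spotlight. Minimalist luxury aesthetic."
    ),
    "duration_seconds": 5,     # assumed parameter
}

response = requests.post(API_URL, json=payload,
                         headers={"Authorization": "Bearer YOUR_KEY"})
response.raise_for_status()
print(response.json())
```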
Image-to-Video & Reference Prompts
When you upload reference images, the key is not just uploading — it's telling the model what each image controls. Number your references and assign each one a specific role.
Single-image reference
Control character appearance from one reference photo:
Using the character in Image 1, generate a scene of her running along a sunlit beach at golden hour. Maintain her hairstyle, clothing, and facial features exactly as shown. Camera follows from behind at medium distance, gradually moving to a side profile.
💡 Say "maintain X from Image 1" — never assume the model will infer what to keep.
Multi-image reference
When you want to control character, logo, and scene separately:
Reference Image 1 for the character (the woman in the red dress). Reference Image 2 for the brand logo (bottom-right corner, small). Reference Image 3 for the scene environment (the glass-and-steel lobby interior). Generate a cinematic brand reveal where she walks into the lobby and the logo appears at the end.
💡 Each image = one job. Never ask one image to control two different things.
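One way to enforce the one-asset-one-role rule is to keep an explicit mapping from label to role and generate the reference clauses from it, so no image can be assigned two jobs. A small sketch using this guide's labeling convention (nothing here is an API parameter):

```python
# Each numbered reference gets exactly one role; a dict key can only appear once.
reference_roles = {
    "Image 1": "the character (the woman in the red dress)",
    "Image 2": "the brand logo (bottom-right corner, small)",
    "Image 3": "the scene environment (the glass-and-steel lobby interior)",
}

clauses = [f"Reference {label} for {role}." for label, role in reference_roles.items()]
prompt = " ".join(clauses) + (
    " Generate a cinematic brand reveal where she walks into the lobby"
    " and the logo appears at the end."
)
print(prompt)
```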
First-frame / last-frame control
Lock the start and end of your clip using reference images:
Generate a video that begins with the composition in Image 1 (close-up of hands holding a coffee cup) and ends with the composition in Image 2 (wide shot of the café window with rain outside). Fill the transition naturally with a slow camera pull-back. Morning light, quiet café atmosphere.
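In API terms, first/last-frame control usually means attaching both images and declaring which is which. The payload below is a hypothetical shape, included only to show the idea of pinning the endpoints and letting the model fill the middle; the field names are not from Seedance documentation.

```python
# Hypothetical request body for first-frame / last-frame control.
# All field names are placeholders, not the documented Seedance API.
payload = {
    "model": "seedance-2.0",
    "first_frame_image": "hands_coffee_closeup.png",  # composition the clip opens on
    "last_frame_image": "cafe_window_rain_wide.png",  # composition the clip ends on
    "prompt": (
        "Begin with the composition in Image 1 and end with the composition"
        " in Image 2. Fill the transition naturally with a slow camera"
        " pull-back. Morning light, quiet cafe atmosphere."
    ),
}
```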
Generating Subtitles, Slogans & Speech Bubbles
Seedance 2.0 supports on-screen text generation — but you need to be specific about content, timing, position, and style.
Slogan / brand tagline
Template: Text content + When it appears + Where in frame + How it appears + Font style
At the end of the video, display the brand tagline "Live Brighter" centered in the frame. Text fades in smoothly. Silver, futuristic sans-serif style.
Synchronized subtitles
Template: Subtitle content + Sync requirement + Position + Style
Display subtitles at the bottom of the frame, fully synchronized with the voiceover. Clean, legible sans-serif. White text with a soft drop shadow.
Speech bubbles / dialogue callouts
Template: Character action + Dialogue content + Speech bubble style
The girl smiles and says "We're going to make it." A speech bubble appears next to her containing the dialogue text. Bubble style is soft white with a thin border.
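All three templates share the same slots (content, timing, position, entrance, style), so the instruction sentence can be generated mechanically. The helper below is ours, purely for illustration:

```python
def text_instruction(content: str, when: str, where: str, how: str, style: str) -> str:
    """Fill the on-screen text template: content + timing + position + entrance + style."""
    return (f'{when}, display the text "{content}" {where}. '
            f"It appears with a {how}. {style}.")

print(text_instruction(
    content="Live Brighter",
    when="At the end of the video",
    where="centered in the frame",
    how="smooth fade-in",
    style="Silver, futuristic sans-serif style",
))
# -> At the end of the video, display the text "Live Brighter" centered in
#    the frame. It appears with a smooth fade-in. Silver, futuristic
#    sans-serif style.
```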
Video Reference Prompts
Video references let you copy motion patterns, camera movements, or visual effects from existing footage — without describing every detail from scratch.
Motion reference
Copy a specific movement from a reference clip:
Using the body movement from Video 1 as reference, generate a new character performing the same spinning jump in a forest clearing at dusk. Maintain the timing and arc of the motion.
💡 Label "Video 1", "Video 2" — the same way you label image references.
Camera movement reference
Replicate a specific camera technique from a reference clip:
Adopt the camera movement from Video 2 (slow orbital pan with slight upward tilt) and apply it to a city skyline at night with light trails. Preserve the pacing and rhythm of the original camera path.
Visual effect reference
Transfer a specific effect or aesthetic:
Reference the particle trails and light burst effect from Video 1. Apply this to a new scene of a figure running through a dark corridor. Maintain the same intensity and timing of the effect.
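The phrasing pattern is the same across all three cases: name the clip, name the aspect you are borrowing, say where to apply it, and state what must be preserved. A small illustrative helper:

```python
def video_reference_clause(label: str, aspect: str, target: str, preserve: str) -> str:
    """Compose a video-reference sentence: what to borrow, from which clip, applied where."""
    return (f"Using the {aspect} from {label} as reference, {target}. "
            f"Preserve {preserve}.")

print(video_reference_clause(
    label="Video 2",
    aspect="camera movement (slow orbital pan with slight upward tilt)",
    target="apply it to a city skyline at night with light trails",
    preserve="the pacing and rhythm of the original camera path",
))
```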
Video Editing with Prompts
Seedance 2.0 is not just a generation tool; it also supports natural-language video editing. You can add, remove, replace, extend, or merge video elements using prompts.
Add / remove / replace elements
- In Video 1, add a drone hovering in the upper-right sky area, appearing from frame 3 onward.
- Remove the billboard in the background of Video 1. Keep all other elements unchanged.
- Replace the red sports car in Video 1 with a silver concept vehicle. Maintain all camera angles and movement.
Extend an existing video
- Extend Video 1 by 5 seconds. Continue the character's walking motion and maintain the same lighting and scene.
- Generate what happens after Video 1 ends. The character continues forward and turns the corner.
Merge and transition between clips
- Sequence: Video 1 → 2-second city flyover transition → Video 2 → light-flare cut → Video 3. Maintain continuous visual flow.
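The merge syntax is just an ordered alternation of clips and transitions, so you can build it from a list and join it into the arrow notation used above. Illustrative only:

```python
# Alternate clips and transitions, then join them into the arrow syntax.
sequence = [
    "Video 1",
    "2-second city flyover transition",
    "Video 2",
    "light-flare cut",
    "Video 3",
]
prompt = "Sequence: " + " → ".join(sequence) + ". Maintain continuous visual flow."
print(prompt)
```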
5 Rules for Consistent Output
If your Seedance 2.0 results feel unstable or unpredictable, these five rules will help.
- Define your primary goal first — Decide before writing: are you optimizing for character consistency, a specific camera move, brand logo placement, or text generation? Your prompt structure should prioritize that goal above everything else.
- Always number your references — Use "Image 1, Image 2, Video 1, Video 2" — not "the reference image" or "the uploaded video". Explicit labels eliminate ambiguity when multiple assets are involved.
- One asset, one role — Assign each reference image or video clip exactly one job: character, logo, scene, motion style, or camera movement. Never ask one asset to control two different things simultaneously.
- Use common characters for on-screen text — Text generation is more stable when you use standard characters. Avoid rare glyphs, stylized ligatures, or complex typographic combinations in subtitle or slogan prompts.
- Write like a director, not a wish-maker — Replace "make it feel cinematic" with exact instructions: who is in frame, where they are, what they do, how the camera moves, and what the audience hears. Specificity is what separates stable output from random results.
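If you want to enforce these rules before spending a generation, a quick pre-flight check over the prompt text can catch the most common omissions. The heuristics below are ours and deliberately crude; they flag likely problems rather than guarantee good output:

```python
import re

def preflight(prompt: str, num_references: int = 0) -> list:
    """Return a list of warnings for common causes of unstable output."""
    warnings = []
    # Rule 2: every uploaded asset should be named "Image N" or "Video N".
    labels = set(re.findall(r"\b(?:Image|Video) \d+\b", prompt))
    if num_references and len(labels) < num_references:
        warnings.append("Not every uploaded reference is numbered in the prompt.")
    # Rule 5: directors specify the camera explicitly.
    if not re.search(r"\bcamera\b", prompt, re.IGNORECASE):
        warnings.append("No explicit camera instruction found.")
    # Rule 5 again: flag wish-maker phrasing.
    for vague in ("cinematic high-quality", "make it feel"):
        if vague in prompt.lower():
            warnings.append(f"Vague phrasing detected: '{vague}' — replace with concrete visuals.")
    return warnings

print(preflight("Generate a cinematic high-quality video", num_references=2))
```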
Frequently Asked Questions
How do I write a Seedance 2.0 prompt?
Start with the core formula: Subject + Environment + Action. Then extend with camera movement, visual style, audio cues, on-screen text, and reference sources. The more specific each layer, the more consistent your results.
What is the best AI video prompt template?
The universal template is: subject description + scene description + action description + camera language + style description + audio description + text requirements + reference source notes. Adapt it to your use case.
How do I use multiple reference images in Seedance 2.0?
Number each reference (Image 1, Image 2, Image 3) and assign each one a single specific role — for example: Image 1 controls the character, Image 2 controls the logo, Image 3 controls the scene environment.
How do I generate subtitles or slogans with Seedance 2.0?
Specify the text content, when it appears (start/end/mid-video), where in the frame, and how it appears (fade-in, pop-in, etc.). The more concrete the instruction, the more accurately it renders.
How do I use video reference prompts in Seedance 2.0?
Label your video clip (Video 1, Video 2) and clearly state what you are referencing from it: a specific motion pattern, a camera movement style, or a visual effect. Never upload a video reference without naming what it controls.
Why are my Seedance 2.0 results inconsistent?
The most common cause is vague or incomplete prompts. Check: is your subject clearly defined? Is each reference numbered and assigned one role? Are your camera and text instructions specific? Apply the five stability rules in this guide to improve consistency.