Workflow

Text to lipsync video workflow: script, voice, portrait, result.

This workflow is for users who want a simple path from script to speech-synced video, using the same lipsync product flow already available on the site rather than a graph-building session.

The phase-one version is intentionally compact: script in, voice generation, portrait animation, optional body movement, and output review.

What this workflow emphasizes

Script-first

Useful when your content starts as copy and needs to become a face-led video quickly.

Fewer main steps

A creator can understand the whole flow in seconds: text, voice, portrait, output.

Built for iteration

Swap script or audio and rerun the relevant stage without rebuilding the workflow by hand.

Easy handoff to teams

The workflow is easier to reuse internally because it reads like a production process, not a graph diagram.

How it works

01

Write or paste the script

Start from the exact copy the video needs to say.

02

Generate or select the voice

Turn that script into speech inside the same flow, or bring your own audio if it is already recorded.

03

Animate the portrait

Feed the generated or uploaded audio into the lipsync step with your chosen face.

04

Review the output

Approve, export, or adjust the script, voice, or motion settings and rerun.
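The four steps above can be sketched as a simple pipeline. This is only an illustrative sketch: every function name here is hypothetical and does not come from any real API, it just makes the stage order and the rerun-one-stage idea concrete.

```python
# Hypothetical sketch of the four-step flow. Function names are
# illustrative only, not part of any real product API.

def write_script(text):
    # Step 01: start from the exact copy the video needs to say.
    return {"script": text}

def generate_voice(job, audio_file=None):
    # Step 02: synthesize speech from the script, or attach
    # pre-recorded audio if it already exists.
    job["audio"] = audio_file or "tts(%s)" % job["script"]
    return job

def animate_portrait(job, portrait, body_movement=False):
    # Step 03: feed the generated or uploaded audio into the
    # lipsync step with the chosen face; body movement is optional.
    job["video"] = {
        "portrait": portrait,
        "audio": job["audio"],
        "body_movement": body_movement,
    }
    return job

def review(job):
    # Step 04: inspect the output; to iterate, change one input
    # and rerun only the affected stage.
    return job["video"]

job = write_script("Welcome to our product.")
job = generate_voice(job)               # or generate_voice(job, "my_take.wav")
result = review(animate_portrait(job, portrait="anchor.png"))
```

Because each stage only reads the fields the previous stage produced, swapping the script or the audio and rerunning from that point is enough; nothing upstream needs to be rebuilt.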

Real outputs

Lipsync sample: news-style anchor.

Good fit use cases

Use case 01

Sales outreach intro

Short script-led talking head videos for outreach or landing page embeds.

Use case 02

Product onboarding message

Reusable support or onboarding clips based on plain text copy.

Use case 03

Localized presenter drafts

Reuse the same portrait with different scripts for different markets.

Use case 04

Host-style café or studio intro

Take a character or portrait, pair it with a script, and optionally add body movement when the speaker should feel more physically present.

FAQ

Why is this a workflow page instead of a general use-case page?

Because the main search intent here is about the exact step order from text to lipsync output.

Do I need a separate TTS product first?

No. The point is to keep the text-to-voice-to-video path inside one broader product environment.

Can this later connect to broader workflows?

Yes. The phase-one workflow page is lightweight, but it is designed to map cleanly onto richer multi-step creation later.

Do I have to use body movement in this workflow?

No. The default talking-head path stays simpler and more stable. Body movement is an optional upgrade when you want more presence from the speaker.
