Feature guide

Synclip Audio Studio — TTS, Voice Clone & Audio Separation in One Workspace


Three production-ready audio tools in a single panel: high-definition text-to-speech across 77 voices and multiple languages, one-shot voice cloning from a reference file, and AI-powered stem separation. All three connect directly to your lip-sync workflow.


What is Synclip Audio Studio?

Synclip Audio Studio is the audio production hub inside your workspace. It consolidates three separate audio workflows — text-to-speech, voice cloning, and stem separation — into a single mode-switching panel, so you never have to leave your project to produce the audio track a video needs.

The three live modes are: Text to Speech (TTS), Voice Clone, and Audio Separation. Two more — Text to Music and Speech to Text (ASR) — are in development and will roll out as they reach production quality.

Every mode draws on the same coin balance and task queue. Results land in My Creations automatically, and any audio file produced in the studio can be sent straight to the Lipsync workspace with one click.

The five modes at a glance

Audio Studio is built around a mode-switching interface. You pick the workflow you need, and the input panel reconfigures for that task.

Text to Speech (Live)

Convert a script into natural human-sounding speech. Choose from 77 voices across Chinese, English, Japanese, Korean, French, Spanish, and more.

  • 77 voices across 7+ languages — Chinese (Mandarin), English (US/UK/AU/IN), Japanese, Korean, French, Spanish, Italian, Portuguese
  • Character limit scales with your subscription tier: 1,000 (free) → 3,000 → 5,000 → 10,000 characters
  • Standard and premium voices — premium voices have richer, more expressive delivery
  • Speed control at generation time
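
If you draft scripts outside Synclip, a quick check against the tier limits above can catch an over-length script before you submit it. The sketch below is a minimal Python example, assuming characters are counted as Unicode code points (Synclip may count whitespace or CJK text differently); the tier keys other than "free" are hypothetical placeholders, because this guide lists only the numeric limits.

```python
from typing import Tuple

# Character limits per tier as listed in this guide.
# Every key except "free" is a hypothetical placeholder name.
TTS_CHAR_LIMITS = {
    "free": 1_000,
    "tier_2": 3_000,
    "tier_3": 5_000,
    "tier_4": 10_000,
}


def check_script_length(script: str, tier: str) -> Tuple[bool, int]:
    """Return (fits, overflow) for a script under the given tier's limit.

    Assumes characters are counted as Python string length (code points).
    """
    limit = TTS_CHAR_LIMITS[tier]
    overflow = max(0, len(script) - limit)
    return overflow == 0, overflow


if __name__ == "__main__":
    script = "Welcome to our product tour. " * 40  # 29 chars x 40 = 1,160 chars
    print(check_script_length(script, "free"))    # (False, 160)
    print(check_script_length(script, "tier_2"))  # (True, 0)
```

A positive overflow tells you how many characters to trim, or suggests splitting the script into two generations.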

Voice Clone (Live)

Upload a short reference audio file and generate new speech in that voice. No long training session required — one upload is enough.

  • Upload any WAV or MP3 up to 10 MB as the reference
  • Type the target script in the left panel, then generate speech that matches the reference voice
  • Works best with clean, single-speaker audio — at least 5–10 seconds of natural speech
  • Output lands in My Creations alongside your TTS files
  • Useful for branded narrators, multilingual dubbing, or keeping an existing voice consistent across new content
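
To avoid a rejected upload, you can pre-check a reference file against the format and size rules above. This is a minimal Python sketch, not Synclip's own validation: it reads "10 MB" as 10,000,000 bytes (an assumption; it is the stricter common reading), and it checks only the file extension and size, not the audio encoding.

```python
from pathlib import PurePath
from typing import List

# Limits as stated in this guide. Interpreting "10 MB" as 10,000,000 bytes
# is an assumption; it is the stricter of the two common readings.
MAX_REFERENCE_BYTES = 10_000_000
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def reference_file_problems(filename: str, size_bytes: int) -> List[str]:
    """Return reasons a reference file would break the stated upload rules.

    An empty list means the file passes. Only the extension and size are
    checked; the audio content itself is not inspected.
    """
    problems: List[str] = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format '{suffix or 'none'}': use WAV or MP3")
    if size_bytes > MAX_REFERENCE_BYTES:
        problems.append(
            f"file is {size_bytes:,} bytes; the limit is {MAX_REFERENCE_BYTES:,}"
        )
    return problems


if __name__ == "__main__":
    from pathlib import Path

    # Hypothetical local file name; point this at your own recording.
    ref = Path("narrator_reference.wav")
    if ref.exists():
        print(reference_file_problems(ref.name, ref.stat().st_size) or "OK")
```

Keeping the check as a pure function of name and size makes it easy to reuse for a whole folder of candidate recordings.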

Audio Separation (Live)

Upload a mixed audio or video file and split it into two stems: foreground (vocals) and background (music / ambient).

  • Upload any audio file up to 10 MB
  • Two output files: _fg (foreground / vocals) and _bg (background / backing track)
  • Priced at 4 coins per minute of audio
  • Use cases: extract clean vocals for dubbing, isolate background music for B-roll, remove a backing track before applying lip-sync
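
After downloading several separation results, it helps to pair each _fg file with its _bg partner before building a timeline. The Python sketch below assumes the suffix sits directly before the file extension (for example interview_fg.wav); that placement and the output format are assumptions, since this guide names only the _fg and _bg suffixes.

```python
from pathlib import PurePath
from typing import Dict, Iterable, Optional, Tuple

# Suffixes named in this guide; their position just before the extension
# is an assumption about the downloaded file names.
STEM_ROLES = {"_fg": "foreground", "_bg": "background"}


def parse_stem(filename: str) -> Optional[Tuple[str, str]]:
    """Return (base_name, role) for a separation output, or None if unrelated."""
    stem = PurePath(filename).stem  # "interview_fg.wav" -> "interview_fg"
    for suffix, role in STEM_ROLES.items():
        if stem.endswith(suffix):
            return stem[: -len(suffix)], role
    return None


def pair_stems(filenames: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group separation outputs by their shared base name."""
    pairs: Dict[str, Dict[str, str]] = {}
    for name in filenames:
        parsed = parse_stem(name)
        if parsed is None:
            continue  # not a separation output; ignore it
        base, role = parsed
        pairs.setdefault(base, {})[role] = name
    return pairs


if __name__ == "__main__":
    files = ["interview_fg.wav", "interview_bg.wav", "promo_bg.wav", "notes.txt"]
    print(pair_stems(files))
```

An entry with only one role (like promo above) flags a stem you have not downloaded yet.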

Text to Music (Coming soon)

Describe the music you need and generate a matching track. This mode is in development — it will appear as an active option once it reaches production quality.

  • Prompt-based music generation
  • Designed to produce background scores for video content

Speech to Text (ASR) (Coming soon)

Transcribe audio files to text, with support for multiple languages. Like Text to Music, this mode is still in development.

  • Strong multi-language support
  • Output as plain text or timed transcript

Text to Speech — 77 voices, 7+ languages

The TTS mode is the most-used part of Audio Studio, primarily because it feeds directly into lip-sync video production. Here is a sample of the voices available across the main language groups:

Chinese (Mandarin)

Voice | Gender | Style | Best for
云健 (Yúnjiàn) | Male | Steady | Audiobook, narration
云扬 (Yúnyáng) | Male | Energetic | Podcast, social media
小妮 (Xiǎo Ní) | Female | Sweet | Animation characters
小小 (Xiǎo Xiǎo) | Female | Gentle | Voice assistant
凌雨燕 (Líng Yǔyàn) | Female | Elegant | Storytelling
刘平 (Liú Píng) | Male | Authoritative | Presentation, news

English (US / UK / AU / IN)

Voice | Gender | Style | Best for
Jessica | Female | Friendly | Podcast
Onyx | Male | Deep | Movie trailer, promo
Nova | Female | Modern | Vlog, social content
Nicole | Female | Professional | Tutorial, e-learning
Fenrir | Male | Dramatic | Fantasy narration
River | Female | Soothing | Audiobook, meditation

Japanese / Korean / French / Spanish / Italian / Portuguese

Voice | Gender | Style | Best for
Sakura (JA) | Female | Warm | Tutorial, commercial
Nori (JA) | Male | Professional | Corporate, presentation
Chae-won (KO) | Female | Clear | Podcast, vlog
Sophie (FR) | Female | Natural | E-learning, documentary
Carlos (ES) | Male | Energetic | Ads, YouTube
Isabella (PT) | Female | Friendly | Social media, tutorials

Tips for better TTS results

  • Use punctuation to control pacing. A full stop produces a longer natural pause than a comma. If you need a distinct beat between two ideas, end the first sentence properly.
  • Break long paragraphs into short sentences — shorter sentences produce noticeably cleaner, more natural-sounding delivery.
  • Slow down the rate slightly (0.85×) on brand names, technical terms, or any phrase that needs the listener to register it.
  • Premium voices have richer tonal variation; use them for hero narration or final productions. Standard voices are great for drafts and functional content.
  • Match voice energy to the video context: an energetic, warm voice works over fast cuts and product demos; a measured, calm voice suits documentaries and e-learning.
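
The "short sentences" tip is easy to apply mechanically before pasting a script. Below is a naive Python sketch (Python 3.7+, which allows splitting on empty regex matches) that splits on sentence-ending punctuation, including the full-width CJK marks, and lists sentences over a length threshold. The 120-character threshold is a hypothetical starting point, not a Synclip rule, and abbreviations such as "Dr." will cause false splits.

```python
import re
from typing import List

# Break after Latin . ! ? when followed by whitespace, or directly after the
# full-width CJK marks 。！？ (which are usually not followed by a space).
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")

# Hypothetical threshold; tune it by ear for your voice and language.
MAX_SENTENCE_CHARS = 120


def split_sentences(text: str) -> List[str]:
    """Naively split text into sentences (abbreviations like "Dr." will split)."""
    return [part.strip() for part in _SENTENCE_BREAK.split(text) if part.strip()]


def long_sentences(text: str, max_chars: int = MAX_SENTENCE_CHARS) -> List[str]:
    """Return the sentences longer than max_chars characters."""
    return [s for s in split_sentences(text) if len(s) > max_chars]


if __name__ == "__main__":
    script = "Meet the new dashboard. It shows every project at a glance."
    for sentence in split_sentences(script):
        print(sentence)
```

Character length is used instead of word count so the same check works for Chinese and Japanese scripts, which do not separate words with spaces.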

Voice Clone — match any voice from a reference file

Voice Clone lets you generate speech that sounds like a specific person — without any long setup. You upload a short reference recording, type your script, and Audio Studio produces that voice reading your new text.

The most common use case is brand consistency: if a client has existing narration or a brand voice they want to carry into new content, Voice Clone handles that without a new studio recording session.

It also works for multilingual dubbing: clone a speaker's English voice and generate the Spanish version of the same script, keeping the same voice character across languages.

How to use Voice Clone

  1. Switch to the Voice Clone tab in Audio Studio.
  2. In the right panel, click the upload zone and select a WAV or MP3 reference file (up to 10 MB).
  3. In the left panel, type the script you want generated in that voice.
  4. Click Generate — the result is saved to My Creations.

For best results: use a clean reference with minimal background noise, a single speaker, and at least 5–10 seconds of natural speech. Recordings with music, reverb, or multiple speakers will reduce accuracy.
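
To confirm a reference recording meets the 5 to 10 second guidance, you can measure its duration locally. This sketch uses Python's standard wave module, so it handles PCM WAV files only; MP3 references would need a third-party decoder. The 5-second floor comes from this guide; the function names are illustrative.

```python
import wave
from typing import BinaryIO, Union

# Lower end of the "at least 5 to 10 seconds" guidance in this guide.
MIN_REFERENCE_SECONDS = 5.0


def wav_duration_seconds(source: Union[str, BinaryIO]) -> float:
    """Duration of a PCM WAV file: frame count divided by frames per second."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_long_enough(
    source: Union[str, BinaryIO], min_seconds: float = MIN_REFERENCE_SECONDS
) -> bool:
    """True if the WAV reference is at least min_seconds long."""
    return wav_duration_seconds(source) >= min_seconds


if __name__ == "__main__":
    import sys

    # Usage: python check_reference.py my_reference.wav
    for path in sys.argv[1:]:
        print(path, round(wav_duration_seconds(path), 2), reference_long_enough(path))
```

Duration is only a proxy: a 10-second clip full of music or room echo will still clone poorly, so the clean-audio advice above matters as much as length.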

Audio Separation — split any track into vocals and backing

Audio Separation takes a mixed audio file and returns two stems: a foreground file containing the vocals or primary speaker, and a background file containing the music, ambience, or backing track.

The clearest use case for video production: you have a clip with a speaker and background music, but you need clean vocals to feed into lip-sync or dubbing. Upload the mixed file, run separation, and you get the isolated voice track in seconds.

The reverse works too. If you have a great piece of background music buried inside a clip, separation pulls it out as a standalone file ready to drop onto a new timeline.

Output files

  • _fg — foreground stem (vocals, primary speaker, lead instrument)
  • _bg — background stem (music, ambience, and any other sound behind the speaker)

Audio Separation is priced at 4 coins per minute of uploaded audio. A 3-minute track costs 12 coins.
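
For budgeting a batch of files, the rate above reduces to simple arithmetic. This guide does not say how partial minutes are billed, so the Python sketch below offers both pro-rata billing and rounding each started minute up; treat either as an assumption until you confirm against your own coin history.

```python
import math

COINS_PER_MINUTE = 4  # rate stated in this guide


def separation_cost(duration_seconds: float, round_up_minutes: bool = False) -> float:
    """Estimate the coin cost of separating a track of the given length.

    round_up_minutes=False bills pro rata; True bills each started minute in
    full. Which rule applies to partial minutes is an assumption.
    """
    minutes = duration_seconds / 60
    if round_up_minutes:
        minutes = math.ceil(minutes)
    return float(COINS_PER_MINUTE * minutes)


if __name__ == "__main__":
    print(separation_cost(180))                         # 3-minute track: 12.0
    print(separation_cost(150))                         # 2.5 minutes, pro rata: 10.0
    print(separation_cost(150, round_up_minutes=True))  # rounded up to 3 minutes: 12.0
```

Both rules agree for whole-minute tracks like the 3-minute example; they diverge only for partial minutes.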

How Audio Studio connects to your lip-sync workflow

Audio Studio was designed first as a feeder for lip-sync video production. The connection between the two workspaces is direct:

  1. Produce your voice track in Audio Studio (TTS, Voice Clone, or a cleaned separation output).
  2. The result lands in My Creations.
  3. Open the Lipsync workspace, select "From My Creations" as the audio source, and pick the file.
  4. Upload your portrait (or use an existing one), configure body movement if needed, and render.

This loop — script → audio → lipsync video — can run entirely inside Synclip without downloading or re-uploading files between tools.

Start in Audio Studio

  1. Open your Synclip workspace.
  2. Select Audio Studio from the left sidebar.
  3. Pick your mode: TTS, Voice Clone, or Audio Separation.
  4. Generate your track and send it to lipsync — or download it directly.

If you already have a Synclip account, Audio Studio is available now. The three live modes — TTS, Voice Clone, and Separation — are ready to use. Text to Music and ASR will appear in the mode switcher once they reach production readiness.