Text-to-Speech Lip Sync Guide


Last updated June 1, 2026


Combine text-to-speech with lip sync to create talking head videos from just text and a source video. Type a script, pick a voice, and Sync Labs generates a video where the speaker’s lips match the spoken words.

Using the Built-in ElevenLabs Integration

This is the fastest path. Sync Labs’ ElevenLabs integration handles TTS and lip sync in a single API call, so you don't need to generate and host audio separately.

```typescript
import { SyncClient } from "@sync.so/sdk";

const sync = new SyncClient();

async function main() {
  const response = await sync.generations.create({
    input: [
      {
        type: "video",
        url: "https://assets.sync.so/docs/example-video.mp4",
      },
      {
        type: "text",
        provider: {
          name: "elevenlabs",
          voiceId: "EXAVITQu4vr4xnSDxMaL",
          script: "Hey there. I wanted to walk you through our latest features. We shipped three major updates this week.",
          stability: 0.5,
          similarityBoost: 0.75,
        },
      },
    ],
    model: "lipsync-2",
    options: { sync_mode: "cut_off" },
  });

  const jobId = response.id;
  console.log(`Job submitted: ${jobId}`);

  // Poll every 10 seconds until the job reaches a terminal status
  let generation = await sync.generations.get(jobId);
  while (!["COMPLETED", "FAILED", "REJECTED"].includes(generation.status)) {
    await new Promise((r) => setTimeout(r, 10000));
    generation = await sync.generations.get(jobId);
  }

  if (generation.status === "COMPLETED") {
    console.log(`Video ready: ${generation.outputUrl}`);
  } else {
    console.log(`Generation failed: ${jobId}`);
  }
}

main();
```

ElevenLabs Provider Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| name | string | required | Must be "elevenlabs" |
| voiceId | string | required | ElevenLabs voice ID. Must be a non-empty string. |
| script | string | required | Text to speak (max 5,000 characters) |
| stability | float | 0.5 | Voice stability (0.0-1.0). Lower = more expressive. |
| similarityBoost | float | 0.75 | Voice similarity to original (0.0-1.0). Higher = closer match. |
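If you want to catch bad parameters before a request leaves your machine, a client-side check can mirror the table. This is a minimal sketch, not part of the Sync Labs SDK: `validate_elevenlabs_provider` and `MAX_SCRIPT_CHARS` are hypothetical names, and the sketch assumes the 5,000-character limit corresponds to Python string length (Unicode code points), which may not match exactly how the service counts characters.

```python
MAX_SCRIPT_CHARS = 5_000  # documented per-generation script limit


def validate_elevenlabs_provider(provider: dict) -> dict:
    """Hypothetical client-side check mirroring the parameter table.

    Returns a copy with the documented defaults applied, or raises ValueError.
    """
    if provider.get("name") != "elevenlabs":
        raise ValueError('name must be "elevenlabs"')

    voice_id = provider.get("voiceId")
    if not isinstance(voice_id, str) or not voice_id.strip():
        raise ValueError("voiceId must be a non-empty string")

    script = provider.get("script")
    if not isinstance(script, str):
        raise ValueError("script must be a string")
    if len(script) > MAX_SCRIPT_CHARS:
        raise ValueError(f"script is {len(script)} characters; max is {MAX_SCRIPT_CHARS}")

    # Apply defaults first, then let caller-supplied values override them
    checked = {"stability": 0.5, "similarityBoost": 0.75, **provider}
    for key in ("stability", "similarityBoost"):
        value = checked[key]
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} must be between 0.0 and 1.0")
    return checked


checked = validate_elevenlabs_provider(
    {"name": "elevenlabs", "voiceId": "EXAVITQu4vr4xnSDxMaL", "script": "Hey there."}
)
print(checked["stability"], checked["similarityBoost"])  # 0.5 0.75
```

The validated dictionary can be passed as the `provider` value of a text input. The server remains the source of truth; this only shortens the feedback loop for obvious mistakes.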

Enable the ElevenLabs integration from your Integrations settings. You can use Sync Labs’ built-in integration or provide your own ElevenLabs API key (Creator plan or higher).

Using External TTS Providers

If you use a TTS provider other than ElevenLabs — Google Cloud TTS, Amazon Polly, Azure Speech, or any other service — generate the audio first, host it at a public URL, then pass it to Sync Labs.

```typescript
import { SyncClient } from "@sync.so/sdk";

const sync = new SyncClient();

// Audio generated by your TTS provider, hosted at a public URL
const ttsAudioUrl = "https://your-cdn.com/generated-speech.mp3";

// Top-level await requires running this file as an ES module
const response = await sync.generations.create({
  input: [
    { type: "video", url: "https://assets.sync.so/docs/example-video.mp4" },
    { type: "audio", url: ttsAudioUrl },
  ],
  model: "lipsync-2",
  options: { sync_mode: "cut_off" },
});

const jobId = response.id;
console.log(`Job submitted: ${jobId}`);

// Poll every 10 seconds until the job reaches a terminal status
let generation = await sync.generations.get(jobId);
while (!["COMPLETED", "FAILED", "REJECTED"].includes(generation.status)) {
  await new Promise((r) => setTimeout(r, 10000));
  generation = await sync.generations.get(jobId);
}

if (generation.status === "COMPLETED") {
  console.log(`Video ready: ${generation.outputUrl}`);
} else {
  console.log(`Generation failed: ${jobId}`);
}
```

This approach works with any TTS provider. The only requirement is that the audio file is accessible via a public URL.
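A common mistake is passing a URL that only works on your own machine or network. The following is a hypothetical pre-flight helper (`looks_publicly_reachable` is not part of any SDK) that rejects URLs that clearly cannot be fetched by a remote service: non-HTTP schemes, `localhost`, and private or loopback IP addresses. It is a sketch under that narrow assumption. It does not prove the file is downloadable; only an actual request can confirm that.

```python
import ipaddress
from urllib.parse import urlparse


def looks_publicly_reachable(url: str) -> bool:
    """Hypothetical sanity check: False for URLs a remote service cannot fetch."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False  # e.g. file:// paths or malformed URLs

    host = parsed.hostname
    if host == "localhost" or host.endswith(".local"):
        return False

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; we can't say more without resolving it
    return ip.is_global  # False for loopback, private, and link-local ranges


print(looks_publicly_reachable("https://your-cdn.com/generated-speech.mp3"))  # True
print(looks_publicly_reachable("http://192.168.1.20/speech.mp3"))  # False
```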

Voice Cloning Workflow

Clone a speaker’s voice with ElevenLabs, then use that cloned voice ID with Sync Labs’ integration. The result is a video in which the speaker both looks and sounds like themselves while speaking entirely new words.

Step 1: Clone the voice

Upload a clean audio sample of the speaker to ElevenLabs to create a cloned voice. You get back a voice ID.

```python
# Use the ElevenLabs API or dashboard to clone a voice
# https://elevenlabs.io/docs/voices/voice-cloning
cloned_voice_id = "your-cloned-voice-id"
```

Step 2: Generate lip sync with the cloned voice

Use the cloned voice ID in your Sync Labs API call.

```python
from sync import Sync
from sync.common import Video, TTS, GenerationOptions

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://your-cdn.com/speaker-video.mp4"),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": cloned_voice_id,  # from step 1
                "script": "This is the new script I want the speaker to say.",
                "stability": 0.5,
                "similarityBoost": 0.85,  # Higher similarity for cloned voices
            }
        ),
    ],
    model="lipsync-2-pro",  # Premium facial detail; sync-3 is the highest-quality option
    options=GenerationOptions(sync_mode="cut_off"),
)
```

Step 3: Download the result

Poll for completion and retrieve the output video. The speaker now says the new script in their own voice, with matching lip movements.
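The polling in step 3 follows the same pattern as the TypeScript examples: check the status every 10 seconds until it is `COMPLETED`, `FAILED`, or `REJECTED`. The helper below is a sketch of that loop with a timeout added. `wait_for_generation` is a hypothetical name, the 30-minute default timeout is an arbitrary choice, and the commented SDK usage assumes the Python generation object exposes `status` and `output_url` the way the TypeScript one exposes `status` and `outputUrl`.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "REJECTED"}


def wait_for_generation(
    fetch_status: Callable[[], str],
    interval_s: float = 10.0,
    timeout_s: float = 1800.0,  # arbitrary 30-minute cap
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns a terminal status.

    Elapsed time is approximated by summing sleep intervals, so request
    latency is not counted. Raises TimeoutError once timeout_s is reached.
    """
    waited = 0.0
    status = fetch_status()
    while status not in TERMINAL_STATUSES:
        if waited >= timeout_s:
            raise TimeoutError(f"generation still {status} after ~{waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        status = fetch_status()
    return status


# Usage with the SDK (assumes `sync` and `response` from step 2):
# status = wait_for_generation(lambda: sync.generations.get(response.id).status)
# if status == "COMPLETED":
#     print(sync.generations.get(response.id).output_url)
```

Passing `fetch_status` and `sleep` in as functions keeps the loop independent of the SDK, which also makes it easy to test without network calls.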

Best Practices

Keep scripts under 5,000 characters

The ElevenLabs integration has a 5,000-character limit per generation. For longer scripts, split them into segments using the Segments API, with each segment referencing a separate TTS input.
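Splitting a long script so that each piece fits the limit can be done at sentence boundaries. This is a minimal sketch, not an SDK utility: `split_script` is a hypothetical name, it treats `.`, `!`, and `?` followed by whitespace as sentence ends, collapses the whitespace between sentences to single spaces, and hard-wraps any single sentence longer than the limit.

```python
import re

MAX_SCRIPT_CHARS = 5_000


def split_script(script: str, max_chars: int = MAX_SCRIPT_CHARS) -> list[str]:
    """Split a script into chunks of at most max_chars, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_script("A. B. C.", max_chars=5))  # ['A. B.', 'C.']
```

Each chunk would then become its own TTS input with a distinct `ref_id`, referenced by one segment. Choosing where each segment starts and ends in the video is still up to you.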

Tune voice settings

Stability controls how consistent the voice sounds. Lower values (0.2-0.4) produce more expressive, varied speech. Higher values (0.6-0.8) produce more consistent, predictable speech. Similarity boost controls how closely the output matches the original voice. For cloned voices, use higher values (0.8-0.9).
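To keep these choices consistent across requests, you might encode them as a few named starting points. The helper below is hypothetical (`voice_settings` and the style names are not Sync Labs or ElevenLabs presets); its values simply sit inside the ranges described above, with 0.5 and 0.75 taken from the documented defaults.

```python
STABILITY_BY_STYLE = {
    "expressive": 0.3,  # within the 0.2-0.4 "more expressive" range
    "balanced": 0.5,    # documented default
    "consistent": 0.7,  # within the 0.6-0.8 "more consistent" range
}


def voice_settings(style: str = "balanced", cloned: bool = False) -> dict:
    """Return illustrative stability/similarityBoost values for a speaking style."""
    if style not in STABILITY_BY_STYLE:
        raise ValueError(f"unknown style {style!r}; choose from {sorted(STABILITY_BY_STYLE)}")
    similarity = 0.85 if cloned else 0.75  # 0.8-0.9 for cloned voices, else the default
    return {"stability": STABILITY_BY_STYLE[style], "similarityBoost": similarity}


provider = {
    "name": "elevenlabs",
    "voiceId": "your-cloned-voice-id",
    "script": "This is the new script I want the speaker to say.",
    **voice_settings("expressive", cloned=True),
}
print(provider["stability"], provider["similarityBoost"])  # 0.3 0.85
```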

Use sync-3 for the highest quality

For talking head videos where the face is prominent or the scene is challenging, sync-3 is the highest-quality option. It handles obstructions, extreme angles, low light, and 4K output better than earlier lipsync models. Use lipsync-2-pro when you want premium facial detail at a lower price point than sync-3.

Use react-1 for expressive results

For short clips (under 15 seconds) where you want the speaker to show emotion, use react-1 with an emotion prompt. The model generates facial expressions and head movements that match the audio tone.
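The model guidance in this section can be condensed into a small decision helper. This is an illustrative heuristic only: `choose_model` and its flags are hypothetical, the fallback to `lipsync-2` reflects the model used in this guide's examples rather than an official recommendation, and the helper does not supply the emotion prompt that `react-1` expects.

```python
def choose_model(
    duration_s: float,
    want_emotion: bool = False,
    challenging_scene: bool = False,
    premium_detail: bool = False,
) -> str:
    """Illustrative mapping of the best-practice guidance to a model name."""
    if want_emotion and duration_s < 15:
        return "react-1"        # short clips with emotion and head movement
    if challenging_scene:
        return "sync-3"         # obstructions, extreme angles, low light, 4K
    if premium_detail:
        return "lipsync-2-pro"  # premium facial detail below sync-3 pricing
    return "lipsync-2"          # model used in this guide's examples


print(choose_model(10, want_emotion=True))  # react-1
print(choose_model(45, challenging_scene=True))  # sync-3
```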

Multi-Segment TTS

For longer scripts or multi-speaker scenarios, use the Segments API with multiple TTS inputs:

```python
from sync import Sync
from sync.common import Video, TTS

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://your-cdn.com/video.mp4"),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": "voice-id-1",
                "script": "Welcome to the first section of our presentation.",
            },
            ref_id="intro",
        ),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": "voice-id-2",
                "script": "Now let me hand it over to my colleague for the demo.",
            },
            ref_id="handoff",
        ),
    ],
    segments=[
        {"startTime": 0, "endTime": 8, "audioInput": {"refId": "intro"}},
        {"startTime": 8, "endTime": 15, "audioInput": {"refId": "handoff"}},
    ],
    model="lipsync-2",
)
```
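With many TTS inputs, writing the `segments` list by hand gets error-prone. The sketch below builds back-to-back segments in the same shape as the example from `(ref_id, duration)` pairs. `build_segments` is a hypothetical helper; it assumes `startTime` and `endTime` are in seconds (as the 0/8/15 values in the example suggest) and that you already know how long each part of the video should be.

```python
def build_segments(parts: list[tuple[str, float]]) -> list[dict]:
    """Lay out contiguous segments from (ref_id, duration_in_seconds) pairs."""
    segments = []
    start = 0.0
    for ref_id, duration in parts:
        if duration <= 0:
            raise ValueError(f"duration for {ref_id!r} must be positive")
        end = start + duration
        segments.append(
            {"startTime": start, "endTime": end, "audioInput": {"refId": ref_id}}
        )
        start = end  # the next segment begins where this one ends
    return segments


# Reproduces the timing used in the example above
segments = build_segments([("intro", 8), ("handoff", 7)])
print([(s["startTime"], s["endTime"]) for s in segments])  # [(0.0, 8.0), (8.0, 15.0)]
```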

Troubleshooting TTS

Why did my TTS generation fail?

TTS generation failures typically come from one of three issues: an empty or invalid voice ID, a script that exceeds the character limit, or a missing ElevenLabs API key configuration. First, verify that the voiceId you pass is a valid, non-empty ElevenLabs voice ID; passing an empty string ("") returns a 422 error. A voice ID can also stop working if the voice is deleted from your ElevenLabs account or if you reference a shared voice that is no longer available. Second, check that your script field is within the 5,000-character limit; longer scripts are rejected. Third, confirm that the ElevenLabs integration is enabled in your Integrations settings. Free accounts use Sync Labs’ built-in ElevenLabs key, while Creator plan and above can provide their own API key for higher quotas. For the specific error code and message, check the error response from the GET /v2/generate/{id} endpoint.

How do I find my ElevenLabs voice ID?

To find your ElevenLabs voice ID, log in to the ElevenLabs dashboard and navigate to the Voices section. Select the voice you want to use, then look for the voice ID in the URL bar or in the voice settings panel — it is a string of characters like EXAVITQu4vr4xnSDxMaL. You can also find voice IDs through the ElevenLabs API by calling their List Voices endpoint. If you are using a cloned voice, the voice ID is returned when you create the clone. Copy the voice ID exactly as shown and pass it as the voiceId parameter in your Sync Labs API request. Note that voice IDs are case-sensitive. If you are using Sync Labs’ built-in ElevenLabs integration on a free account, you can use any of the default ElevenLabs voices without needing your own ElevenLabs account.

Can I use TTS in languages other than English?

Yes, Sync Labs’ TTS integration supports multiple languages through ElevenLabs. ElevenLabs offers multilingual voice models that can generate speech in over 29 languages including Spanish, French, German, Portuguese, Japanese, Chinese, Arabic, Hindi, and many more. To use TTS in a non-English language, choose an ElevenLabs voice that supports your target language — multilingual voices are labeled as such in the ElevenLabs voice library. Write your script in the target language and the TTS engine will generate speech in that language. The lip sync model will then match the lip movements to the generated audio regardless of the language, as Sync Labs’ lip sync models are language-agnostic. For the best results, select a voice that is native to your target language rather than relying on a single voice to handle all languages.

Next Steps

  • Integrations — Full ElevenLabs setup and troubleshooting
  • React Models — Add emotion and expressions to talking head videos
  • Video Dubbing API Guide — Build dubbing pipelines
  • Segments Guide — Handle multi-speaker and long-form content

Support Knowledge Base

  • [How to use TTS with Sync Labs](https://support.sync.so/articles/1245054903-How-to-use-Text-to-Speech-(TTS)-with-Sync Labs) — Step-by-step setup guide
  • Internal Server Error during TTS — TTS error troubleshooting
  • ElevenLabs Terms of Service error — Voice policy issues