Text-to-Speech Lip Sync Guide

Combine text-to-speech with lip sync to create talking head videos from just text and a source video. Type a script, pick a voice, and Sync generates a video where the speaker’s lips match the spoken words.

Using the Built-in ElevenLabs Integration

The fastest path. Sync’s ElevenLabs integration handles TTS and lipsync in a single API call — no need to generate and host audio separately.

```typescript
import { SyncClient } from "@sync.so/sdk";

const sync = new SyncClient();

async function main() {
  const response = await sync.generations.create({
    input: [
      {
        type: "video",
        url: "https://assets.sync.so/docs/example-video.mp4",
      },
      {
        type: "text",
        provider: {
          name: "elevenlabs",
          voiceId: "EXAVITQu4vr4xnSDxMaL",
          script: "Hey there. I wanted to walk you through our latest features. We shipped three major updates this week.",
          stability: 0.5,
          similarityBoost: 0.75,
        },
      },
    ],
    model: "lipsync-2",
    options: { sync_mode: "cut_off" },
  });

  const jobId = response.id;
  console.log(`Job submitted: ${jobId}`);

  // Poll until the job reaches a terminal state
  let generation = await sync.generations.get(jobId);
  while (!["COMPLETED", "FAILED", "REJECTED"].includes(generation.status)) {
    await new Promise((r) => setTimeout(r, 10000));
    generation = await sync.generations.get(jobId);
  }

  if (generation.status === "COMPLETED") {
    console.log(`Video ready: ${generation.outputUrl}`);
  } else {
    console.log(`Generation failed: ${jobId}`);
  }
}

main();
```

ElevenLabs Provider Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | string | | Must be `"elevenlabs"` |
| `voiceId` | string | | ElevenLabs voice ID |
| `script` | string | | Text to speak (max 5,000 characters) |
| `stability` | float | 0.5 | Voice stability (0.0-1.0). Lower = more expressive. |
| `similarityBoost` | float | 0.75 | Voice similarity to the original (0.0-1.0). Higher = closer match. |

Enable the ElevenLabs integration from your Integrations settings. You can use Sync’s built-in integration or provide your own ElevenLabs API key (Creator plan or higher).

Using External TTS Providers

If you use a TTS provider other than ElevenLabs — Google Cloud TTS, Amazon Polly, Azure Speech, or any other service — generate the audio first, host it at a public URL, then pass it to Sync.

```typescript
import { SyncClient } from "@sync.so/sdk";

const sync = new SyncClient();

// Audio generated by your TTS provider, hosted at a public URL
const ttsAudioUrl = "https://your-cdn.com/generated-speech.mp3";

const response = await sync.generations.create({
  input: [
    { type: "video", url: "https://assets.sync.so/docs/example-video.mp4" },
    { type: "audio", url: ttsAudioUrl },
  ],
  model: "lipsync-2",
  options: { sync_mode: "cut_off" },
});

const jobId = response.id;
console.log(`Job submitted: ${jobId}`);

let generation = await sync.generations.get(jobId);
while (!["COMPLETED", "FAILED", "REJECTED"].includes(generation.status)) {
  await new Promise((r) => setTimeout(r, 10000));
  generation = await sync.generations.get(jobId);
}

if (generation.status === "COMPLETED") {
  console.log(`Video ready: ${generation.outputUrl}`);
}
```

This approach works with any TTS provider. The only requirement is that the audio file is accessible via a public URL.
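Before submitting a job, it can save a round trip to sanity-check the audio URL locally. The helper below is an illustrative sketch, not part of the Sync SDK, and the extension list is an assumption; consult the API reference for the formats Sync actually accepts.

```python
from urllib.parse import urlparse

# Assumed set of common audio extensions; check the Sync API reference
# for the authoritative list of supported formats.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".aac", ".ogg"}

def looks_like_public_audio_url(url: str) -> bool:
    """Return True if url is an absolute http(s) URL ending in an audio extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return any(parsed.path.lower().endswith(ext) for ext in AUDIO_EXTENSIONS)
```

This catches the most common mistakes (a local file path, a missing host, a non-audio URL) before the request is made; it does not verify that the URL is actually reachable from Sync's servers.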

Voice Cloning Workflow

Clone a speaker’s voice with ElevenLabs, then use that cloned voice ID with Sync’s integration. The result: a video where the speaker looks AND sounds like themselves — speaking entirely new words.

1. Clone the voice

Upload a clean audio sample of the speaker to ElevenLabs to create a cloned voice. You get back a voice ID.

```python
# Use the ElevenLabs API or dashboard to clone a voice:
# https://elevenlabs.io/docs/voices/voice-cloning
cloned_voice_id = "your-cloned-voice-id"
```

2. Generate lip sync with the cloned voice

Use the cloned voice ID in your Sync API call.

```python
from sync import Sync
from sync.common import Video, TTS, GenerationOptions

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://your-cdn.com/speaker-video.mp4"),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": cloned_voice_id,
                "script": "This is the new script I want the speaker to say.",
                "stability": 0.5,
                "similarityBoost": 0.85,  # higher similarity for cloned voices
            }
        ),
    ],
    model="lipsync-2-pro",  # Pro model for highest quality
    options=GenerationOptions(sync_mode="cut_off"),
)
```

3. Download the result

Poll for completion and retrieve the output video. The speaker now says the new script with their own voice and matching lip movements.
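The polling loop from the earlier examples can be factored into a reusable helper. This is an illustrative sketch: `get_status` stands in for a call like `sync.generations.get`, which is assumed to return an object with a `status` attribute.

```python
import time

# Terminal states, per the polling loops in the examples above.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "REJECTED"}

def wait_for_completion(get_status, job_id, interval=10, timeout=600):
    """Poll until the generation reaches a terminal state or the timeout expires.

    get_status: callable taking a job ID and returning an object with .status,
                e.g. lambda jid: sync.generations.get(jid)
    """
    deadline = time.monotonic() + timeout
    while True:
        generation = get_status(job_id)
        if generation.status in TERMINAL_STATUSES:
            return generation
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Generation {job_id} did not finish within {timeout}s")
        time.sleep(interval)
```

Passing the getter as a callable keeps the helper independent of any one SDK and makes it easy to test with a stub.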

Best Practices

Keep scripts under 5,000 characters

The ElevenLabs integration has a 5,000-character limit per generation. For longer scripts, split them into segments using the Segments API, with each segment referencing a separate TTS input.
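One simple way to stay under the limit is to split long scripts at sentence boundaries. The helper below is an illustrative sketch, not part of the Sync SDK; note that a single sentence longer than the limit would still need manual handling.

```python
import re

def split_script(script: str, max_chars: int = 5000) -> list[str]:
    """Split a script into chunks under max_chars, breaking at sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when appending this sentence would exceed the limit.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then become its own TTS input, with one segment per chunk.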

Tune voice settings

Stability controls how consistent the voice sounds. Lower values (0.2-0.4) produce more expressive, varied speech. Higher values (0.6-0.8) produce more consistent, predictable speech. Similarity boost controls how closely the output matches the original voice. For cloned voices, use higher values (0.8-0.9).
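These recommendations can be captured as a small preset helper. This is an illustrative convenience, not an official API; the specific values simply encode the ranges suggested above.

```python
def voice_settings(expressive: bool = False, cloned: bool = False) -> dict:
    """Return stability/similarityBoost presets per the guidance above."""
    return {
        "stability": 0.3 if expressive else 0.7,      # lower = more varied speech
        "similarityBoost": 0.85 if cloned else 0.75,  # higher for cloned voices
    }
```

The returned dict can be merged into the `provider` object of a TTS input.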

Use lipsync-2-pro for highest quality

For talking head videos where the face is prominent, lipsync-2-pro produces the best results. It handles detail around teeth, beards, and facial features better than other models. The trade-off is slower processing and higher cost.

Use react-1 for expressive results

For short clips (under 15 seconds) where you want the speaker to show emotion, use react-1 with an emotion prompt. The model generates facial expressions and head movements that match the audio tone.

Multi-Segment TTS

For longer scripts or multi-speaker scenarios, use the Segments API with multiple TTS inputs:

```python
from sync import Sync
from sync.common import Video, TTS

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://your-cdn.com/video.mp4"),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": "voice-id-1",
                "script": "Welcome to the first section of our presentation.",
            },
            ref_id="intro",
        ),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": "voice-id-2",
                "script": "Now let me hand it over to my colleague for the demo.",
            },
            ref_id="handoff",
        ),
    ],
    segments=[
        {"startTime": 0, "endTime": 8, "audioInput": {"refId": "intro"}},
        {"startTime": 8, "endTime": 15, "audioInput": {"refId": "handoff"}},
    ],
    model="lipsync-2",
)
```
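Keeping the `ref_id` values in the inputs consistent with the `refId` values in the segments is easy to get wrong by hand. A hypothetical helper (not part of the Sync SDK) that derives both lists from one list of tuples:

```python
def build_tts_segments(parts):
    """Build matching TTS inputs and segments from
    (voice_id, script, start_time, end_time) tuples."""
    tts_inputs, segments = [], []
    for i, (voice_id, script, start, end) in enumerate(parts):
        ref_id = f"seg-{i}"
        tts_inputs.append({
            "type": "text",
            "provider": {"name": "elevenlabs", "voiceId": voice_id, "script": script},
            "ref_id": ref_id,
        })
        segments.append({
            "startTime": start,
            "endTime": end,
            "audioInput": {"refId": ref_id},
        })
    return tts_inputs, segments
```

Because both lists come from the same loop, every segment is guaranteed to reference an input that exists.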

Troubleshooting TTS

TTS generation failures typically come from one of three issues:

- Invalid voice ID. Verify that the voiceId you pass is a valid ElevenLabs voice ID. Voice IDs can stop working if the voice is deleted from your ElevenLabs account or if you reference a shared voice that is no longer available.
- Script over the character limit. The script field must be under 5,000 characters; longer scripts are rejected.
- Missing ElevenLabs configuration. Confirm that the ElevenLabs integration is enabled in your Integrations settings. Free accounts use Sync's built-in ElevenLabs key, while Creator plans and above can provide their own API key for higher quotas.

Check the error response from the GET /v2/generate/{id} endpoint for the specific error code and message.
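The first two failure causes can be caught locally before a request is submitted. The validator below is an illustrative pre-flight check, not part of the Sync SDK:

```python
def validate_tts_provider(provider: dict) -> list[str]:
    """Return a list of problems with a TTS provider object; empty means OK."""
    errors = []
    if provider.get("name") != "elevenlabs":
        errors.append('provider name must be "elevenlabs"')
    if not provider.get("voiceId"):
        errors.append("voiceId is missing")
    script = provider.get("script", "")
    if not script:
        errors.append("script is empty")
    elif len(script) > 5000:
        errors.append(f"script is {len(script)} characters; the limit is 5,000")
    return errors
```

Running this before `generations.create` turns two of the most common rejections into immediate local errors instead of failed jobs.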

To find your ElevenLabs voice ID, log in to the ElevenLabs dashboard and navigate to the Voices section. Select the voice you want to use, then look for the voice ID in the URL bar or in the voice settings panel — it is a string of characters like EXAVITQu4vr4xnSDxMaL. You can also find voice IDs through the ElevenLabs API by calling their List Voices endpoint. If you are using a cloned voice, the voice ID is returned when you create the clone. Copy the voice ID exactly as shown and pass it as the voiceId parameter in your Sync API request. Note that voice IDs are case-sensitive. If you are using Sync’s built-in ElevenLabs integration on a free account, you can use any of the default ElevenLabs voices without needing your own ElevenLabs account.

Sync's TTS integration supports multiple languages through ElevenLabs. ElevenLabs offers multilingual voice models that can generate speech in over 29 languages including Spanish, French, German, Portuguese, Japanese, Chinese, Arabic, Hindi, and many more. To use TTS in a non-English language, choose an ElevenLabs voice that supports your target language — multilingual voices are labeled as such in the ElevenLabs voice library. Write your script in the target language and the TTS engine will generate speech in that language. The lip sync model will then match the lip movements to the generated audio regardless of the language, as Sync's lip sync models are language-agnostic. For the best results, select a voice that is native to your target language rather than relying on a single voice to handle all languages.
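For example, a Spanish-language generation only changes the script (and, ideally, the voice). This is a minimal sketch; the voice ID below is a placeholder for a multilingual or Spanish-native ElevenLabs voice, not a real ID.

```python
# Placeholder voice ID: substitute a multilingual or Spanish-native voice.
spanish_tts_input = {
    "type": "text",
    "provider": {
        "name": "elevenlabs",
        "voiceId": "your-multilingual-voice-id",
        "script": "Hola, bienvenidos a nuestra presentación de hoy.",
    },
}
```

Everything else in the request, including the model and sync_mode, stays the same as in the English examples.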
