Text-to-Speech Lip Sync Guide
Combine text-to-speech with lip sync to create talking head videos from just text and a source video. Type a script, pick a voice, and Sync Labs generates a video where the speaker’s lips match the spoken words.
The fastest path: Sync Labs’ ElevenLabs integration handles TTS and lipsync in a single API call, so you do not need to generate and host audio separately.
Enable the ElevenLabs integration from your Integrations settings. You can use Sync Labs’ built-in integration or provide your own ElevenLabs API key (Creator plan or higher).
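As a rough sketch of the single-call flow, the request body might be assembled as shown below. The `voiceId` and `script` names come from this guide; the surrounding layout (an `input` list holding a `video` entry and a `text` entry with a `provider` object) is an assumption and should be checked against the current API reference. The helper name and video URL are hypothetical.

```python
import json


def build_tts_lipsync_payload(video_url: str, voice_id: str, script: str,
                              model: str = "lipsync-2-pro") -> dict:
    """Assemble a request body for TTS plus lipsync in one call.

    Only voiceId and script are named in this guide; the rest of the
    layout is an assumed shape, not a confirmed schema.
    """
    return {
        "model": model,
        "input": [
            {"type": "video", "url": video_url},
            {
                "type": "text",
                "provider": {
                    "name": "elevenlabs",
                    "voiceId": voice_id,
                    "script": script,
                },
            },
        ],
    }


payload = build_tts_lipsync_payload(
    "https://example.com/speaker.mp4",  # hypothetical source video
    "EXAVITQu4vr4xnSDxMaL",             # sample voice ID quoted in this guide
    "Welcome to the product tour.",
)
print(json.dumps(payload, indent=2))
```

Building the body as a plain dictionary keeps it easy to inspect before it is sent with whichever HTTP client you prefer.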
If you use a TTS provider other than ElevenLabs — Google Cloud TTS, Amazon Polly, Azure Speech, or any other service — generate the audio first, host it at a public URL, then pass it to Sync Labs.
This approach works with any TTS provider. The only requirement is that the audio file is accessible via a public URL.
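A minimal sketch of the hosted-audio variant follows. It assumes the request takes an `input` list with one `video` entry and one `audio` entry, each carrying a `url`; that layout, the helper name, and the example URLs are assumptions for illustration. The URL check reflects the one requirement stated above: the audio must be reachable at a public URL, so local file paths are rejected early.

```python
def build_audio_lipsync_payload(video_url: str, audio_url: str,
                                model: str = "lipsync-2-pro") -> dict:
    """Assemble a request body that pairs a video with pre-generated audio.

    The field layout is an assumed shape, not a confirmed schema.
    """
    for name, url in (("video_url", video_url), ("audio_url", audio_url)):
        # A local path such as /tmp/speech.mp3 cannot be fetched remotely.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"{name} must be a public http(s) URL, got {url!r}")
    return {
        "model": model,
        "input": [
            {"type": "video", "url": video_url},
            {"type": "audio", "url": audio_url},
        ],
    }


audio_payload = build_audio_lipsync_payload(
    "https://example.com/speaker.mp4",   # hypothetical source video
    "https://example.com/narration.mp3", # hypothetical hosted TTS output
)
```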
Clone a speaker’s voice with ElevenLabs, then use that cloned voice ID with Sync Labs’ integration. The result: a video where the speaker looks AND sounds like themselves — speaking entirely new words.
The ElevenLabs integration has a 5,000-character limit per generation. For longer scripts, split them into segments using the Segments API, with each segment referencing a separate TTS input.
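One way to prepare a long script for segmenting is to pack whole sentences into chunks that stay within the 5,000-character limit. The sketch below is a generic text-splitting helper, not part of any Sync Labs SDK; the function name and the sentence-boundary rule are assumptions.

```python
import re

MAX_CHARS = 5000  # per-generation limit stated in this guide


def split_script(script: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then serve as the script for one TTS input. Splitting on sentence boundaries avoids cutting a word or phrase in half, which would otherwise produce an audible break in the generated speech.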
Stability controls how consistent the voice sounds. Lower values (0.2-0.4) produce more expressive, varied speech. Higher values (0.6-0.8) produce more consistent, predictable speech. Similarity boost controls how closely the output matches the original voice. For cloned voices, use higher values (0.8-0.9).
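The ranges above can be turned into starting values with a small helper like the one below. The key names `stability` and `similarityBoost`, the helper itself, and the 0.75 similarity value for non-cloned voices are assumptions; only the stability ranges and the 0.8-0.9 range for cloned voices come from this guide.

```python
def voice_settings(cloned: bool, expressive: bool = False) -> dict:
    """Pick mid-range starting values from the ranges in this guide."""
    # 0.2-0.4 gives expressive speech, 0.6-0.8 gives consistent speech.
    stability = 0.3 if expressive else 0.7
    # 0.8-0.9 is suggested for cloned voices; 0.75 otherwise is an assumption.
    similarity_boost = 0.85 if cloned else 0.75
    return {"stability": stability, "similarityBoost": similarity_boost}


narration = voice_settings(cloned=True)                     # steady cloned voice
storytelling = voice_settings(cloned=False, expressive=True)  # livelier delivery
```

Treat these numbers as a starting point and adjust by ear after a test generation.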
For talking head videos where the face is prominent or the scene is challenging, sync-3 is the highest-quality option. It handles obstructions, extreme angles, low light, and 4K output better than earlier lipsync models. Use lipsync-2-pro when you want premium facial detail at a lower price point than sync-3.
For short clips (under 15 seconds) where you want the speaker to show emotion, use react-1 with an emotion prompt. The model generates facial expressions and head movements that match the audio tone.
For longer scripts or multi-speaker scenarios, use the Segments API with multiple TTS inputs:
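A segmented request might be assembled along the lines of the sketch below. Every field name in it (`refId`, `segments`, `startTime`, `endTime`, `audioInput`) is an assumption made for illustration, as are the helper name, the URL, and the second voice ID; consult the Segments API reference for the actual schema.

```python
def build_segmented_payload(video_url: str, parts: list,
                            model: str = "lipsync-2-pro") -> dict:
    """Build a body with one TTS input per segment.

    parts is a list of (start_sec, end_sec, voice_id, script) tuples.
    All field names below are assumed, not confirmed by this guide.
    """
    inputs = [{"type": "video", "url": video_url}]
    segments = []
    for index, (start, end, voice_id, script) in enumerate(parts, start=1):
        if end <= start:
            raise ValueError(f"segment {index} ends before it starts")
        ref = f"tts-{index}"  # hypothetical label linking a segment to its input
        inputs.append({
            "type": "text",
            "refId": ref,
            "provider": {"name": "elevenlabs", "voiceId": voice_id, "script": script},
        })
        segments.append({"startTime": start, "endTime": end,
                         "audioInput": {"refId": ref}})
    return {"model": model, "input": inputs, "segments": segments}


two_speakers = build_segmented_payload(
    "https://example.com/interview.mp4",  # hypothetical source video
    [
        (0.0, 6.5, "EXAVITQu4vr4xnSDxMaL", "Thanks for joining us today."),
        (6.5, 12.0, "hypothetical-voice-id", "Happy to be here."),
    ],
)
```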
TTS generation failures typically come from one of three issues: an empty or invalid voice ID, a script that exceeds the character limit, or a missing ElevenLabs API key configuration. First, verify that the voiceId you are passing is a valid, non-empty ElevenLabs voice ID; passing an empty string ("") returns a 422 error. A voice ID can also stop working if the voice is deleted from your ElevenLabs account or if it refers to a shared voice that is no longer available. Second, check that your script field does not exceed the 5,000-character limit; longer scripts are rejected. Third, confirm that the ElevenLabs integration is enabled in your Integrations settings. Free accounts use Sync Labs’ built-in ElevenLabs key, while Creator plan and above can provide their own API key for higher quotas. If the cause is still unclear, check the response from the GET /v2/generate/{id} endpoint for the specific error code and message.
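The first two checks can be run locally before a request is sent. The helper below is a hypothetical pre-flight check, not part of any SDK; it covers only the empty voice ID and the character limit described above, and cannot tell whether a voice ID still exists on the ElevenLabs side.

```python
MAX_SCRIPT_CHARS = 5000  # per-generation limit stated in this guide


def preflight_errors(voice_id: str, script: str) -> list[str]:
    """Return a list of problems that would cause a TTS request to fail."""
    errors = []
    # An empty or whitespace-only voiceId is rejected by the API.
    if not voice_id or not voice_id.strip():
        errors.append("voiceId is empty")
    if len(script) > MAX_SCRIPT_CHARS:
        errors.append(
            f"script is {len(script)} characters; limit is {MAX_SCRIPT_CHARS}"
        )
    return errors


problems = preflight_errors("", "Hello there.")
if problems:
    print("Fix before sending:", "; ".join(problems))
```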
To find your ElevenLabs voice ID, log in to the ElevenLabs dashboard and navigate to the Voices section. Select the voice you want to use, then look for the voice ID in the URL bar or in the voice settings panel — it is a string of characters like EXAVITQu4vr4xnSDxMaL. You can also find voice IDs through the ElevenLabs API by calling their List Voices endpoint. If you are using a cloned voice, the voice ID is returned when you create the clone. Copy the voice ID exactly as shown and pass it as the voiceId parameter in your Sync Labs API request. Note that voice IDs are case-sensitive. If you are using Sync Labs’ built-in ElevenLabs integration on a free account, you can use any of the default ElevenLabs voices without needing your own ElevenLabs account.
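For the API route, a sketch using only the Python standard library is shown below. It assumes the ElevenLabs List Voices endpoint at `/v1/voices` with an `xi-api-key` header, and a response containing a `voices` list whose entries have `voice_id` and `name` fields; verify these against the ElevenLabs API reference. The helper names are hypothetical.

```python
import json
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def voices_by_name(response_body: dict) -> dict:
    """Map each voice name to its voice ID from a List Voices response."""
    return {v["name"]: v["voice_id"] for v in response_body.get("voices", [])}


def fetch_voices(api_key: str) -> dict:
    """Call the List Voices endpoint and return a name-to-ID mapping."""
    request = urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return voices_by_name(json.load(response))


# Parsing an illustrative response body (values are made up):
sample = {"voices": [{"voice_id": "abc123", "name": "Demo Voice"}]}
print(voices_by_name(sample))
```

Because voice IDs are case-sensitive, copying them programmatically from the response avoids transcription mistakes.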
Sync Labs’ TTS integration supports multiple languages through ElevenLabs. ElevenLabs offers multilingual voice models that can generate speech in over 29 languages, including Spanish, French, German, Portuguese, Japanese, Chinese, Arabic, and Hindi. To use TTS in a non-English language, choose an ElevenLabs voice that supports your target language; multilingual voices are labeled as such in the ElevenLabs voice library. Write your script in the target language and the TTS engine generates speech in that language. The lip sync model then matches the lip movements to the generated audio regardless of language, because Sync Labs’ lip sync models are language-agnostic. For the best results, select a voice that is native to your target language rather than relying on a single voice to handle all languages.