Text-to-Speech Lip Sync Guide
Combine text-to-speech with lip sync to create talking head videos from just text and a source video. Type a script, pick a voice, and Sync generates a video where the speaker’s lips match the spoken words.
Using the Built-in ElevenLabs Integration
This is the fastest path: Sync's ElevenLabs integration handles TTS and lip sync in a single API call, so there is no need to generate and host audio separately.
ElevenLabs Provider Parameters
Enable the ElevenLabs integration from your Integrations settings. You can use Sync’s built-in integration or provide your own ElevenLabs API key (Creator plan or higher).
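As a minimal sketch, a single-call request body might look like the following. The field names (`model`, `input`, `provider`, `voiceId`, `script`) are assumptions drawn from this guide, not the authoritative schema; consult the API reference for the exact shape.

```python
# Hypothetical request body for a single-call TTS + lip sync generation.
# Field names are assumptions -- check the Sync API reference for the
# authoritative schema.
payload = {
    "model": "lipsync-2-pro",
    "input": [
        {"type": "video", "url": "https://example.com/source-video.mp4"},
        {
            "type": "text",
            "provider": {
                "name": "elevenlabs",
                "voiceId": "EXAVITQu4vr4xnSDxMaL",  # any valid ElevenLabs voice ID
                "script": "Hello! This entire line was generated from text.",
            },
        },
    ],
}
```

Sync generates the speech from the script and lip-syncs the video to it in one generation, with no separately hosted audio file.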
Using External TTS Providers
If you use a TTS provider other than ElevenLabs — Google Cloud TTS, Amazon Polly, Azure Speech, or any other service — generate the audio first, host it at a public URL, then pass it to Sync.
This approach works with any TTS provider. The only requirement is that the audio file is accessible via a public URL.
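A sketch of that flow, assuming the audio has already been generated and uploaded somewhere public. Only the public-URL requirement comes from this guide; the request field names are illustrative assumptions.

```python
# External-provider flow sketch: the TTS audio already exists at a public
# URL, so the request simply references it. Field names are assumptions.
def build_lipsync_request(video_url: str, audio_url: str, model: str = "lipsync-2-pro") -> dict:
    """Pair a source video with externally hosted TTS audio."""
    return {
        "model": model,
        "input": [
            {"type": "video", "url": video_url},
            {"type": "audio", "url": audio_url},  # must be publicly accessible
        ],
    }

request = build_lipsync_request(
    "https://example.com/speaker.mp4",
    "https://cdn.example.com/tts/polly-output.mp3",  # generated by any TTS provider
)
```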
Voice Cloning Workflow
Clone a speaker’s voice with ElevenLabs, then use that cloned voice ID with Sync’s integration. The result: a video where the speaker looks AND sounds like themselves — speaking entirely new words.
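As an illustrative sketch, the cloned voice slots in wherever a stock voice ID would go; the provider field names below are assumptions, not the documented schema.

```python
# Voice-cloning sketch: the clone is created in ElevenLabs first, which
# returns a voice ID; Sync only ever sees that ID. Field names here are
# illustrative assumptions.
cloned_voice_id = "yourClonedVoiceId"  # returned by ElevenLabs when the clone is created

tts_input = {
    "type": "text",
    "provider": {
        "name": "elevenlabs",
        "voiceId": cloned_voice_id,  # speaker looks AND sounds like themselves
        "script": "Entirely new words, in the speaker's own voice.",
    },
}
```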
Best Practices
The ElevenLabs integration has a 5,000-character limit per generation. For longer scripts, split them into segments using the Segments API, with each segment referencing a separate TTS input.
Stability controls how consistent the voice sounds. Lower values (0.2-0.4) produce more expressive, varied speech. Higher values (0.6-0.8) produce more consistent, predictable speech. Similarity boost controls how closely the output matches the original voice. For cloned voices, use higher values (0.8-0.9).
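The guidance above can be captured as a small helper. `stability` and `similarity_boost` are ElevenLabs voice settings; whether and where Sync's integration surfaces them is an assumption here, so check the API reference for the exact parameter names and placement.

```python
# Hedged sketch: pick ElevenLabs voice settings per the guidance above.
# Parameter names/placement in the Sync request are assumptions.
def voice_settings(cloned: bool) -> dict:
    """Expressive settings for stock voices, faithful settings for clones."""
    if cloned:
        # Cloned voices: high similarity boost so output tracks the original voice.
        return {"stability": 0.5, "similarity_boost": 0.85}
    # Stock voices: lower stability for more expressive, varied delivery.
    return {"stability": 0.3, "similarity_boost": 0.7}

print(voice_settings(cloned=True))   # {'stability': 0.5, 'similarity_boost': 0.85}
```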
For talking head videos where the face is prominent, lipsync-2-pro produces the best results. It handles detail around teeth, beards, and facial features better than other models. The trade-off is slower processing and higher cost.
For short clips (under 15 seconds) where you want the speaker to show emotion, use react-1 with an emotion prompt. The model generates facial expressions and head movements that match the audio tone.
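That model-selection logic can be sketched as below. The `emotionPrompt` field name is hypothetical; the guide only says react-1 accepts an emotion prompt, not how the parameter is spelled in the request.

```python
# Model-selection sketch based on the guidance above. `emotionPrompt` is a
# hypothetical field name, not the documented parameter.
def choose_model(duration_s: float, want_emotion: bool) -> dict:
    if want_emotion and duration_s < 15:
        return {"model": "react-1", "emotionPrompt": "warm, enthusiastic delivery"}
    # Best detail around teeth, beards, and facial features; slower and pricier.
    return {"model": "lipsync-2-pro"}
```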
Multi-Segment TTS
For longer scripts or multi-speaker scenarios, use the Segments API with multiple TTS inputs, one per segment.
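A hedged sketch of that pattern: split the script into sentence-aligned chunks that each stay under the 5,000-character per-generation limit, then give every segment its own TTS input. The segment and provider field names are assumptions based on this guide.

```python
# Split a long script into chunks under the 5,000-character limit, then
# build one TTS input per segment. Field names are assumptions.
def split_script(script: str, limit: int = 5000) -> list[str]:
    """Greedily pack sentences into chunks that stay under `limit` characters."""
    chunks: list[str] = []
    current = ""
    for sentence in script.split(". "):
        piece = sentence if not current else current + ". " + sentence
        if len(piece) <= limit:
            current = piece
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

segments = [
    {
        "input": {
            "type": "text",
            "provider": {
                "name": "elevenlabs",
                "voiceId": "EXAVITQu4vr4xnSDxMaL",
                "script": chunk,
            },
        }
    }
    for chunk in split_script("Part one of a long script. Part two. Part three.", limit=30)
]
```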
Troubleshooting TTS
Why did my TTS generation fail?
TTS generation failures typically come from one of three issues:
- Invalid voice ID. Verify that the voiceId you are passing is a valid ElevenLabs voice ID. Voice IDs can expire if the voice is deleted from your ElevenLabs account or if you are referencing a shared voice that is no longer available.
- Script over the character limit. Check that your script field is under the 5,000-character limit; scripts that exceed it will be rejected.
- Missing API key configuration. Confirm that the ElevenLabs integration is enabled in your Integrations settings. Free accounts use Sync's built-in ElevenLabs key, while Creator plan and above can provide their own API key for higher quotas.
Check the error response from the GET /v2/generate/{id} endpoint for the specific error code and message.
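A small triage sketch for a failed generation. The error-code strings below are hypothetical placeholders; read the real code and message from the error response returned by GET /v2/generate/{id}.

```python
# Triage sketch mapping hypothetical error codes to the three common causes.
# The actual code strings come from the GET /v2/generate/{id} response.
CAUSES = {
    "invalid_voice_id": "voiceId is not a valid ElevenLabs voice ID (deleted or unshared?)",
    "script_too_long": "script exceeds the 5,000-character limit; split it with the Segments API",
    "integration_disabled": "enable the ElevenLabs integration in your Integrations settings",
}

def diagnose(error_code: str) -> str:
    return CAUSES.get(error_code, "unrecognized error; inspect the full error message")

print(diagnose("invalid_voice_id"))
```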
How do I find my ElevenLabs voice ID?
To find your ElevenLabs voice ID, log in to the ElevenLabs dashboard and navigate to the Voices section. Select the voice you want to use, then look for the voice ID in the URL bar or in the voice settings panel — it is a string of characters like EXAVITQu4vr4xnSDxMaL. You can also find voice IDs through the ElevenLabs API by calling their List Voices endpoint. If you are using a cloned voice, the voice ID is returned when you create the clone. Copy the voice ID exactly as shown and pass it as the voiceId parameter in your Sync API request. Note that voice IDs are case-sensitive. If you are using Sync’s built-in ElevenLabs integration on a free account, you can use any of the default ElevenLabs voices without needing your own ElevenLabs account.
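For the programmatic route, ElevenLabs' List Voices endpoint is `GET https://api.elevenlabs.io/v1/voices`, authenticated with the `xi-api-key` header; it requires your own ElevenLabs API key. A minimal stdlib sketch:

```python
import json
import urllib.request

# Build an authenticated request to ElevenLabs' List Voices endpoint.
def list_voices_request(api_key: str) -> urllib.request.Request:
    return urllib.request.Request(
        "https://api.elevenlabs.io/v1/voices",
        headers={"xi-api-key": api_key},
    )

# Extract the voice_id values from a List Voices response body.
def voice_ids(response_body: bytes) -> list[str]:
    return [v["voice_id"] for v in json.loads(response_body)["voices"]]

# Example usage (needs a real ElevenLabs API key):
# with urllib.request.urlopen(list_voices_request("YOUR_KEY")) as resp:
#     print(voice_ids(resp.read()))
```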
Can I use TTS in languages other than English?
Yes, Sync’s TTS integration supports multiple languages through ElevenLabs. ElevenLabs offers multilingual voice models that can generate speech in over 29 languages including Spanish, French, German, Portuguese, Japanese, Chinese, Arabic, Hindi, and many more. To use TTS in a non-English language, choose an ElevenLabs voice that supports your target language — multilingual voices are labeled as such in the ElevenLabs voice library. Write your script in the target language and the TTS engine will generate speech in that language. The lip sync model will then match the lip movements to the generated audio regardless of the language, as Sync’s lip sync models are language-agnostic. For the best results, select a voice that is native to your target language rather than relying on a single voice to handle all languages.
Next Steps
- Integrations — Full ElevenLabs setup and troubleshooting
- React Models — Add emotion and expressions to talking head videos
- Video Dubbing API Guide — Build dubbing pipelines
- Segments Guide — Handle multi-speaker and long-form content
Support Knowledge Base
- How to use TTS with Sync — Step-by-step setup guide
- Internal Server Error during TTS — TTS error troubleshooting
- ElevenLabs Terms of Service error — Voice policy issues

