Video dubbing combines translated audio with lip-synced video so dubbed content looks natural in the target language. You can either use Sync Labs’ built-in dubbing flow with dubParams, or provide translated audio yourself and use Sync Labs for the lipsync step.
Install the SDK for your language:
Set your API key:
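One way to make the key available to your code is to read it from an environment variable at startup. The sketch below assumes a variable named `SYNC_API_KEY`; that name and the `load_api_key` helper are illustrative choices, not something the SDK requires.

```python
import os


def load_api_key(env_var: str = "SYNC_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it is unset.

    The variable name SYNC_API_KEY is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before calling the API")
    return key
```

Failing at startup keeps a missing key from surfacing later as an authentication error in the middle of a dubbing job.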
dubParams
Use dubParams when you want Sync Labs to extract the source audio from the video, translate and dub it through ElevenLabs, then run lipsync on the dubbed result. This is the simplest path when your input video already has source audio.
When dubParams is present:
- Provide a video input with audio.
- Set dubParams.targetLang to the target language code, such as "es", "fr", or "hi".
- Optionally set dubParams.sourceLang; omit it or use "auto" for automatic source-language detection.
- Optionally set dubParams.numSpeakers; 0 enables automatic speaker detection.

Built-in dubbing is backed by ElevenLabs. If the video has no usable source audio, provide your own translated audio instead and follow the manual pipeline below.
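The dubParams fields described above could be assembled into a request body roughly as follows. The field names `targetLang`, `sourceLang`, and `numSpeakers` come from this page; the top-level placement of `dubParams`, the `model` and `input` keys, and the `build_dub_request` helper are assumptions made for illustration, so check them against the API reference before use.

```python
from typing import Any, Dict, Optional


def build_dub_request(
    video_url: str,
    target_lang: str,
    source_lang: Optional[str] = None,
    num_speakers: int = 0,
    model: str = "lipsync-2",
) -> Dict[str, Any]:
    """Build a request body for the built-in dubbing flow (assumed shape)."""
    if not target_lang:
        raise ValueError("target_lang is required, e.g. 'es', 'fr', or 'hi'")
    if num_speakers < 0:
        raise ValueError("num_speakers must be 0 (auto-detect) or positive")

    dub_params: Dict[str, Any] = {
        "targetLang": target_lang,
        "numSpeakers": num_speakers,  # 0 enables automatic speaker detection
    }
    # Omitting sourceLang is equivalent to "auto", so only send explicit codes.
    if source_lang and source_lang != "auto":
        dub_params["sourceLang"] = source_lang

    return {
        "model": model,  # assumed key
        "input": [{"type": "video", "url": video_url}],  # assumed shape
        "dubParams": dub_params,  # assumed top-level placement
    }
```

For example, `build_dub_request("https://example.com/talk.mp4", "es")` produces a body that requests Spanish dubbing with automatic source-language and speaker detection.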
Generate translated audio using a text-to-speech service like ElevenLabs, Google Cloud TTS, or Amazon Polly. You can also use a human voice actor. The audio must be hosted at a publicly accessible URL.
If you already have a translated audio file, upload it to your hosting service and grab the URL.
Send the source video and translated audio to the Sync Labs API. The API generates new lip movements matching the translated audio.
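A minimal sketch of that request using only the Python standard library is shown below. The endpoint URL, the `x-api-key` header, and the `model`/`input` body shape are assumptions here, and the helper names are hypothetical; confirm the exact values in the API reference.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed endpoint; confirm against the API reference.
API_URL = "https://api.sync.so/v2/generate"


def build_lipsync_payload(
    video_url: str, audio_url: str, model: str = "lipsync-2"
) -> Dict[str, Any]:
    """Pair the source video with the translated audio (assumed body shape)."""
    return {
        "model": model,
        "input": [
            {"type": "video", "url": video_url},
            {"type": "audio", "url": audio_url},
        ],
    }


def build_request(payload: Dict[str, Any], api_key: str) -> urllib.request.Request:
    """Wrap the payload in a JSON POST request carrying the API key."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def submit(payload: Dict[str, Any], api_key: str) -> Dict[str, Any]:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(payload, api_key), timeout=30) as resp:
        return json.load(resp)
```

Both URLs must be publicly reachable, since the service fetches the files itself rather than receiving them in the request body.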
Check the generation status until it completes. For production systems, use webhooks instead of polling.
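A polling loop along these lines might look like the sketch below. The status strings (`COMPLETED`, `FAILED`, `REJECTED`) and the `fetch_status` callable are assumptions; pass in whatever function retrieves a generation by ID in your client.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal status values; adjust to match the API's actual strings.
TERMINAL_OK = {"COMPLETED"}
TERMINAL_ERR = {"FAILED", "REJECTED"}


def wait_for_generation(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_status(job_id)
        status = str(job.get("status", "")).upper()
        if status in TERMINAL_OK:
            return job
        if status in TERMINAL_ERR:
            raise RuntimeError(f"Generation {job_id} ended with status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"Generation {job_id} did not finish in {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.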
Sync Labs has a built-in ElevenLabs integration that handles text-to-speech and lipsync in a single API call. Instead of generating audio separately, you pass the translated text directly.
The script field has a maximum of 5,000 characters per generation. For longer scripts, split them into segments. See the Integrations page for ElevenLabs setup details.
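Splitting a long script to fit the 5,000-character limit can be done on sentence boundaries so that no segment cuts a sentence in half. The following is one possible approach; the `split_script` helper is hypothetical and the sentence detection is a simple punctuation heuristic.

```python
import re
from typing import List

MAX_SCRIPT_CHARS = 5000  # per-generation limit stated in this guide


def split_script(text: str, limit: int = MAX_SCRIPT_CHARS) -> List[str]:
    """Split text into segments of at most `limit` characters.

    Sentences are kept whole where possible; a single sentence longer
    than the limit is hard-wrapped at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A sentence that cannot fit in any segment is cut at the limit.
        while len(sentence) > limit:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then be sent as its own generation, and the resulting clips concatenated in order.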
Sync Labs’ lipsync models are language-agnostic. They work with audio in any language because the models derive mouth shapes from the audio waveform, not from the language itself. If your translated audio is clear and well-produced, the lipsync output will match.
For a complete translation pipeline walkthrough (transcription, translation, TTS, and lipsync), see the Video Translation API Guide.
For videos with multiple speakers, use the segments API to assign different audio tracks to different time ranges. Each segment can reference a separate audio input with a distinct voice.
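As a rough illustration, per-speaker time ranges could be turned into a segments list like this. The field names `startTime`, `endTime`, `audioInput`, and `refId` are assumptions for this sketch, as is the `build_segments` helper; the Segments Guide has the authoritative schema.

```python
from typing import Any, Dict, List, Tuple


def build_segments(tracks: List[Tuple[float, float, str]]) -> List[Dict[str, Any]]:
    """Convert (start_seconds, end_seconds, audio_ref_id) tuples to segments.

    Ranges are sorted by start time and rejected if they overlap, so each
    moment of the video maps to at most one audio track.
    """
    segments: List[Dict[str, Any]] = []
    prev_end = 0.0
    for start, end, ref_id in sorted(tracks, key=lambda t: t[0]):
        if start < 0 or end <= start:
            raise ValueError(f"Invalid range {start}-{end} for {ref_id}")
        if start < prev_end:
            raise ValueError(f"Range {start}-{end} overlaps the previous segment")
        segments.append(
            {
                "startTime": start,  # assumed field name
                "endTime": end,  # assumed field name
                "audioInput": {"refId": ref_id},  # assumed reference shape
            }
        )
        prev_end = end
    return segments
```

Validating the ranges locally surfaces overlap mistakes before a generation is submitted.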
See the Segments Guide for full documentation and more examples.
Replace polling with webhooks for production pipelines. You receive a POST notification when the job completes, eliminating wasted API calls.
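A webhook receiver can be as small as an HTTP handler that parses the POST body and acknowledges it. The sketch below uses Python's standard `http.server`; the payload fields `id` and `status` are assumptions about the notification body, and the port is arbitrary.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


def parse_notification(body: bytes) -> Dict[str, Any]:
    """Extract the job ID and status from a notification (assumed fields)."""
    event = json.loads(body.decode("utf-8"))
    if not isinstance(event, dict) or "id" not in event:
        raise ValueError("Notification is missing an 'id' field")
    return {"id": event["id"], "status": str(event.get("status", "")).upper()}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_notification(self.rfile.read(length))
        except ValueError:  # also covers malformed JSON and bad encoding
            self.send_response(400)
            self.end_headers()
            return
        print(f"Generation {event['id']} finished with status {event['status']}")
        self.send_response(204)  # acknowledge quickly; do heavy work elsewhere
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Call `serve()` to start listening. In production this would sit behind HTTPS and verify that each request really came from the service.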
Dubbing an entire video library? The Batch API lets you submit up to 500 generations in a single operation with a 24-hour turnaround.
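For a library larger than the 500-generation limit, the job list has to be divided into batches first. Here is a small sketch of that chunking step; the `chunk_for_batch` helper is hypothetical and says nothing about the Batch API's request format.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH_SIZE = 500  # per-batch limit stated in this guide


def chunk_for_batch(items: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

A library of 1,234 videos would yield three batches of 500, 500, and 234 generations.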
Use lipsync-2 for most dubbing jobs. Use sync-3 for production-quality dubbing, complex scenes, obstructions, profile angles, or 4K output. Switch to lipsync-2-pro when you need premium facial detail at a lower price point than sync-3.
Set sync_mode to control what happens when audio and video lengths differ. cut_off trims excess audio. bounce loops the video to match audio length. See sync mode options for details.
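The choice between the two modes named above can be expressed as a small rule based on the media durations. This sketch assumes `sync_mode` is sent under an `options` object in the request body, and both helper names are hypothetical; other modes listed in the sync mode options are not covered here.

```python
from typing import Any, Dict


def choose_sync_mode(
    video_seconds: float, audio_seconds: float, keep_full_audio: bool
) -> str:
    """Pick between the two modes described in this guide.

    bounce loops the video so that longer audio plays in full;
    cut_off trims whatever audio runs past the end of the video.
    """
    if audio_seconds > video_seconds and keep_full_audio:
        return "bounce"
    return "cut_off"


def with_sync_mode(payload: Dict[str, Any], mode: str) -> Dict[str, Any]:
    """Return a copy of the payload with sync_mode set (assumed placement)."""
    options = {**payload.get("options", {}), "sync_mode": mode}
    return {**payload, "options": options}
```

Translated audio is often longer than the original speech, so checking durations before submitting avoids silently losing the end of a sentence to `cut_off`.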