Video Translation API Guide

Video translation requires a multi-step pipeline: transcribe the original audio, translate the text, generate speech in the target language, and lip sync the new audio to the original video. Sync Labs handles the final lipsync step. This guide walks through the full pipeline.

Full Pipeline Walkthrough

Transcribe the original audio

Extract the spoken words from your source video. OpenAI’s Whisper is a solid choice for transcription.
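Whisper works on audio, so the audio track has to come out of the video first. The sketch below shells out to ffmpeg (assumed to be installed and on PATH) and writes a 16 kHz mono WAV file. The sample rate, channel count, and file names are illustrative choices, not requirements. Small mono files also help you stay under the OpenAI audio API's 25 MB upload limit.

```python
import shutil
import subprocess


def build_extract_audio_cmd(video_path: str, audio_path: str,
                            sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes mono PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video_path,         # input video
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample; 16 kHz is enough for speech recognition
        "-c:a", "pcm_s16le",      # 16-bit PCM, i.e. a standard WAV file
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the extraction fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_extract_audio_cmd(video_path, audio_path), check=True)
```

Calling extract_audio("original-video.mp4", "original-audio.wav") produces the file that the transcription script opens.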

transcribe.py

```python
from openai import OpenAI

client = OpenAI()

# Extract audio from the video first (using ffmpeg or similar)
with open("original-audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["segment"],
    )

print(transcript.text)

# Segments are objects with start/end (seconds) and text; keep them for alignment
for segment in transcript.segments:
    print(f"[{segment.start:.1f}s - {segment.end:.1f}s] {segment.text}")
```

The segment timestamps are useful for aligning translated audio with the correct video sections, especially for multi-speaker or long-form content.
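The transcription script only prints the segments. To keep them for the translation and alignment steps, you can write them to disk. This sketch assumes each segment exposes start, end, and text either as attributes (as the OpenAI Python SDK's verbose transcription segments do) or as dictionary keys; the segments.json file name is a hypothetical choice.

```python
import json
from typing import Any


def _field(segment: Any, name: str) -> Any:
    """Read a field from an SDK object (attribute) or a plain dict (key)."""
    if isinstance(segment, dict):
        return segment[name]
    return getattr(segment, name)


def segments_to_records(segments) -> list[dict]:
    """Keep only what later steps need: start/end in seconds and the text."""
    return [
        {
            "start": float(_field(s, "start")),
            "end": float(_field(s, "end")),
            "text": str(_field(s, "text")).strip(),
        }
        for s in segments
    ]


def save_segments(segments, path: str = "segments.json") -> None:
    """Write the segment records as UTF-8 JSON (non-ASCII text kept readable)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(segments_to_records(segments), f, ensure_ascii=False, indent=2)
```

For example, save_segments(transcript.segments) right after the transcription call.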

Translate the transcript

Translate the transcribed text into the target language. Use a translation API or LLM for this step.

translate.py

```python
from openai import OpenAI

client = OpenAI()

original_text = "Welcome to our platform. Today we'll walk through the new features."
target_language = "Spanish"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": f"Translate the following text to {target_language}. "
                       "Keep the tone natural and conversational. "
                       "Return only the translated text.",
        },
        {"role": "user", "content": original_text},
    ],
)

translated_text = response.choices[0].message.content
print(translated_text)
# Example output (wording may vary between runs):
# "Bienvenidos a nuestra plataforma. Hoy repasaremos las nuevas funciones."
```

For production pipelines, consider specialized translation APIs (DeepL, Google Translate) for higher throughput and language coverage.
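When you translate segment by segment rather than the whole transcript at once, keep each segment's timestamps attached to its translation so later steps can place the new audio correctly. In this sketch, the translate argument is a hypothetical hook standing in for whichever service you use (the GPT call above, DeepL, and so on).

```python
from typing import Callable


def translate_segments(
    segments: list[dict],
    translate: Callable[[str], str],
) -> list[dict]:
    """Translate each segment's text while preserving its start/end times.

    `segments` uses the {"start", "end", "text"} shape; `translate` is any
    function mapping source text to target-language text (hypothetical hook).
    """
    translated = []
    for seg in segments:
        translated.append({
            "start": seg["start"],
            "end": seg["end"],
            "source_text": seg["text"],
            "text": translate(seg["text"]).strip(),
        })
    return translated
```

Translating per segment keeps the timing intact but gives the translator less context; including neighboring segments in the prompt as context is one way to compensate.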

Generate speech in the target language

Convert the translated text to audio using a TTS service. ElevenLabs supports multilingual voice cloning — you can clone the original speaker’s voice and generate speech in the new language.

generate_speech.py

```python
import requests

ELEVENLABS_API_KEY = "your-elevenlabs-key"
VOICE_ID = "EXAVITQu4vr4xnSDxMaL"  # Or a cloned voice ID

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={
        "xi-api-key": ELEVENLABS_API_KEY,
        "Content-Type": "application/json",
    },
    json={
        "text": "Bienvenidos a nuestra plataforma. Hoy repasaremos las nuevas funciones.",
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.75,
        },
    },
    timeout=120,
)
# Fail loudly instead of saving an error response as an .mp3 file
response.raise_for_status()

with open("translated-audio.mp3", "wb") as f:
    f.write(response.content)
```

Upload the generated audio to a publicly accessible URL for the next step.

Lip sync with Sync Labs API

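Send the original video and the translated audio to Sync Labs. The API generates new lip movements that match the translated speech. The example below uses the TypeScript SDK (@sync.so/sdk) and relies on top-level await, so run it as an ES module.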

```typescript
import { SyncClient } from "@sync.so/sdk";

const sync = new SyncClient();

const response = await sync.generations.create({
    input: [
        { type: "video", url: "https://your-cdn.com/original-video.mp4" },
        { type: "audio", url: "https://your-cdn.com/translated-audio.mp3" },
    ],
    model: "lipsync-2",
    options: { sync_mode: "cut_off" },
});

const jobId = response.id;
console.log(`Lipsync job submitted: ${jobId}`);

// Poll every 10 seconds until the job reaches a terminal status
let generation = await sync.generations.get(jobId);
while (!["COMPLETED", "FAILED", "REJECTED"].includes(generation.status)) {
    await new Promise((r) => setTimeout(r, 10000));
    generation = await sync.generations.get(jobId);
}

if (generation.status === "COMPLETED") {
    console.log(`Translated video ready: ${generation.outputUrl}`);
} else {
    console.error(`Generation ${jobId} ended with status ${generation.status}`);
}
```
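The polling loop above has no upper bound, so a job that never reaches a terminal status would keep it running indefinitely. Below is a Python sketch of the same loop with a timeout. Here get_status is a hypothetical callable that returns the generation's current status string, and the 10-second interval and 30-minute timeout are arbitrary defaults.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "REJECTED"}


def wait_for_generation(
    get_status: Callable[[], str],
    interval_s: float = 10.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until a terminal status is reached; raise TimeoutError otherwise.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    status = get_status()
    while status not in TERMINAL_STATUSES:
        if clock() >= deadline:
            raise TimeoutError(f"generation still {status} after {timeout_s}s")
        sleep(interval_s)
        status = get_status()
    return status
```

Assuming the Python SDK mirrors the TypeScript get call, get_status could be a small wrapper such as lambda: sync.generations.get(job_id).status; check the SDK reference for the exact method and field names.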

Use webhooks for production

For production pipelines, replace polling with webhooks. Pass a webhook URL when creating the generation (webhook_url in the Python SDK, shown below), and Sync Labs sends a POST request to it when the job finishes.

```python
from sync import Sync
from sync.common import Audio, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://your-cdn.com/original-video.mp4"),
        Audio(url="https://your-cdn.com/translated-audio.mp3"),
    ],
    model="lipsync-2",
    webhook_url="https://your-app.com/webhooks/sync",
)
```
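On the receiving side, your endpoint only needs to parse the POST body and react to the final status. The sketch below uses Python's standard-library http.server for illustration. It assumes the payload is JSON describing the generation with id, status, and outputUrl fields, mirroring the generation object used in the polling example; confirm the actual payload shape in the webhook documentation before relying on these names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple


def parse_sync_webhook(body: bytes) -> Tuple[str, str, Optional[str]]:
    """Return (generation id, status, output URL) from a webhook body.

    Assumption: the body is JSON with "id", "status", and "outputUrl" fields;
    adjust these names to the documented payload.
    """
    payload = json.loads(body)
    return payload["id"], payload["status"], payload.get("outputUrl")


class SyncWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            job_id, status, output_url = parse_sync_webhook(self.rfile.read(length))
        except (ValueError, KeyError, TypeError):
            # Malformed JSON (json.JSONDecodeError is a ValueError) or unexpected shape
            self.send_response(400)
            self.end_headers()
            return

        if status == "COMPLETED":
            print(f"Translated video ready for {job_id}: {output_url}")
        else:
            print(f"Generation {job_id} ended with status {status}")

        # Acknowledge quickly; hand longer post-processing to a background worker
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen for webhook POSTs on all interfaces (blocks forever)."""
    HTTPServer(("", port), SyncWebhookHandler).serve_forever()
```

Calling serve(8000) starts the listener; the public webhook_url you pass to Sync Labs must route to it, for example through a reverse proxy. In an existing web app, the same parsing logic would live in a route handler instead.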

Shortcut: Built-in ElevenLabs Integration

You can skip the separate TTS step by using Sync Labs’ built-in ElevenLabs integration. Pass the translated text directly and Sync Labs handles TTS and lipsync in one call.

```python
from sync import Sync
from sync.common import Video, TTS, GenerationOptions

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://your-cdn.com/original-video.mp4"),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": "EXAVITQu4vr4xnSDxMaL",
                "script": "Bienvenidos a nuestra plataforma. Hoy repasaremos las nuevas funciones.",
                "stability": 0.5,
                "similarityBoost": 0.75,
            }
        ),
    ],
    model="lipsync-2",
    options=GenerationOptions(sync_mode="cut_off"),
)
```

See the Integrations page for setup instructions and voice configuration.

Using the sync-examples Repository

For a complete, ready-to-run translation pipeline, check the sync-examples repository. The translation example includes transcription with Whisper, translation with GPT, TTS with ElevenLabs, and lipsync with Sync Labs — all wired together.

$ git clone https://github.com/synchronicity-labs/sync-examples.git
$ cd sync-examples/translation/python
$ pip install -r requirements.txt
$ # Configure your API keys in args.py
$ python main.py

Quality Optimization Tips

Choose the right model

Use lipsync-2 for standard translation jobs. Use lipsync-2-pro for premium content where facial detail (beards, teeth, wrinkles) matters. The quality difference is most visible in close-up shots.

Ensure audio quality

Clean, high-quality TTS audio produces better lipsync results. Use high-fidelity TTS models (like eleven_multilingual_v2) and avoid noisy or heavily compressed (low-bitrate) audio files.

Match speaking pace

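Translated text is often longer or shorter than the original, so the generated speech may not fit the original timing. Adjust your TTS speed settings (or tighten the translation) so the translated audio duration roughly matches the length of the original video or segment. The closer the two durations are, the less sync_mode has to reconcile a length mismatch, which reduces artifacts.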
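As a worked example: if the original segment lasts 10 s and the generated speech runs 12 s, the speech must play 12 / 10 = 1.2 times faster to fit. The helper below computes that factor and can measure durations of WAV files with Python's wave module (convert MP3 output to WAV first, for example with ffmpeg). The default 0.7 to 1.2 clamp is an assumption about what your TTS provider's speed setting accepts, so check its documentation. Speed changes in TTS are only approximately linear, so treat the result as a starting point.

```python
import wave


def wav_duration_s(path: str) -> float:
    """Duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def suggest_speed(original_s: float, generated_s: float,
                  min_speed: float = 0.7, max_speed: float = 1.2) -> float:
    """Speed factor that makes generated speech last about original_s seconds.

    Playing audio at speed k changes its duration to generated_s / k, so
    k = generated_s / original_s. The result is clamped to
    [min_speed, max_speed] (assumed provider limits).
    """
    if original_s <= 0:
        raise ValueError("original duration must be positive")
    return max(min_speed, min(max_speed, generated_s / original_s))
```

If the factor hits a bound, rewording the translation to be shorter or longer usually sounds better than forcing the speed further.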

Handling Long Videos

For videos longer than a few minutes, break them into segments:

Transcribe with timestamps — Use Whisper’s segment output to identify natural break points.
Translate segment by segment — Translate each chunk individually for better accuracy.
Generate audio per segment — Create separate TTS audio files for each segment.
Use the Segments API — Submit all segments in a single Sync Labs API call with different audio inputs per time range. See the Segments Guide.
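The first step above can be sketched as grouping Whisper segments into chunks that end on segment boundaries and stay under a maximum length. This greedy approach assumes segments are dicts with start and end in seconds, sorted chronologically; the 60-second default is arbitrary. The resulting time ranges are what you would pair with separate audio inputs; the exact request format is covered in the Segments Guide.

```python
def chunk_segments(segments: list[dict], max_chunk_s: float = 60.0) -> list[dict]:
    """Greedily merge consecutive segments into chunks of at most max_chunk_s.

    Chunks always break between segments, so no segment is split. A single
    segment longer than max_chunk_s becomes a chunk on its own.
    """
    chunks: list[dict] = []
    current: list[dict] = []
    for seg in segments:
        # Start a new chunk if adding this segment would exceed the limit
        if current and seg["end"] - current[0]["start"] > max_chunk_s:
            chunks.append(_close_chunk(current))
            current = []
        current.append(seg)
    if current:
        chunks.append(_close_chunk(current))
    return chunks


def _close_chunk(segs: list[dict]) -> dict:
    """Collapse a run of segments into one time range with joined text."""
    return {
        "start": segs[0]["start"],
        "end": segs[-1]["end"],
        "text": " ".join(s["text"].strip() for s in segs),
    }
```

Each chunk can then be translated and voiced separately before submission.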

For batch translation of multiple videos, use the Batch API to submit up to 500 generation requests in one operation.
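If you have more than 500 requests, split them into groups that each fit the per-batch limit before submitting. A minimal sketch; the items are placeholders for your generation requests, and the submission call itself is omitted because its format is covered in the Batch API docs.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH_SIZE = 500  # per-batch limit stated in this guide


def batched(items: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[list[T]]:
    """Yield consecutive groups of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for i in range(0, len(items), size):
        yield list(items[i:i + size])
```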

Next Steps

Video Dubbing API Guide — Focused guide for the dubbing step
Text-to-Speech Lip Sync Guide — Combine TTS providers with lipsync
Segments Guide — Multi-speaker and long-form video handling
Batch API — Process multiple videos at scale