
Video Translation API Guide


Last updated May 15, 2026


Video translation requires a multi-step pipeline: transcribe the original audio, translate the text, generate speech in the target language, and lip sync the new audio to the original video. Sync Labs handles the final lipsync step. This guide walks through the full pipeline.

Full Pipeline Walkthrough

1

Transcribe the original audio

Extract the spoken words from your source video. OpenAI’s Whisper is a solid choice for transcription.
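Whisper works on an audio file, so the audio track has to come out of the video first. Below is a minimal sketch. It assumes `ffmpeg` is installed and on your `PATH`, and the helper names are hypothetical. It writes mono 16 kHz WAV, which keeps files small: the OpenAI transcription endpoint rejects uploads over 25 MB.

```python
import subprocess
from pathlib import Path


def build_extract_audio_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes mono 16 kHz WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",                # discard the video stream
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit PCM, the standard WAV encoding
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str = "original-audio.wav") -> Path:
    """Run ffmpeg (raising CalledProcessError on failure) and return the audio path."""
    subprocess.run(build_extract_audio_cmd(video_path, audio_path), check=True)
    return Path(audio_path)
```

Calling `extract_audio("original-video.mp4")` produces the `original-audio.wav` file that the transcription script opens.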

transcribe.py
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Extract audio from the video first (using ffmpeg or similar)
with open("original-audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["segment"],
    )

print(transcript.text)

# Print each segment with its timestamps; keep the segments for alignment later
for segment in transcript.segments:
    print(f"[{segment.start:.1f}s - {segment.end:.1f}s] {segment.text}")
```

The segment timestamps are useful for aligning translated audio with the correct video sections, especially for multi-speaker or long-form content.
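One way to keep those timestamps for later steps is to write them to a JSON file. Below is a minimal sketch. The `save_segments` helper and the `segments.json` filename are hypothetical. The helper accepts both plain dicts and SDK objects that expose `start`, `end`, and `text` attributes.

```python
import json
from typing import Any, Iterable


def _field(segment: Any, name: str) -> Any:
    # Accept both plain dicts and SDK objects with attributes
    return segment[name] if isinstance(segment, dict) else getattr(segment, name)


def segments_to_records(segments: Iterable[Any]) -> list[dict]:
    """Normalize transcription segments into plain dicts with rounded timestamps."""
    return [
        {
            "start": round(float(_field(s, "start")), 2),
            "end": round(float(_field(s, "end")), 2),
            "text": str(_field(s, "text")).strip(),
        }
        for s in segments
    ]


def save_segments(segments: Iterable[Any], path: str = "segments.json") -> list[dict]:
    """Write normalized segments to a UTF-8 JSON file and return them."""
    records = segments_to_records(segments)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return records
```

Call `save_segments(transcript.segments)` right after transcription. The resulting list of `{"start", "end", "text"}` records is a convenient input for per-segment translation and TTS.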

2

Translate the transcript

Translate the transcribed text into the target language. Use a translation API or LLM for this step.

translate.py
```python
from openai import OpenAI

client = OpenAI()

original_text = "Welcome to our platform. Today we'll walk through the new features."
target_language = "Spanish"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": f"Translate the following text to {target_language}. "
            "Keep the tone natural and conversational. "
            "Return only the translated text.",
        },
        {"role": "user", "content": original_text},
    ],
)

translated_text = response.choices[0].message.content
print(translated_text)
# e.g. "Bienvenidos a nuestra plataforma. Hoy repasaremos las nuevas funciones."
```

For production pipelines, consider specialized translation APIs (DeepL, Google Translate) for higher throughput and language coverage.
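If you kept the Whisper segments from step 1, you can translate them one at a time so that each translation keeps its segment's start and end times. Below is a minimal sketch. It assumes segments are plain dicts with `start`, `end`, and `text` keys. `translate_segments` is a hypothetical helper, and `translate` stands in for whatever function you use, such as one that wraps the chat completion call above or a DeepL client.

```python
from typing import Callable


def translate_segments(
    segments: list[dict],
    translate: Callable[[str], str],
) -> list[dict]:
    """Translate each segment's text while keeping its original timestamps."""
    translated = []
    for seg in segments:
        translated.append(
            {
                "start": seg["start"],
                "end": seg["end"],
                "source_text": seg["text"],  # keep the original for review
                "text": translate(seg["text"]),
            }
        )
    return translated
```

Injecting the translator as a function makes it easy to switch between an LLM and a dedicated translation API, and to test the pipeline offline with a stub.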

3

Generate speech in the target language

Convert the translated text to audio using a TTS service. ElevenLabs supports multilingual voice cloning — you can clone the original speaker’s voice and generate speech in the new language.

generate_speech.py
```python
import os

import requests

ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]
VOICE_ID = "EXAVITQu4vr4xnSDxMaL"  # Or a cloned voice ID

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={
        "xi-api-key": ELEVENLABS_API_KEY,
        "Content-Type": "application/json",
    },
    json={
        "text": "Bienvenidos a nuestra plataforma. Hoy repasaremos las nuevas funciones.",
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.75,
        },
    },
    timeout=120,
)
# Fail loudly instead of writing an error response into the .mp3 file
response.raise_for_status()

with open("translated-audio.mp3", "wb") as f:
    f.write(response.content)
```

Upload the generated audio to a publicly accessible URL for the next step.

4

Lip sync with Sync Labs API

Send the original video and the translated audio to Sync Labs. The API generates new lip movements matching the translated speech.

```typescript
import { SyncClient } from "@sync.so/sdk";

const sync = new SyncClient();

const response = await sync.generations.create({
  input: [
    { type: "video", url: "https://your-cdn.com/original-video.mp4" },
    { type: "audio", url: "https://your-cdn.com/translated-audio.mp3" },
  ],
  model: "lipsync-2",
  options: { sync_mode: "cut_off" },
});

const jobId = response.id;
console.log(`Lipsync job submitted: ${jobId}`);

// Poll every 10 seconds until the job reaches a terminal status
let generation = await sync.generations.get(jobId);
while (!["COMPLETED", "FAILED", "REJECTED"].includes(generation.status)) {
  await new Promise((r) => setTimeout(r, 10000));
  generation = await sync.generations.get(jobId);
}

if (generation.status === "COMPLETED") {
  console.log(`Translated video ready: ${generation.outputUrl}`);
} else {
  console.log(`Generation failed: ${jobId}`);
}
```
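The loop above polls indefinitely. In a long-running service, you may want an upper bound and a growing delay between requests. Below is a minimal Python sketch. It assumes you pass in a `get_status` callable that returns the generation's current status string. `wait_for_generation` and its default timings are hypothetical. The terminal statuses are the ones checked in the TypeScript example.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "REJECTED"}


def wait_for_generation(
    get_status: Callable[[], str],
    initial_delay: float = 5.0,
    max_delay: float = 60.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal status is returned.

    Raises TimeoutError once the total time spent sleeping reaches `timeout`
    (request latency is not counted).
    """
    delay = initial_delay
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"generation still {status} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff, capped at max_delay
```

With the Python SDK, `get_status` could be something like `lambda: sync.generations.get(job_id).status`. This assumes the Python client mirrors the TypeScript `generations.get` call, so check the SDK reference. The injectable `sleep` makes the loop testable without waiting.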
5

Use webhooks for production

For production pipelines, replace polling with webhooks. Pass a webhook URL when creating the generation (`webhook_url` in the Python SDK, `webhookUrl` in TypeScript), and Sync Labs sends a POST request to that URL when the job finishes.

```python
from sync import Sync
from sync.common import Audio, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://your-cdn.com/original-video.mp4"),
        Audio(url="https://your-cdn.com/translated-audio.mp3"),
    ],
    model="lipsync-2",
    webhook_url="https://your-app.com/webhooks/sync",
)
```
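On the receiving side, the endpoint only needs to accept a JSON POST and respond quickly. Below is a minimal sketch that uses only Python's standard library. The port, the catch-all path handling, and the payload field names (`id`, `status`, `outputUrl`) are assumptions for illustration. Check the webhook payload Sync Labs actually sends before relying on them. A production endpoint would typically live inside your existing web framework and authenticate incoming requests.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook(body: bytes) -> dict:
    """Extract the fields this sketch uses from a webhook body.

    The field names are assumptions; adjust them to the real payload.
    """
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return {
        "id": payload.get("id"),
        "status": payload.get("status"),
        "output_url": payload.get("outputUrl"),
    }


class SyncWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_webhook(self.rfile.read(length))
        except ValueError:  # covers invalid JSON and invalid UTF-8
            self.send_response(400)
            self.end_headers()
            return
        # Acknowledge first, then hand the event off to your own processing
        self.send_response(200)
        self.end_headers()
        print(f"Generation {event['id']} finished with status {event['status']}")


def serve(port: int = 8000) -> None:
    """Block forever, handling webhook POSTs on the given port."""
    HTTPServer(("", port), SyncWebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. The `webhook_url` you pass to Sync Labs must be publicly reachable and routed to this process.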

Shortcut: Built-in ElevenLabs Integration

You can skip the separate TTS step by using Sync Labs’ built-in ElevenLabs integration. Pass the translated text directly and Sync Labs handles TTS and lipsync in one call.

```python
from sync import Sync
from sync.common import Video, TTS, GenerationOptions

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://your-cdn.com/original-video.mp4"),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": "EXAVITQu4vr4xnSDxMaL",
                "script": "Bienvenidos a nuestra plataforma. Hoy repasaremos las nuevas funciones.",
                "stability": 0.5,
                "similarityBoost": 0.75,
            }
        ),
    ],
    model="lipsync-2",
    options=GenerationOptions(sync_mode="cut_off"),
)
```

See the Integrations page for setup instructions and voice configuration.

Using the sync-examples Repository

For a complete, ready-to-run translation pipeline, check the sync-examples repository. The translation example includes transcription with Whisper, translation with GPT, TTS with ElevenLabs, and lipsync with Sync Labs — all wired together.

```shell
git clone https://github.com/synchronicity-labs/sync-examples.git
cd sync-examples/translation/python
pip install -r requirements.txt
# Configure your API keys in args.py
python main.py
```

Quality Optimization Tips

Choose the right model

Use lipsync-2 for standard translation jobs. Use lipsync-2-pro for premium content where facial detail (beards, teeth, wrinkles) matters. The quality difference is most visible in close-up shots.

Ensure audio quality

Clean, high-quality TTS audio produces better lipsync results. Use high-fidelity TTS models (like eleven_multilingual_v2) and avoid noisy or compressed audio files.

Match speaking pace

Translated text often has a different word count than the original. Tune your TTS speed settings so the translated audio duration roughly matches the original video length. This reduces artifacts from sync_mode adjustments.
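One way to put a number on "roughly matches" is to compare the duration of a first TTS pass with the original segment and derive a speed multiplier for a second pass. Below is a minimal sketch. `suggest_tts_speed` is hypothetical, and the sketch assumes your TTS provider accepts a speed multiplier where values above 1.0 speak faster. The default clamp of 0.7 to 1.2 is an assumption modeled on ElevenLabs' `speed` voice setting, so confirm the range your provider currently accepts.

```python
def suggest_tts_speed(
    original_duration: float,
    translated_duration: float,
    current_speed: float = 1.0,
    min_speed: float = 0.7,
    max_speed: float = 1.2,
) -> float:
    """Suggest a TTS speed multiplier so the translated audio fits the original timing.

    Assumes duration scales inversely with speed (2x speed -> half the duration).
    """
    if original_duration <= 0 or translated_duration <= 0:
        raise ValueError("durations must be positive")
    # A translation that runs 10% long needs roughly 10% faster speech
    target = current_speed * translated_duration / original_duration
    return round(min(max(target, min_speed), max_speed), 2)
```

For example, a 10-second original segment whose translation runs 11 seconds yields a speed of 1.1. Measure durations with whatever tool you already use (for example, `ffprobe`), then regenerate and re-check. When the ideal speed falls outside the clamp, `sync_mode` still has to absorb the remaining mismatch.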

Handling Long Videos

For videos longer than a few minutes, break them into segments:

  1. Transcribe with timestamps — Use Whisper’s segment output to identify natural break points.
  2. Translate segment by segment — Translate each chunk individually for better accuracy.
  3. Generate audio per segment — Create separate TTS audio files for each segment.
  4. Use the Segments API — Submit all segments in a single Sync Labs API call with different audio inputs per time range. See the Segments Guide.
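The first step in the list above can be sketched as a greedy pass over the Whisper segments. Each segment boundary counts as a natural break point, and segments are appended to the current chunk until the next one would push the chunk past a maximum length. Below is a minimal sketch. It assumes segments are dicts with `start`, `end` (in seconds), and `text`. `chunk_segments` and the 60-second default are illustrative choices, not Sync Labs requirements.

```python
def _merge(group: list[dict]) -> dict:
    """Combine consecutive segments into one chunk spanning all of them."""
    return {
        "start": group[0]["start"],
        "end": group[-1]["end"],
        "text": " ".join(s["text"].strip() for s in group),
    }


def chunk_segments(segments: list[dict], max_chunk_seconds: float = 60.0) -> list[dict]:
    """Group consecutive segments into chunks no longer than max_chunk_seconds.

    Chunks only break at segment boundaries, so a single segment longer than
    the cap becomes its own chunk rather than being split mid-sentence.
    """
    chunks: list[dict] = []
    current: list[dict] = []
    for seg in segments:
        if current and seg["end"] - current[0]["start"] > max_chunk_seconds:
            chunks.append(_merge(current))
            current = []
        current.append(seg)
    if current:
        chunks.append(_merge(current))
    return chunks
```

Each resulting chunk has one time range and one block of text, which maps naturally onto one translation call, one TTS file, and one segment entry in the Segments API request.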

For batch translation of multiple videos, use the Batch API to submit up to 500 generation requests in one operation.
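If you have more than 500 requests, split them into groups before submitting. Below is a minimal sketch. `batched_requests` is a hypothetical helper, each request is whatever payload you would otherwise pass to a single generation call, and the 500 cap is the Batch API limit stated above.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH_SIZE = 500  # per-batch limit stated for the Batch API


def batched_requests(requests: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[list[T]]:
    """Yield consecutive groups of at most `size` requests, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for i in range(0, len(requests), size):
        yield list(requests[i : i + size])
```

On Python 3.12 and later, `itertools.batched(requests, 500)` does the same job and yields tuples instead of lists.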

Next Steps

  • Video Dubbing API Guide — Focused guide for the dubbing step
  • Text-to-Speech Lip Sync Guide — Combine TTS providers with lipsync
  • Segments Guide — Multi-speaker and long-form video handling
  • Batch API — Process multiple videos at scale