Voice Cloning | sync. labs

The Voices API lets you list the voices available to your account, clone a new voice from an audio or video sample, and delete clones you no longer need. A cloned voice returns a voiceId that you reuse anywhere a voice is accepted — in POST /v2/tts and in text inputs on POST /v2/generate.

The headline flow: clone a speaker’s voice from a talking-head video, synthesize a brand-new line in that voice, then lip sync the result onto a different video — all on a single API key. See The flagship flow below.

Listing voices

GET /v2/voices returns every voice available to you: sync. labs’ built-in voices plus any clones you have created. Use a voice’s id as the voiceId in text-to-speech and in generation text inputs.

curl https://api.sync.so/v2/voices \
  -H "x-api-key: $SYNC_API_KEY"

The response is an array of voice objects:

[
  {
    "id": "EXAVITQu4vr4xnSDxMaL",
    "name": "Rachel",
    "provider": "elevenlabs",
    "previewUrl": "https://assets.sync.so/voices/rachel-preview.mp3"
  }
]

id

string

The voice identifier. Pass this as voiceId in POST /v2/tts and in generation text inputs.

internalVoiceId

string

sync. labs’ internal identifier for the voice. Present on some voices; prefer id for API calls.

voiceId

string

Provider-side voice identifier. Present on some voices.

name

string

Human-readable voice name.

provider

string (required)

The voice provider. Always "elevenlabs".

previewUrl

string

A URL to a short audio preview of the voice, when available.
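A minimal sketch of listing voices and resolving a voice name to the id you pass as voiceId. It uses only Python's standard library rather than the sync SDK. The helper names list_voices and find_voice_id are hypothetical. Matching by name is also an assumption: names are labels you choose, so two voices could share one, and this returns the first match.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.sync.so/v2"


def list_voices(api_key: str) -> list:
    """GET /v2/voices and return the decoded array of voice objects."""
    req = urllib.request.Request(
        f"{API_BASE}/voices",
        headers={"x-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def find_voice_id(voices: list, name: str) -> Optional[str]:
    """Return the `id` of the first voice whose name matches (case-insensitive).

    Hypothetical helper. It reads `id`, the field to pass as voiceId, and
    deliberately ignores internalVoiceId and the provider-side voiceId.
    """
    wanted = name.casefold()
    for voice in voices:
        if (voice.get("name") or "").casefold() == wanted:
            return voice.get("id")
    return None
```

For example, find_voice_id(list_voices(os.environ["SYNC_API_KEY"]), "Rachel") would return the built-in voice's id, ready to use as voiceId.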

Cloning a voice

POST /v2/voices clones a new voice from a sample and returns a voiceId you can use immediately. Provide a name plus either a sync. labs-hosted url or an assetId — not both.

The source sample must be hosted in sync. labs storage. Public third-party URLs are not accepted. Upload local files first with POST /v2/assets/upload and pass the returned assetId, or pass the url of an asset already in sync. labs storage.

Both audio and video sources are supported. For video sources, the audio track is extracted automatically and the first 2 minutes are used for cloning.

Clone from an uploaded asset

The recommended path: upload the sample with the Asset Uploads flow, then clone from the returned assetId.

curl -X POST https://api.sync.so/v2/voices \
  -H "x-api-key: $SYNC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Founder voice",
    "assetId": "asset_abc123"
  }'

Clone from a hosted URL

If your sample already lives in sync. labs storage, pass its url instead of an assetId.

curl -X POST https://api.sync.so/v2/voices \
  -H "x-api-key: $SYNC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Founder voice",
    "url": "https://assets.sync.so/uploads/founder-sample.mp4"
  }'

Request body

name

string (required)

A label for the cloned voice.

url

string

URL of an audio or video sample hosted in sync. labs storage. Provide either url or assetId, not both.

assetId

string

ID of an asset previously uploaded via POST /v2/assets/upload. Provide either assetId or url, not both.
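A minimal sketch of preparing the clone request with Python's standard library. It enforces the exactly-one-of url or assetId rule on the client side before anything is sent. build_clone_body and clone_request are hypothetical helper names, and the request shape follows the curl examples on this page.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.sync.so/v2"


def build_clone_body(name: str, url: Optional[str] = None,
                     asset_id: Optional[str] = None) -> dict:
    """Build the POST /v2/voices JSON body (hypothetical helper)."""
    if not name:
        raise ValueError("name is required")
    # Exactly one source: both set or both missing is rejected.
    if (url is None) == (asset_id is None):
        raise ValueError("provide exactly one of url or asset_id")
    body = {"name": name}
    if asset_id is not None:
        body["assetId"] = asset_id
    else:
        body["url"] = url
    return body


def clone_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST /v2/voices request."""
    return urllib.request.Request(
        f"{API_BASE}/voices",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the prepared request with urllib.request.urlopen and decoding the 201 body with json.load should give you the new voice, including its voiceId.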

Response

A 201 response returns the new voice:

{
  "voiceId": "cloned_9f8e7d6c",
  "name": "Founder voice",
  "internalVoiceId": "iv_1234"
}

voiceId

string (required)

The cloned voice identifier. Use it as voiceId in POST /v2/tts and in generation text inputs.

name

string (required)

The name you supplied for the clone.

internalVoiceId

string

sync. labs’ internal identifier for the voice, when available.

Clone slots are limited by your plan. When you hit the limit, POST /v2/voices returns a 403. Delete a voice you no longer need to free a slot, then retry the clone.
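One way to automate the recovery described above is to retry the clone after deleting a voice you have marked as disposable. The sketch below assumes, as this page states, that a 403 from POST /v2/voices means the clone limit was reached. It also assumes your HTTP client raises urllib.error.HTTPError, as urllib.request.urlopen does. clone_with_slot_recovery and the clone and delete_voice callables you pass in are hypothetical, and deciding which voices are safe to delete is left to you.

```python
from typing import Callable, Iterable
from urllib.error import HTTPError


def clone_with_slot_recovery(
    clone: Callable[[], dict],
    delete_voice: Callable[[str], None],
    disposable_ids: Iterable[str],
) -> dict:
    """Try to clone; on a 403 (assumed: clone limit reached), delete one
    disposable voice to free a slot and retry. Re-raises any other error,
    or the 403 itself once no disposable ids remain."""
    pending = list(disposable_ids)
    while True:
        try:
            return clone()
        except HTTPError as err:
            if err.code != 403 or not pending:
                raise
            # DELETE /v2/voices/{id} frees one clone slot.
            delete_voice(pending.pop(0))
```

Each 403 consumes one id from disposable_ids, so the loop always terminates.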

Deleting a voice

DELETE /v2/voices/{id} removes a clone and frees a clone slot. Use the voice’s id (the voiceId returned at clone time).

curl -X DELETE https://api.sync.so/v2/voices/cloned_9f8e7d6c \
  -H "x-api-key: $SYNC_API_KEY"

A 200 response confirms the voice was deleted and the slot is available for a new clone.
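A minimal standard-library sketch of the delete call. delete_voice_request and delete_voice are hypothetical helper names. The id is URL-quoted as a precaution so it always forms a single path segment, which leaves ids like cloned_9f8e7d6c unchanged.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.sync.so/v2"


def delete_voice_request(api_key: str, voice_id: str) -> urllib.request.Request:
    """Prepare DELETE /v2/voices/{id} for a cloned voice (hypothetical helper)."""
    # Quote everything, including "/", so the id cannot alter the path.
    path_id = urllib.parse.quote(voice_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/voices/{path_id}",
        headers={"x-api-key": api_key},
        method="DELETE",
    )


def delete_voice(api_key: str, voice_id: str) -> bool:
    """Send the DELETE and return True on a 200; urlopen raises HTTPError on errors."""
    with urllib.request.urlopen(delete_voice_request(api_key, voice_id)) as resp:
        return resp.status == 200
```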

The flagship flow

Clone a voice from a talking-head video, synthesize a new line in that voice with text-to-speech, then lip sync that audio onto a different video. The entire pipeline runs on one API key.

Upload the source video (if it's a local file)

Voice sources must be hosted in sync. labs storage. If your talking-head video lives locally, upload it first with the Asset Uploads flow and keep the returned assetId. If it already lives in sync. labs storage, skip ahead and use its url.

Clone the voice from the video

Call POST /v2/voices with the assetId (or url). The audio track is extracted from the video automatically — the first 2 minutes are used — and you get back a voiceId.

curl -X POST https://api.sync.so/v2/voices \
  -H "x-api-key: $SYNC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Speaker clone",
    "assetId": "asset_talkinghead"
  }'

Synthesize a new line in the cloned voice

Pass the returned voiceId to POST /v2/tts to generate audio of a brand-new script in that voice.

curl -X POST https://api.sync.so/v2/tts \
  -H "x-api-key: $SYNC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voiceId": "cloned_9f8e7d6c",
    "script": "Here is a brand new line, spoken in my own voice."
  }'

Poll the TTS job until it completes, then use the resulting synthesizedAudioUrl as the audio input in the next step.

Lip sync the audio onto a different video

Send the synthesized audio and a different target video to POST /v2/generate. The target speaker’s lips are driven by the cloned-voice audio.

curl -X POST https://api.sync.so/v2/generate \
  -H "x-api-key: $SYNC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lipsync-2",
    "input": [
      { "type": "video", "url": "https://assets.sync.so/uploads/target-video.mp4" },
      { "type": "audio", "url": "https://assets.sync.so/tts/synthesized-line.wav" }
    ],
    "options": { "sync_mode": "cut_off" }
  }'

Poll GET /v2/generate/{id} until status is COMPLETED, then read outputUrl.

The end-to-end version of this pipeline in Python:

import time
from sync import Sync
from sync.common import Audio, GenerationOptions, Video

sync = Sync()

# 1. Clone the voice from a talking-head video already in sync. labs storage
voice = sync.voices.clone(
    name="Speaker clone",
    asset_id="asset_talkinghead",
)

# 2. Synthesize a new line in the cloned voice
tts = sync.tts.create(
    voice_id=voice.voice_id,
    script="Here is a brand new line, spoken in my own voice.",
)
# TTS runs as a job: if create() returns before the job completes,
# poll it until it does before reading synthesized_audio_url.
synthesized_audio_url = tts.synthesized_audio_url

# 3. Lip sync that audio onto a different video
response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/uploads/target-video.mp4"),
        Audio(url=synthesized_audio_url),
    ],
    model="lipsync-2",
    options=GenerationOptions(sync_mode="cut_off"),
)

# 4. Poll until the generation reaches a terminal status
job_id = response.id
generation = sync.generations.get(job_id)
while generation.status not in ["COMPLETED", "FAILED", "REJECTED"]:
    time.sleep(10)
    generation = sync.generations.get(job_id)

if generation.status == "COMPLETED":
    print(f"Video ready: {generation.output_url}")
else:
    print(f"Generation ended with status {generation.status}")
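The polling loop in the example above has no upper bound on how long it waits. Below is a sketch of a bounded variant. It calls a fetch function you supply and stops at COMPLETED, FAILED, or REJECTED, the terminal statuses the example checks for; treating those three as the complete set is an assumption. wait_for_generation, its status_of hook, and the 30-minute default timeout are hypothetical and not part of the API. sleep is injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Terminal statuses checked by the SDK example; assumed to be the full set.
TERMINAL_STATUSES = frozenset({"COMPLETED", "FAILED", "REJECTED"})


def wait_for_generation(
    fetch: Callable[[], T],
    status_of: Callable[[T], str] = lambda job: job["status"],
    interval_s: float = 10.0,
    timeout_s: float = 30 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until the job reaches a terminal status and return it.

    Raises TimeoutError if the job is still running after timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch()
        status = status_of(job)
        if status in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

With the SDK client from the example, this could be called as wait_for_generation(lambda: sync.generations.get(job_id), status_of=lambda g: g.status). The default status_of expects a decoded JSON body from GET /v2/generate/{id}.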

FAQ

What sources can I clone from?

Audio and video samples hosted in sync. labs storage. For video, the audio track is extracted automatically and the first 2 minutes are used. Sources hosted outside sync. labs storage are not accepted — upload local files via POST /v2/assets/upload first and pass the returned assetId, or pass the url of an asset already in sync. labs storage.

Why did my clone return a 403?

Clone slots are limited per plan. A 403 from POST /v2/voices means you have reached your clone limit. Delete a voice you no longer need with DELETE /v2/voices/{id} to free a slot, then retry. Deleting a voice frees the slot immediately.

Where do I use the returned voiceId?

Anywhere a voice is accepted: as voiceId in POST /v2/tts to synthesize speech, and in text inputs on POST /v2/generate. You can also retrieve it later from GET /v2/voices, where it appears as the voice’s id.

Do I pass both url and assetId?

No — provide exactly one. Use assetId when you have uploaded the sample through the Asset Uploads flow, or url when the sample already lives in sync. labs storage.

Related

Text-to-Speech: synthesize speech with a cloned voiceId.
Asset Uploads: upload local audio or video into sync. labs storage before cloning.
Voices API reference: full request and response schemas for list, clone, and delete.
