Concurrency & Rate Limiting

Understanding rate limiting and concurrency

Rate Limiting

Sync API has the following rate limits:

  • POST /v2/generate: 60 requests per minute
  • All other endpoints: 600 requests per minute

If you exceed these limits, the API returns a 429 error.
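One way to stay under a per-minute limit is to pace requests on the client side before they ever reach the API. The sketch below is a minimal sliding-window limiter, independent of the SDK; the `limit=60` value mirrors the POST /v2/generate limit above, and the injectable `now` parameter exists only to make the logic easy to test.

```python
import time
from collections import deque

class RateLimiter:
    """Client-side sliding-window limiter: at most `limit` calls per `period` seconds."""

    def __init__(self, limit: int, period: float = 60.0):
        self.limit = limit
        self.period = period
        self.calls = deque()  # monotonic timestamps of recent calls

    def acquire(self, now=None):
        """Return seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return 0.0
        # Window is full: wait until the oldest call expires.
        return self.period - (now - self.calls[0])

# Pace POST /v2/generate calls under the 60 requests/minute limit:
limiter = RateLimiter(limit=60, period=60.0)
wait = limiter.acquire()
if wait > 0:
    time.sleep(wait)
# ...then issue the request
```

This avoids most 429s proactively, but you should still keep a retry path (see below) for the cases where multiple clients share one API key.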

Concurrency

Concurrency refers to the number of generations that can be submitted or processed at the same time. Requests to create new generations fail with a 429 error once the concurrency limit is reached.

To check your generations currently in PENDING/PROCESSING state, you can use the List Generations endpoint.
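Before submitting a new job, you can count how many of your generations are still occupying a concurrency slot. The helpers below are a small sketch over the records returned by the List Generations endpoint; the exact SDK call for listing (shown only in a comment) and the record shape are assumptions — check the API reference for the actual response fields.

```python
# Statuses that occupy a concurrency slot, per the docs above.
ACTIVE_STATUSES = {"PENDING", "PROCESSING"}

def count_active(generations):
    """Count generations still in an active (slot-holding) state."""
    return sum(1 for g in generations if g.get("status") in ACTIVE_STATUSES)

def slots_free(generations, plan_limit: int):
    """How many more generations can be submitted right now."""
    return max(0, plan_limit - count_active(generations))

# Hypothetical usage -- the listing call is an assumption, see the API reference:
# generations = sync.generations.list()
# if slots_free(generations, plan_limit=3) > 0:
#     sync.generations.create(...)
```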

Concurrency limits are defined in the subscription plan. Current limits are:

Plan         Concurrent Requests
Hobbyist     1
Creator      3
Growth       6
Scale        15
Enterprise   Custom

Handling 429 Errors

When you exceed a rate limit or concurrency limit, the API returns a 429 Too Many Requests response. This applies to both per-minute rate limits and concurrent generation limits.

What to do when you hit a 429:

  • Rate limit (requests per minute): Wait briefly and retry. The limit resets every minute, so a short pause is usually enough.
  • Concurrency limit: You already have the maximum number of generations in PENDING or PROCESSING state for your plan. Wait for an existing generation to complete before submitting a new one, or upgrade your plan for higher limits.

Do not retry 429 responses immediately in a tight loop. This wastes requests and delays recovery. Use the retry strategy below instead.

Retry Strategies

Exponential backoff is the recommended approach for handling both 429 responses and transient errors from the Sync lip sync API. Each retry waits progressively longer, reducing pressure on the API and improving your success rate.

Python:

import time
from sync import Sync
from sync.common import Audio, Video
from sync.core.api_error import ApiError

sync = Sync()

def create_with_retry(video_url: str, audio_url: str, max_retries: int = 5):
    for attempt in range(max_retries):
        try:
            return sync.generations.create(
                input=[Video(url=video_url), Audio(url=audio_url)],
                model="lipsync-2",
            )
        except ApiError as e:
            if e.status_code == 429 and attempt < max_retries - 1:
                wait = 2 ** attempt  # 1s, 2s, 4s, 8s, 16s
                print(f"Rate limited. Retrying in {wait}s...")
                time.sleep(wait)
            else:
                raise

response = create_with_retry(
    "https://assets.sync.so/docs/example-video.mp4",
    "https://assets.sync.so/docs/example-audio.wav",
)
print(f"Job submitted: {response.id}")
TypeScript:

import { SyncClient, SyncError } from "@sync.so/sdk";

const sync = new SyncClient();

async function createWithRetry(videoUrl: string, audioUrl: string, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await sync.generations.create({
        input: [
          { type: "video", url: videoUrl },
          { type: "audio", url: audioUrl },
        ],
        model: "lipsync-2",
      });
    } catch (err) {
      if (err instanceof SyncError && err.statusCode === 429 && attempt < maxRetries - 1) {
        const wait = 2 ** attempt * 1000; // 1s, 2s, 4s, 8s, 16s
        console.log(`Rate limited. Retrying in ${wait / 1000}s...`);
        await new Promise((r) => setTimeout(r, wait));
      } else {
        throw err;
      }
    }
  }
}

const response = await createWithRetry(
  "https://assets.sync.so/docs/example-video.mp4",
  "https://assets.sync.so/docs/example-audio.wav",
);
console.log(`Job submitted: ${response.id}`);

Optimizing Concurrency

To get the most out of your plan’s concurrent generation slots:

  • Monitor active jobs. Use the List Generations endpoint to check how many jobs are currently in PENDING or PROCESSING state before submitting new ones.
  • Queue on your side. Maintain a local queue and only submit a new generation when a slot frees up. This avoids wasted 429 responses.
  • Use webhooks. Configure a webhook to get notified when a generation completes, so you can immediately submit the next job without polling.
  • Batch when possible. If you are on a Scale or Enterprise plan, the Batch API handles queueing and concurrency for you — submit up to 500 generations in one call.
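The "queue on your side" pattern above can be sketched with a semaphore that bounds in-flight generations to your plan's limit. This is a minimal illustration, not SDK code: `create_fn` and `wait_fn` are hypothetical callables standing in for submitting a generation and waiting for it to reach a terminal state (via polling or a webhook).

```python
import threading
from concurrent.futures import ThreadPoolExecutor

PLAN_CONCURRENCY = 3  # e.g. the Creator plan's limit

# Each slot is held for the full lifetime of a generation.
slots = threading.Semaphore(PLAN_CONCURRENCY)

def submit_bounded(job, create_fn, wait_fn):
    """Acquire a slot, submit the generation, and hold the slot until it completes."""
    with slots:
        generation = create_fn(job)   # e.g. sync.generations.create(...)
        return wait_fn(generation)    # block until terminal status

def run_all(jobs, create_fn, wait_fn):
    # The pool may be wider than the plan limit; the semaphore bounds in-flight work,
    # so new generations are only submitted as slots free up -- no wasted 429s.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda j: submit_bounded(j, create_fn, wait_fn), jobs))
```

Releasing the slot only after completion (not after submission) is what keeps the count of PENDING/PROCESSING jobs at or below the plan limit.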

For API rate limiting best practices and general error handling, see the Error Handling guide.