Concurrency & Rate Limiting
Rate Limiting
The Sync API enforces the following rate limits:
- POST /v2/generate: 60 requests per minute
- All other endpoints: 600 requests per minute
If you exceed these limits, the API returns a 429 error.
Concurrency
Concurrency is the number of generations that can be submitted and processed at the same time. Once the concurrency limit is reached, requests to create new generations fail with a 429 error.
To see which of your generations are currently in the PENDING or PROCESSING state, use the List Generations endpoint.
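Before submitting a new job, you can count how many generations are still occupying a concurrency slot. The sketch below is a minimal example: the base URL, endpoint path, auth header, and response shape are assumptions, so check the List Generations reference for the exact contract.

```python
import json
import urllib.request

API_KEY = "your-api-key"             # placeholder
BASE_URL = "https://api.sync.so/v2"  # assumed base URL -- verify in the API reference

ACTIVE_STATES = {"PENDING", "PROCESSING"}

def count_active(generations: list[dict]) -> int:
    """Count generations still occupying a concurrency slot."""
    return sum(1 for g in generations if g.get("status") in ACTIVE_STATES)

def fetch_generations() -> list[dict]:
    """Call the List Generations endpoint. The path, auth header, and
    JSON-list response shape here are assumptions."""
    req = urllib.request.Request(f"{BASE_URL}/generate",
                                 headers={"x-api-key": API_KEY})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With this in place, `count_active(fetch_generations())` tells you how many slots are in use before you submit another generation.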
Concurrency limits are defined by your subscription plan; check your plan details for your current limit.
Handling 429 Errors
When you exceed a rate limit or concurrency limit, the API returns a 429 Too Many Requests response. This applies to both per-minute rate limits and concurrent generation limits.
What to do when you hit a 429:
- Rate limit (requests per minute): Wait briefly and retry. The limit resets every minute, so a short pause is usually enough.
- Concurrency limit: You already have the maximum number of generations in PENDING or PROCESSING state for your plan. Wait for an existing generation to complete before submitting a new one, or upgrade your plan for higher limits.
Do not retry 429 responses immediately in a tight loop. This wastes requests and delays recovery. Use the retry strategy below instead.
Retry Strategies
Exponential backoff is the recommended approach for handling both rate limit and transient errors from the Sync lip sync API. Each retry waits progressively longer, reducing pressure on the API and improving your success rate.
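As a minimal sketch of that strategy in Python: the delay doubles on each attempt, with full jitter so many clients retrying at once don't synchronize, and a cap so waits stay bounded. `RateLimitError` is a stand-in name for whatever exception your HTTP layer raises on a 429; adapt it to your client code.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in: raised by your HTTP layer when the API responds with 429."""

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Wait time before retry `attempt` (0-indexed): exponential growth
    with full jitter, capped so waits stay bounded."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def with_retries(call, max_attempts: int = 5, base: float = 1.0):
    """Run `call()`, retrying on RateLimitError with exponential backoff.
    Re-raises after the final attempt so callers still see the failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt, base=base))
```

For example, `with_retries(lambda: create_generation(payload))` retries a submission up to five times, waiting roughly 0–1s, 0–2s, 0–4s, then 0–8s between attempts.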
Optimizing Concurrency
To get the most out of your plan’s concurrent generation slots:
- Monitor active jobs. Use the List Generations endpoint to check how many jobs are currently in PENDING or PROCESSING state before submitting new ones.
- Queue on your side. Maintain a local queue and only submit a new generation when a slot frees up. This avoids wasted 429 responses.
- Use webhooks. Configure a webhook to get notified when a generation completes, so you can immediately submit the next job without polling.
- Batch when possible. If you are on a Scale or Enterprise plan, the Batch API handles queueing and concurrency for you — submit up to 500 generations in one call.
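The "queue on your side" idea can be sketched with a local worker pool that never has more than your plan's slot count in flight. `submit_and_wait` is a hypothetical callback standing in for: submit one generation, then block (via webhook notification or polling) until it completes; the limit of 4 is an assumed value, not a real plan number.

```python
import queue
import threading

MAX_CONCURRENT = 4  # assumed plan limit -- use your plan's actual value

def run_with_slots(jobs, submit_and_wait, max_concurrent=MAX_CONCURRENT):
    """Drain `jobs` with at most `max_concurrent` generations in flight.
    Each worker thread holds one concurrency slot at a time, so the API
    never sees more than `max_concurrent` active generations from us."""
    work = queue.Queue()
    for job in jobs:
        work.put(job)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = work.get_nowait()
            except queue.Empty:
                return  # no jobs left; release this slot
            result = submit_and_wait(job)
            with lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(max_concurrent)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each new submission happens only after a previous one finishes, this pattern avoids burning requests on 429 responses entirely.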
For API rate limiting best practices and general error handling, see the Error Handling guide.

