Troubleshooting

Running into an issue? Check the common problems and solutions below.

A 401 error means your API key is missing or invalid. To fix this:

  1. Make sure you’re including the x-api-key header in every API request
  2. Verify your API key is correct — copy it directly from Settings > API Keys
  3. If your key still doesn’t work, generate a new one in the Sync Studio
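
As a quick sketch of step 1, the header can be attached to every request like this in Python (the base URL and key below are placeholders for illustration, not real values):

```python
import urllib.request

API_KEY = "sk-your-api-key"  # placeholder -- copy yours from Settings > API Keys

# Endpoint URL assumed for illustration; the point is that every request
# must carry the x-api-key header, or the API responds with 401.
request = urllib.request.Request(
    "https://api.sync.so/v2/generate",
    headers={"x-api-key": API_KEY},
)
```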

See the Authentication guide for full details on setting up your API key.

Generations typically complete in 30 to 120 seconds depending on video length and model. If your generation appears stuck:

  1. Use polling or webhooks to check the current status
  2. Wait at least 2 minutes before assuming something is wrong — longer videos take more time
  3. If a generation is still in PENDING after 5 minutes, contact support at support@sync.so

Avoid re-submitting the same job repeatedly, as this may increase queue times.
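
The wait-and-poll advice above can be sketched as a small helper. The PENDING/PROCESSING status names come from this page; the actual status fetch is left as a callable so you can wrap whatever HTTP client you use:

```python
import time

def wait_for_generation(fetch_status, timeout_s=300, interval_s=10):
    """Poll until the generation leaves PENDING/PROCESSING or we time out.

    fetch_status is any zero-argument callable returning the current
    status string (e.g. a wrapper around a status-check API call).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status not in ("PENDING", "PROCESSING"):
            return status
        time.sleep(interval_s)  # poll politely instead of re-submitting
    raise TimeoutError("Generation still pending -- contact support@sync.so")
```

Because the job is queued server-side, polling at a modest interval is enough; re-submitting the job does not make it finish faster.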

A 429 status code means you’ve exceeded your plan’s concurrency limits. To handle this:

  1. Implement exponential backoff — wait progressively longer between retries
  2. Check your plan’s concurrency limits on the Concurrency & Rate Limits page
  3. Consider upgrading your plan for higher concurrency limits
  4. For large workloads, use the Batch API to queue multiple jobs efficiently
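
A minimal sketch of step 1, exponential backoff with full jitter. The RuntimeError here is a stand-in for however your HTTP client surfaces a 429 response:

```python
import random
import time

def backoff_delays(max_retries=5, base=1.0, cap=60.0):
    """Yield exponentially growing, jittered delays: up to base * 2^attempt."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_backoff(send_request, max_retries=5, base=1.0):
    """Retry a callable that raises on 429, sleeping between attempts.

    RuntimeError stands in for your HTTP client's rate-limit error.
    """
    for delay in backoff_delays(max_retries, base=base):
        try:
            return send_request()
        except RuntimeError:
            time.sleep(delay)
    return send_request()  # final attempt; let any error propagate
```

Full jitter (a random delay between zero and the exponential cap) avoids many clients retrying in lockstep after the same rate-limit response.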

Output quality depends heavily on your input video, audio, and model choice. For best results:

  • Video — Ensure the speaker’s face is front-facing, well-lit, and occupies a reasonable portion of the frame at a minimum of 480p resolution. Avoid obstructions like hands, microphones, or hair covering the mouth area.
  • Audio — Use clean audio without background music or overlapping speakers, as noise degrades lip-to-audio alignment.
  • Model — lipsync-2 handles the majority of videos well and preserves natural speaking style, while lipsync-2-pro uses diffusion-based super resolution for the best results with beards, teeth, and fine facial detail.
  • Duration — For longer videos, audio-video duration mismatches can cause drift; use the sync_mode parameter (e.g., cut_off, bounce, or remap) to control how mismatches are handled.
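
As a rough sketch, a request body combining the model and sync_mode choices might look like the following. The field names and input shape are assumptions to verify against the API reference; the media URLs are placeholders:

```python
import json

# Sketch of a generation request body (assumed shape, placeholder URLs).
payload = {
    "model": "lipsync-2",  # or "lipsync-2-pro" for beards/teeth/fine detail
    "input": [
        {"type": "video", "url": "https://example.com/input.mp4"},
        {"type": "audio", "url": "https://example.com/voiceover.wav"},
    ],
    "options": {
        # cut_off | bounce | remap -- controls duration-mismatch handling
        "sync_mode": "remap",
    },
}

body = json.dumps(payload)
```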

See Media Content Tips for detailed guidance on input quality.

Sync supports a wide range of video and audio formats. If your file isn’t accepted:

  1. Check the Media Formats Support page for the full list of supported formats
  2. Convert your file using FFmpeg:
    # Convert video to MP4
    $ ffmpeg -i input.avi -c:v libx264 -c:a aac output.mp4

    # Convert audio to WAV
    $ ffmpeg -i input.ogg -ar 16000 output.wav

If your webhook endpoint isn’t getting called:

  1. HTTPS required — Your endpoint must be a publicly accessible HTTPS URL
  2. Check firewall rules — Make sure incoming POST requests from Sync’s servers aren’t blocked
  3. Verify the URL — Double-check the webhook URL you passed in your API call
  4. Test locally — Use a tool like ngrok to expose a local server for testing
  5. Check response codes — Your endpoint must return a 2xx status code to acknowledge receipt
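
The acknowledgment rule in step 5 can be sketched framework-agnostically as a handler function. The payload fields used here are assumptions, so check the webhooks guide for the actual delivery shape:

```python
import json

def handle_webhook(raw_body: bytes):
    """Return (status_code, response_body) for an incoming delivery.

    The payload fields (id, status) are assumptions for illustration.
    """
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, "invalid JSON"
    # ... update your own job record here, e.g. from event.get("status") ...
    # A 2xx acknowledges receipt; anything else is treated as a failed
    # delivery, so respond quickly and do heavy work asynchronously.
    return 200, "ok"
```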

If you’re having trouble installing the Python SDK:

  1. Check your Python version — The SDK requires Python 3.8 or higher
    $ python --version
  2. Upgrade the package — Try reinstalling with the latest version:
    $ pip install --upgrade syncsdk
  3. Use a virtual environment — Avoid conflicts with other packages:
    $ python -m venv .venv
    $ source .venv/bin/activate
    $ pip install syncsdk

See the Python SDK Guide for full setup instructions.

If the TypeScript SDK isn’t working as expected:

  1. Check your Node.js version — The SDK requires Node.js 18 or higher
    $ node --version
  2. Reinstall the package:
    $ npm i @sync.so/sdk
  3. Check your package.json — Make sure @sync.so/sdk is listed in your dependencies
  4. Check your TypeScript version — If you’re using TypeScript, ensure it’s version 4.7 or higher

See the TypeScript SDK Guide for full setup instructions.

Watermarks appear on videos generated with free or Hobbyist accounts. To remove watermarks:

  • Upgrade to the Creator plan or higher — watermark removal is included on Creator+
  • See our Billing page for plan details and pricing

Existing videos generated on a free or Hobbyist plan will retain their watermarks. Generate new videos after upgrading to get unwatermarked output.

If Sync can’t detect a face or selects the wrong person:

  1. Ensure face is clearly visible — The face should be unobstructed, well-lit, and occupy a reasonable portion of the frame
  2. Check face angle — Frontal or near-frontal faces work best; extreme side profiles may not be detected
  3. Multi-person videos — If there are multiple faces in the frame, use the Speaker Selection feature to target the correct person
  4. Resolution — Very low-resolution video may make face detection unreliable; use at least 480p

See the Speaker Selection guide for details on selecting specific faces in multi-person videos.

Generation time depends on the model, video length, and resolution. As a general guide: lipsync-1.9.0-beta is the fastest model, typically completing a 30-second clip in well under a minute. lipsync-2 takes a few minutes for most videos and is the recommended default. lipsync-2-pro is 1.5-2x slower than lipsync-2 due to its diffusion-based super resolution step, so expect longer waits for premium quality. Higher resolution inputs and longer video durations increase processing time proportionally.

To monitor progress, use polling (check the status field on GET /v2/generate/{id}) or set up webhooks for real-time status callbacks when the job completes. If your generation remains in PENDING or PROCESSING for more than 10 minutes, the job may have encountered an infrastructure issue. Avoid resubmitting the same request repeatedly, as this creates duplicate queue entries and slows processing further. Instead, contact support@sync.so with your generation ID for investigation.
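
As a very rough illustration of these timings (the multipliers and baseline ratio below are assumptions derived from this page, not official figures):

```python
# Assumed relative speed factors, loosely based on the guidance above.
SPEED_FACTOR = {
    "lipsync-1.9.0-beta": 0.4,   # "well under a minute" for a 30 s clip
    "lipsync-2": 1.0,            # baseline: a few minutes for most videos
    "lipsync-2-pro": 1.75,       # "1.5-2x slower than lipsync-2"
}

def rough_wait_estimate(model: str, video_seconds: float) -> float:
    """Very rough estimate in seconds, assuming processing time scales
    proportionally with clip duration (an assumed 4:1 baseline ratio)."""
    return 4.0 * video_seconds * SPEED_FACTOR[model]
```

Use such an estimate only to pick sensible polling intervals and timeouts, not as a promised completion time.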

Sync uses a subscription-plus-usage billing model processed through Stripe. Common payment issues:

  • Declined cards — Update your payment method at sync.so/billing/subscription. Stripe retries failed charges for up to 5 days before automatically cancelling the subscription.
  • Unexpected charges after cancellation — Usage charges still apply until the end of your billing cycle. Check your usage history at sync.so/billing/usage; usage invoices are generated automatically each time your accumulated spend hits your tier’s threshold ($6 for Hobbyist, $20 for Creator, $50 for Growth, $250 for Scale).
  • Unpaid usage invoices — These block new generations until settled.
  • Refund requests — Go to your billing page and click Manage billing to access Stripe’s Cancel + refund flow directly.

For any billing issue not resolved through the dashboard, email support@sync.so. See the Billing page for full pricing and payment details.

Lip sync drift on longer videos typically happens when the audio and video durations do not match precisely, or when the video contains segments where the speaker is not actively talking. Sync processes long videos in 30-40 second chunks internally, so scene changes or cuts within those chunks can confuse face tracking and cause brief misalignment.

To fix drift, set the sync_mode parameter to cut_off (trims audio to video length) or remap (adjusts video speed to match audio). For videos over 1 minute with multiple scenes, consider splitting them into segments using the Segments API, where each segment gets its own audio input for tighter control. Using lipsync-2-pro also improves quality in challenging footage. Ensure the input video shows the speaker actively talking throughout — static or still frames cannot produce good lip movements.
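
If you split a long video into segments, a small helper can compute the time ranges to pair with per-segment audio inputs. This is a sketch only: the 60-second default is an arbitrary choice, not a documented limit:

```python
def segment_ranges(total_seconds: float, max_segment: float = 60.0):
    """Split a duration into consecutive (start, end) ranges, e.g. to
    assign one audio input per range when using per-segment inputs.

    max_segment=60.0 is an arbitrary illustrative choice.
    """
    ranges, start = [], 0.0
    while start < total_seconds:
        end = min(start + max_segment, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges
```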

TTS works out of the box on all plans using Sync’s built-in ElevenLabs integration — no setup required. If you want more control, you can optionally bring your own ElevenLabs API key on Creator plans and above by configuring it in your Integrations settings.

TTS failures most commonly stem from an invalid ElevenLabs voice ID or exceeding the 5,000-character script limit. Verify your voiceId is a valid ElevenLabs voice ID string (not a voice name or display label), and keep your script under 5,000 characters per generation request. For longer scripts, use the Segments API to split text across multiple TTS inputs with different time ranges. If the generation completes but produces audio-only output without video, ensure you included a valid video input in your request alongside the TTS input. For persistent generation_text_length_exceeded or generation_input_validation_failed errors, see the Error Handling page for detailed resolution steps.
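
The 5,000-character limit can be checked client-side before submitting. This word-based splitter is a simple sketch (not the Segments API itself); production code would split on real sentence boundaries:

```python
TTS_CHAR_LIMIT = 5000  # per-generation limit stated above

def split_script(script: str, limit: int = TTS_CHAR_LIMIT):
    """Split a long script into word-boundary chunks under the limit.

    Note: a single word longer than the limit would still exceed it;
    this sketch does not handle that pathological case.
    """
    chunks, current, length = [], [], 0
    for word in script.split():
        if length + len(word) + 1 > limit and current:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```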

Sync accepts MP4, MOV, WebM, and AVI for video, and WAV, MP3, OGG, FLAC, ALAC, and MP4 audio with full support (WMA, M4A, and AAC have limited support due to licensing restrictions). If your file is rejected, first check the format against the Media Formats Support page. For API uploads, all media must be hosted at a publicly accessible URL — private, authenticated, or expired URLs will fail silently. The recommended video codec is H.264 (High Profile) at a maximum resolution of 4K (4096x2160 pixels); videos above 4K are rejected outright. Audio should use a 44.1kHz or 48kHz sample rate for best results.

If your file uses an unsupported codec, convert it with FFmpeg:

    $ ffmpeg -i input.avi -c:v libx264 -c:a aac output.mp4

Videos missing an audio track or required metadata fields (duration, frame rate) will return a generation_media_metadata_missing error. Note that HDR (10-bit color) video is automatically normalized to SDR, which may alter your color grading.
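
A quick client-side pre-flight check against the lists above can catch obvious mismatches before upload. This is extension-based only; actual support also depends on the codec inside the container (H.264 recommended):

```python
# Extension sets transcribed from the format lists above (a heuristic:
# the container extension does not guarantee a supported codec inside).
FULL_SUPPORT = {".mp4", ".mov", ".webm", ".avi",   # video containers
                ".wav", ".mp3", ".ogg", ".flac"}   # audio formats
LIMITED_SUPPORT = {".wma", ".m4a", ".aac"}         # licensing-restricted

def check_format(filename: str) -> str:
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in FULL_SUPPORT:
        return "supported"
    if ext in LIMITED_SUPPORT:
        return "limited"
    return "unsupported -- convert with FFmpeg first"
```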

For videos with multiple people visible in the frame, use the Speaker Selection feature to target the correct face. Set options.active_speaker_detection.auto_detect to true to let Sync automatically identify the active speaker, or provide a manual frame_number and coordinates pointing to the target speaker’s face for fully deterministic control. You can also supply per-frame bounding_boxes if you already run your own face detection. If your video has multiple speakers talking at different times (such as a two-person podcast or interview), use the Segments API to assign different audio inputs to different time ranges within the video — each segment can target a different speaker with its own audio. For best results, ensure each speaker’s face is clearly visible and front-facing during their speaking segment. If Sync selects the wrong face, provide explicit coordinates rather than relying on auto-detection. See the Speaker Selection API guide and Segments guide for complete code examples.
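
As an illustration, the two selection styles might be expressed like this. The nested field names (active_speaker_detection, frame_number, coordinates) come from this page, but the exact request shape and coordinate format should be confirmed against the Speaker Selection guide:

```python
# Automatic: let Sync identify the active speaker.
auto_options = {
    "active_speaker_detection": {"auto_detect": True},
}

# Deterministic: point at a face in a specific frame. The frame number
# and coordinate shape below are assumptions for illustration.
manual_options = {
    "active_speaker_detection": {
        "auto_detect": False,
        "frame_number": 0,                    # frame to inspect (assumed)
        "coordinates": {"x": 640, "y": 360},  # target face location (assumed)
    },
}
```

If auto-detection picks the wrong face, switching to explicit coordinates is the reliable fallback, as noted above.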

Still Need Help?

If your issue isn’t covered above, here are more resources:

Support Knowledge Base

For step-by-step walkthroughs and additional troubleshooting, visit the Sync Support Knowledge Base: