Running into an issue? Check the common problems and solutions below. Most issues can be resolved quickly with the right steps.
A 401 error means your API key is missing or invalid. To fix this, send a valid API key in the x-api-key header of every API request.
See the Authentication guide for full details on setting up your API key.
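As a quick local check, the sketch below builds a status request with the key in the x-api-key header without sending it, using only the Python standard library. The base URL https://api.sync.so is an assumption to confirm against the API reference. The GET /v2/generate/{id} path is the one named later on this page, and build_status_request is a hypothetical helper. It strips surrounding whitespace because a stray newline pasted along with a key would make it invalid.

```python
import urllib.request

# Assumption: the API base URL; confirm it in the Sync Labs API reference.
API_BASE = "https://api.sync.so"


def build_status_request(api_key: str, generation_id: str) -> urllib.request.Request:
    """Build (but do not send) GET /v2/generate/{id} with the x-api-key header set."""
    if not api_key or not api_key.strip():
        # An empty key is guaranteed to produce a 401, so fail locally instead.
        raise ValueError("API key is missing; set it before calling the API")
    return urllib.request.Request(
        f"{API_BASE}/v2/generate/{generation_id}",
        headers={"x-api-key": api_key.strip()},
        method="GET",
    )


if __name__ == "__main__":
    req = build_status_request("YOUR_API_KEY", "YOUR_GENERATION_ID")
    # urllib stores header names via str.capitalize(), hence "X-api-key".
    print(req.get_method(), req.full_url, req.get_header("X-api-key"))
    # Send it with urllib.request.urlopen(req) once the key is real.
```

HTTP header names are case-insensitive, so urllib's capitalized "X-api-key" is received the same as x-api-key.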
Generations typically complete in 30 to 120 seconds depending on video length and model. If your generation appears stuck, check its status with GET /v2/generate/{id} before taking further action.
Avoid re-submitting the same job repeatedly, as this may increase queue times.
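A 429 status code means you’ve exceeded your plan’s concurrency limits. To handle this, wait for in-progress generations to finish before submitting new ones, and retry rejected requests after a delay rather than immediately.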
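The sketch below retries a request with exponential backoff while the API keeps returning 429. `send` is a hypothetical callable standing in for whatever client call you use, and it returns a `(status_code, body)` pair. The delays (2 s, 4 s, 8 s, ...) and the attempt count are illustrative, not documented limits.

```python
import time
from typing import Callable, Tuple


def send_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Double the wait after each 429 so running jobs have time to finish.
            sleep(base_delay * (2 ** attempt))
    # Still rate-limited after the last attempt: hand the 429 back to the caller.
    return status, body
```

If responses include a Retry-After header, prefer that value over the computed delay. Whether Sync Labs sends one is not covered here.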
Output quality depends heavily on your input video, audio, and model choice. For best results, ensure the speaker’s face is front-facing, well-lit, and occupies a reasonable portion of the frame at a minimum of 480p resolution. Avoid obstructions like hands, microphones, or hair covering the mouth area. Use clean audio without background music or overlapping speakers, as noise degrades lip-to-audio alignment. For model selection, lipsync-2 handles the majority of videos well and preserves natural speaking style, while lipsync-2-pro uses diffusion-based super resolution for the best results with beards, teeth, and fine facial detail. For longer videos, audio-video duration mismatches can cause drift — use the sync_mode parameter (e.g., cut_off, bounce, or remap) to control how mismatches are handled.
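For reference, a request that sets the model and sync_mode explicitly might be assembled as follows. The body shape (a `model` string, an `input` list of video and audio URLs, and `options.sync_mode`) is an assumption modeled on the Sync Labs v2 generate request and should be checked against the API reference. `build_generate_payload` is a hypothetical helper, and its defaults are illustrative.

```python
import json

# Model names and sync_mode values as listed on this page.
MODELS = {"lipsync-2", "lipsync-2-pro", "lipsync-1.9.0-beta"}
SYNC_MODES = {"cut_off", "bounce", "remap"}


def build_generate_payload(video_url: str, audio_url: str,
                           model: str = "lipsync-2", sync_mode: str = "cut_off") -> str:
    """Return a JSON body for a generation request (field layout is an assumption)."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if sync_mode not in SYNC_MODES:
        raise ValueError(f"unknown sync_mode: {sync_mode}")
    payload = {
        "model": model,
        "input": [
            {"type": "video", "url": video_url},
            {"type": "audio", "url": audio_url},
        ],
        # Controls how audio/video duration mismatches are handled.
        "options": {"sync_mode": sync_mode},
    }
    return json.dumps(payload)
```

Validating the model and sync_mode locally catches typos before they cost a queued request.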
See Media Content Tips for detailed guidance on input quality.
For color-sensitive pipelines, especially when you composite Sync Labs output back onto the original source, use SDR BT.709 input with explicit color tags and prefer H.264 yuv444p (4:4:4) instead of yuv420p (4:2:0) or yuv422p (4:2:2). Sync Labs processes frames in RGB, so 4:2:0 and 4:2:2 input requires chroma upsampling during YUV to RGB conversion and can make small color shifts more visible at composite boundaries. H.264 outputs are re-encoded and may not preserve the original bitrate exactly; matching bitrate is not a reliable fix for color differences. See Preserving Color During Generation for recommended export settings.
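As one way to produce such an export with FFmpeg, the hypothetical helper below assembles an argument list for an H.264 4:4:4 encode with explicit BT.709 tags. It assumes the source is already SDR BT.709, because the `-color_primaries`, `-color_trc`, and `-colorspace` options only tag the stream and do not convert it. Re-encoding a file that is already 4:2:0 into yuv444p will not restore chroma detail, so run this from your highest-quality master.

```python
import shlex


def color_safe_export_args(src: str, dst: str) -> list[str]:
    """FFmpeg arguments for an H.264 4:4:4 export tagged as BT.709 (no color conversion)."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-profile:v", "high444",       # 4:4:4 requires the High 4:4:4 profile
        "-pix_fmt", "yuv444p",         # full-resolution chroma, no 4:2:0/4:2:2 subsampling
        "-color_primaries", "bt709",   # tags only: the source must already be BT.709
        "-color_trc", "bt709",
        "-colorspace", "bt709",
        "-c:a", "aac",
        dst,
    ]


if __name__ == "__main__":
    print(shlex.join(color_safe_export_args("master.mov", "input.mp4")))
```

Run it with `subprocess.run(color_safe_export_args(...), check=True)`, or paste the printed command into a shell.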
Sync Labs supports a wide range of video and audio formats. If your file isn’t accepted, confirm that its format is supported and that it is hosted at a publicly accessible URL.
If your webhook endpoint isn’t getting called:
If you’re having trouble installing the Python SDK, see the Python SDK Guide for full setup instructions.
If the TypeScript SDK isn’t working as expected, make sure @sync.so/sdk is listed in your dependencies.
See the TypeScript SDK Guide for full setup instructions.
Watermarks appear on videos generated with free or Hobbyist accounts. To remove watermarks, upgrade to a higher plan.
Existing videos generated on a free or Hobbyist plan will retain their watermarks. Generate new videos after upgrading to get unwatermarked output.
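If Sync Labs can’t detect a face or selects the wrong person, make sure the target face is clearly visible and front-facing, and use Speaker Selection to point at the correct face.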
See the Speaker Selection guide for details on selecting specific faces in multi-person videos.
Generation time depends on the model, video length, and resolution. As a general guide: lipsync-1.9.0-beta is the fastest model, typically completing a 30-second clip in well under a minute. lipsync-2 takes a few minutes for most videos and is the recommended default. lipsync-2-pro is 1.5 to 2 times slower than lipsync-2 because of its diffusion-based super-resolution step, so expect longer waits for premium quality. Higher-resolution inputs and longer videos increase processing time proportionally. To monitor progress, poll GET /v2/generate/{id} and check its status field, or set up webhooks to receive a callback when the job completes. If your generation remains in PENDING or PROCESSING for more than 10 minutes, the job may have hit an infrastructure issue. Avoid resubmitting the same request repeatedly, as this creates duplicate queue entries and slows processing further. Instead, contact [email protected] with your generation ID for investigation.
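A minimal polling loop that applies the 10-minute guideline might look like this. `get_status` is a hypothetical callable that fetches GET /v2/generate/{id} and returns its status field. Treating COMPLETED and FAILED as the terminal values is an assumption, so check the API reference for the full list. Any other status, including unknown ones, keeps polling until the time limit. The `clock` and `sleep` parameters exist only so the loop can be tested without waiting.

```python
import time
from typing import Callable

# Assumption: terminal status values. PENDING and PROCESSING are named on this page.
TERMINAL = {"COMPLETED", "FAILED"}
STUCK_AFTER_S = 10 * 60  # the 10-minute threshold from this page


def wait_for_generation(
    get_status: Callable[[], str],
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal status; raise TimeoutError once the job looks stuck."""
    start = clock()
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() - start > STUCK_AFTER_S:
            # Do not resubmit: report the generation ID to support instead.
            raise TimeoutError(f"still {status} after 10 minutes; contact support with the generation ID")
        sleep(interval_s)
```

Webhooks avoid polling entirely. A loop like this suits scripts and batch jobs that cannot expose a public endpoint.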
Sync Labs uses a subscription-plus-usage billing model processed through Stripe. Common payment issues include declined cards, unexpected charges after cancellation (usage charges still apply until the end of your billing cycle), and unpaid usage invoices that block new generations. If your card is declined, update your payment method at sync.so/billing/subscription; Stripe retries failed charges for up to 5 days before automatically cancelling the subscription. If you see a charge you do not recognize, check your usage history at sync.so/billing/usage. Usage invoices are generated automatically each time your accumulated spend reaches your tier’s threshold: $6 for Hobbyist, $20 for Creator, $50 for Growth, and $250 for Scale. For refund requests, go to your billing page and click Manage billing to access Stripe’s Cancel + refund flow directly. For any billing issue not resolved through the dashboard, email [email protected]. See the Billing page for full pricing and payment details.
Lip sync drift on longer videos typically happens when the audio and video durations do not match precisely, or when the video contains segments where the speaker is not actively talking. Sync Labs processes long videos internally in 30 to 40 second chunks, so scene changes or cuts within those chunks can confuse face tracking and cause brief misalignment. To fix drift, set the sync_mode parameter to cut_off (trims audio to video length) or remap (adjusts video speed to match audio). For videos over 1 minute with multiple scenes, consider splitting them into segments using the Segments API, where each segment gets its own audio input for tighter control. Using lipsync-2-pro can also improve quality on challenging footage. Make sure the input video shows the speaker actively talking throughout, because static or still frames cannot produce good lip movements.
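If you already know where the scene cuts fall (for example, from your editor or a scene-detection tool), a helper like the hypothetical `segments_from_cuts` below turns them into contiguous time ranges that you could then map to segments, each with its own audio. It only computes the ranges. Take the Segments API request format from the Segments guide.

```python
from typing import List, Tuple


def segments_from_cuts(duration_s: float, cuts_s: List[float]) -> List[Tuple[float, float]]:
    """Split [0, duration_s] at each scene cut into contiguous (start, end) ranges."""
    # Ignore cuts at or outside the video boundaries; sorting tolerates any input order.
    inner = sorted(c for c in cuts_s if 0.0 < c < duration_s)
    bounds = [0.0] + inner + [duration_s]
    # Drop zero-length ranges produced by duplicate cut timestamps.
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    # A 95-second video with cuts at 41.5 s and 70 s.
    print(segments_from_cuts(95.0, [70.0, 41.5]))
```

Cutting at scene boundaries keeps each segment within one shot, which avoids asking face tracking to follow a cut mid-chunk.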
TTS works out of the box on all plans using Sync Labs’ built-in ElevenLabs integration — no setup required. If you want more control, you can optionally bring your own ElevenLabs API key on Creator plans and above by configuring it at Integrations settings. TTS failures most commonly stem from an invalid ElevenLabs voice ID or exceeding the 5,000-character script limit. Verify your voiceId is a valid ElevenLabs voice ID string (not a voice name or display label), and keep your script under 5,000 characters per generation request. For longer scripts, use the Segments API to split text across multiple TTS inputs with different time ranges. If the generation completes but produces audio-only output without video, ensure you included a valid video input in your request alongside the TTS input. For persistent generation_text_length_exceeded or generation_input_validation_failed errors, see the Error Handling page for detailed resolution steps.
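When a script is over the limit, a simple word-boundary splitter like the hypothetical `split_script` below produces chunks of at most 5,000 characters that you could assign to separate TTS inputs. It collapses runs of whitespace, including line breaks, into single spaces. Splitting at sentence boundaries instead would likely give more natural pauses between segments.

```python
from typing import List

MAX_TTS_CHARS = 5000  # per-request script limit stated on this page


def split_script(text: str, limit: int = MAX_TTS_CHARS) -> List[str]:
    """Split a TTS script into chunks of at most `limit` characters, breaking at spaces."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word exceeds the character limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would overflow: close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```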
Sync Labs accepts MP4, MOV, WebM, and AVI for video, and WAV, MP3, OGG, FLAC, ALAC, and MP4 audio with full support (WMA, M4A, and AAC have limited support due to licensing restrictions). If your file is rejected, first check the format against the Media Formats Support page. For API uploads, all media must be hosted at a publicly accessible URL — private, authenticated, or expired URLs will fail silently. The recommended video codec is H.264 (High Profile) at a maximum resolution of 4K (4096x2160 pixels); videos above 4K are rejected outright. Audio should use a 44.1kHz or 48kHz sample rate for best results. If your file uses an unsupported codec, convert it with FFmpeg: ffmpeg -i input.avi -c:v libx264 -c:a aac output.mp4. Videos missing an audio track or required metadata fields (duration, frame rate) will return a generation_media_metadata_missing error. Note that HDR (10-bit color) video is automatically normalized to SDR, which may alter your color grading.
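Some of these checks can run locally before you submit. The hypothetical `preflight_video` helper below flags a non-HTTP(S) URL, a container extension outside the list above, a resolution beyond 4096x2160, and an audio sample rate other than 44.1 or 48 kHz. It rests on two assumptions: the file extension in the URL reflects the container (signed or extensionless URLs will be flagged falsely), and the 4K limit applies to the long and short sides regardless of orientation. It cannot tell whether a URL is actually publicly reachable; that still has to be checked by fetching it without credentials.

```python
from pathlib import PurePosixPath
from typing import List, Optional
from urllib.parse import urlparse

VIDEO_EXTS = {".mp4", ".mov", ".webm", ".avi"}  # video containers listed on this page
MAX_LONG_SIDE, MAX_SHORT_SIDE = 4096, 2160      # 4K limit; larger videos are rejected
RECOMMENDED_RATES = {44100, 48000}              # recommended audio sample rates (Hz)


def preflight_video(url: str, width: int, height: int,
                    sample_rate: Optional[int] = None) -> List[str]:
    """Return a list of likely problems; an empty list means none were found."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("media must be hosted at a public http(s) URL")
    ext = PurePosixPath(parsed.path).suffix.lower()
    if ext not in VIDEO_EXTS:
        problems.append(f"unsupported video container: {ext or 'none'}")
    if max(width, height) > MAX_LONG_SIDE or min(width, height) > MAX_SHORT_SIDE:
        problems.append("resolution exceeds 4K (4096x2160)")
    if sample_rate is not None and sample_rate not in RECOMMENDED_RATES:
        problems.append(f"audio sample rate {sample_rate} Hz; 44.1 or 48 kHz is recommended")
    return problems
```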
For videos with multiple people visible in the frame, use the Speaker Selection feature to target the correct face. Set options.active_speaker_detection.auto_detect to true to let Sync Labs automatically identify the active speaker, or provide a manual frame_number and coordinates pointing to the target speaker’s face for fully deterministic control. You can also supply per-frame bounding_boxes if you already run your own face detection. If your video has multiple speakers talking at different times (such as a two-person podcast or interview), use the Segments API to assign different audio inputs to different time ranges within the video — each segment can target a different speaker with its own audio. For best results, ensure each speaker’s face is clearly visible and front-facing during their speaking segment. If Sync Labs selects the wrong face, provide explicit coordinates rather than relying on auto-detection. See the Speaker Selection API guide and Segments guide for complete code examples.
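A minimal sketch of the two selection modes, built as the options fragment of a request, follows. `options.active_speaker_detection.auto_detect` comes from this page. Nesting `frame_number` and `coordinates` under `active_speaker_detection`, and encoding coordinates as an `[x, y]` pixel pair, are assumptions to verify against the Speaker Selection API guide. `speaker_options` is a hypothetical helper.

```python
from typing import Optional, Tuple


def speaker_options(frame_number: Optional[int] = None,
                    coordinates: Optional[Tuple[int, int]] = None) -> dict:
    """Return an options fragment for automatic or manual speaker selection."""
    if frame_number is None and coordinates is None:
        # Let Sync Labs identify the active speaker automatically.
        return {"active_speaker_detection": {"auto_detect": True}}
    if frame_number is None or coordinates is None:
        raise ValueError("manual selection needs both frame_number and coordinates")
    x, y = coordinates
    # Deterministic: point at the target face on one frame (field layout assumed).
    return {
        "active_speaker_detection": {
            "auto_detect": False,
            "frame_number": frame_number,
            "coordinates": [x, y],
        }
    }
```

Merge the returned dict into the options object of your generation request. If auto-detection picks the wrong face, switch to the manual form.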
If your issue isn’t covered above, here are more resources:
For step-by-step walkthroughs and additional troubleshooting, visit the Sync Labs Support Knowledge Base: