Media Formats Support
Supported Formats
Video
Audio
Full Support
Limited Support
The following formats have partial support due to licensing or legal restrictions:
For best compatibility, use MP4 for video and WAV or MP3 for audio.
Output Quality
All output is re-encoded to H.264 using libx264 (`-crf 17 -preset slow`), regardless of the input codec. Because of this re-encode, and because frames are processed in RGB color space internally during generation, the output's bitrate, frame rate, and color grading may differ from the original.
- Don’t rely on bitrate for quality preservation: Outputs are re-encoded, so bitrate may change.
- HDR is not fully supported: HDR videos are normalized to SDR, which may affect color grading in the output.
- Alpha channels are removed: The H.264/RGB pipeline does not support transparency, so alpha channels are replaced with a solid background.
Preserving Color
For color-sensitive H.264 workflows, especially when compositing generated output back onto source footage, use explicit SDR color metadata and 4:4:4 chroma sampling.
- Tag SDR BT.709 metadata explicitly: Set `color_space`, `color_transfer`, `color_range`, and `color_primaries`. Untagged or partially tagged files can be interpreted differently across decoders. The pipeline uses ffmpeg 7.1 for color metadata detection.
- Prefer `yuv444p` when color accuracy matters: The pipeline operates in RGB. 4:2:0 and 4:2:2 inputs require chroma upsampling during YUV→RGB conversion, which can cause color shifts or compositing seams. 4:4:4 preserves full chroma resolution through that conversion.
- Export 4:4:4 from your source tool: Converting an existing 4:2:0 file to 4:4:4 cannot restore discarded chroma detail.
- For lossless or pixel-level workflows, contact support.
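Before uploading, you can check whether a file is fully tagged by reading its first video stream with ffprobe, for example `ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt,color_space,color_transfer,color_primaries,color_range -of json input.mp4`. The sketch below parses that JSON. The helper name `bt709_tag_problems` is hypothetical, and the expected values (`bt709` for the three color fields, `tv` for limited range) are assumptions for a typical SDR BT.709 file. ffprobe may omit unset color fields or report them as `unknown`, so the sketch treats both cases as untagged.

```python
import json

# Assumed target tags for an SDR BT.709 delivery; "tv" means limited range.
# Change color_range to "pc" if you deliver full-range video.
EXPECTED_TAGS = {
    "color_space": "bt709",
    "color_transfer": "bt709",
    "color_primaries": "bt709",
    "color_range": "tv",
}


def bt709_tag_problems(ffprobe_json: str) -> list[str]:
    """Return problems found in the first video stream of ffprobe JSON output."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        return ["no video stream found"]
    stream = streams[0]
    problems = []
    for key, expected in EXPECTED_TAGS.items():
        # ffprobe may omit unset color fields or report "unknown"; both count as untagged.
        actual = stream.get(key, "unknown")
        if actual != expected:
            problems.append(f"{key} is {actual!r}, expected {expected!r}")
    # 4:4:4 is a recommendation when color accuracy matters, not a hard requirement.
    pix_fmt = stream.get("pix_fmt")
    if pix_fmt != "yuv444p":
        problems.append(f"pix_fmt is {pix_fmt!r}; yuv444p is preferred")
    return problems


sample = '{"streams": [{"pix_fmt": "yuv420p", "color_space": "bt709", "color_primaries": "bt709"}]}'
for problem in bt709_tag_problems(sample):
    print(problem)
```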
Example FFmpeg command for a tagged SDR BT.709 H.264 export:
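A minimal sketch of such an export, written as a Python helper that assembles the ffmpeg argument list (run it with `subprocess.run(cmd, check=True)`). The flag values are assumptions for a typical SDR BT.709 master: `-color_range tv` assumes limited-range video, and `-crf 17 -preset slow` simply mirrors the pipeline's own output settings. The function name `bt709_export_cmd` is hypothetical.

```python
import shlex


def bt709_export_cmd(src: str, dst: str) -> list[str]:
    """Build ffmpeg arguments for a tagged SDR BT.709, 4:4:4 H.264 export."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        # Mirrors the encoder settings the pipeline uses for its own output.
        "-crf", "17", "-preset", "slow",
        # Full-resolution chroma; libx264 encodes this with the High 4:4:4 Predictive profile.
        "-pix_fmt", "yuv444p",
        # Explicit SDR BT.709 tags; only correct if the source really is BT.709.
        "-color_primaries", "bt709",
        "-color_trc", "bt709",
        "-colorspace", "bt709",
        "-color_range", "tv",  # assumption: limited-range source
        dst,
    ]


print(shlex.join(bt709_export_cmd("master.mov", "upload.mp4")))
```

These tags describe the video rather than grade it, so use them only when the source is SDR BT.709. As the list above notes, `-pix_fmt yuv444p` on an existing 4:2:0 file cannot recover chroma detail; export 4:4:4 from the source tool instead.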
Recommended Input Properties
Video
4K maximum: Videos above 4096×2160 are rejected. Downscale to 4K or below before uploading.
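To downscale programmatically, a sketch like the following computes target dimensions that preserve the aspect ratio. It assumes the limit is a fixed box 4096 pixels wide by 2160 tall (the text above does not say how portrait videos are measured), and it rounds down to even dimensions, which H.264 encoders require for 4:2:0 output. `fit_within_4k` is a hypothetical helper name.

```python
from fractions import Fraction

# Assumption: the limit is a fixed 4096-wide by 2160-tall box.
MAX_W, MAX_H = 4096, 2160


def fit_within_4k(width: int, height: int) -> tuple[int, int]:
    """Scale (width, height) down to fit the limit, preserving aspect ratio."""
    # Exact rational arithmetic avoids float rounding surprises.
    scale = min(Fraction(MAX_W, width), Fraction(MAX_H, height), Fraction(1))
    if scale == 1:
        return width, height  # already within the limit
    new_w = int(width * scale)  # int() of a positive Fraction rounds down
    new_h = int(height * scale)
    # Round down to even numbers, as H.264 4:2:0 encoding requires.
    return new_w - new_w % 2, new_h - new_h % 2


print(fit_within_4k(7680, 4320))
```

For example, `fit_within_4k(7680, 4320)` returns `(3840, 2160)`, which you could pass to ffmpeg as `-vf scale=3840:2160`. Under the fixed-box assumption a portrait 2160×3840 video becomes 1214×2160; if the service measures the limit independently of orientation, swap the box for portrait input.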
Audio
- Sample rate: 44.1 kHz or 48 kHz. Higher rates are downsampled to 48 kHz, which may reduce quality.
- Bit depth: Up to 32-bit float.
- Channels: Up to 7.1. Spatial audio is not supported.
- Multiple streams: Only the first audio stream is processed; all others are discarded.
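The audio guidance above can be checked against ffprobe output before uploading. This sketch assumes JSON produced by `ffprobe -v error -select_streams a -show_entries stream=index,sample_rate,channels -of json input.mp4`, in which ffprobe reports `sample_rate` as a string and `channels` as an integer. It treats 7.1 as 8 channels and does not check bit depth or spatial audio; `audio_warnings` is a hypothetical name.

```python
import json

ACCEPTED_RATES = {44100, 48000}
MAX_CHANNELS = 8  # 7.1 = seven full-range channels plus one LFE channel


def audio_warnings(ffprobe_json: str) -> list[str]:
    """List how the audio guidance applies to the audio streams in ffprobe JSON output."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        return ["no audio stream found"]
    warnings = []
    if len(streams) > 1:
        # Only the first audio stream is processed.
        warnings.append(f"{len(streams) - 1} extra audio stream(s) will be discarded")
    first = streams[0]
    rate = int(first["sample_rate"])  # ffprobe reports sample_rate as a string
    if rate > 48000:
        warnings.append(f"{rate} Hz will be downsampled to 48000 Hz")
    elif rate not in ACCEPTED_RATES:
        warnings.append(f"{rate} Hz is outside the recommended 44.1/48 kHz")
    if first["channels"] > MAX_CHANNELS:
        warnings.append(f"{first['channels']} channels exceeds the 7.1 maximum")
    return warnings


sample = '{"streams": [{"index": 1, "sample_rate": "96000", "channels": 2}, {"index": 2, "sample_rate": "48000", "channels": 2}]}'
print(audio_warnings(sample))
```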
Codec Quality Comparison
All input codecs are transcoded to a standard format, so processing speed is consistent. Quality loss, measured with VMAF, varies by codec:
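To measure VMAF on your own files, for example to see how much a pre-upload transcode costs relative to your master, FFmpeg's `libvmaf` filter is one option. It needs an ffmpeg build configured with `--enable-libvmaf`; the first input is the distorted video and the second is the reference, and both should share the same resolution and frame rate. The helper `vmaf_cmd` is a hypothetical sketch that only builds the command.

```python
def vmaf_cmd(distorted: str, reference: str, log_path: str = "vmaf.json") -> list[str]:
    """Build an ffmpeg command that scores `distorted` against `reference` with VMAF."""
    return [
        "ffmpeg",
        "-i", distorted,  # first input: the re-encoded (distorted) video
        "-i", reference,  # second input: the reference video
        "-lavfi", f"libvmaf=log_fmt=json:log_path={log_path}",
        "-f", "null", "-",  # decode and score only; write no output file
    ]


print(vmaf_cmd("upload.mp4", "master.mov"))
```

Run it with `subprocess.run(vmaf_cmd("upload.mp4", "master.mov"), check=True)`. ffmpeg prints the aggregate VMAF score to its log, and the JSON log file holds per-frame and pooled scores.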
Frequently Asked Questions
What is the maximum file size for uploads?
Direct file uploads are limited to 20 MB. If your file exceeds this limit:
- Host the file at a publicly accessible URL (S3 bucket, CDN, or any web server).
- Pass the URL in the `url` field of your video or audio input instead of uploading directly.
There is no file size limit for URL-based inputs: the file is downloaded from your URL during processing. For production pipelines, URL-based inputs are recommended regardless of file size because they avoid upload timeouts and are more reliable.
Hosted files must be publicly accessible without authentication headers.
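The choice between a direct upload and a URL input can be sketched as below. The byte threshold is an assumption that reads "20 MB" conservatively as 20,000,000 bytes. The input shape (`{"type": ..., "url": ...}`) is likewise an assumed illustration of the `url` field of a video or audio input; check the API reference for the exact request schema. `must_use_url` and `url_input` are hypothetical helpers.

```python
# Assumption: "20 MB" read conservatively as 20,000,000 bytes.
MAX_UPLOAD_BYTES = 20_000_000


def must_use_url(size_bytes: int) -> bool:
    """True if a file is too large for direct upload and must be passed by URL."""
    return size_bytes > MAX_UPLOAD_BYTES


def url_input(kind: str, url: str) -> dict:
    """Build a hypothetical input entry that references a hosted file by URL."""
    if kind not in ("video", "audio"):
        raise ValueError(f"unsupported input type: {kind!r}")
    # Hosted files must be publicly reachable without authentication headers.
    if not url.startswith(("https://", "http://")):
        raise ValueError("expected a public http(s) URL")
    return {"type": kind, "url": url}


print(must_use_url(35_000_000))
print(url_input("video", "https://cdn.example.com/clip.mp4"))
```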
What is the maximum video duration?
Maximum duration depends on your plan:
Check the pricing page for your plan’s specific limit.
- react-1 has a hard limit of 15 seconds regardless of plan; it is designed for short-form expressive content.
- For videos exceeding your plan’s limit, use the Segments API to split and process them in shorter chunks.
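Splitting a long video starts with choosing segment boundaries. This sketch only computes consecutive `(start, end)` ranges in seconds, each no longer than a given limit (for react-1, 15 seconds); check the Segments API documentation for how to submit them. `split_into_chunks` is a hypothetical name.

```python
def split_into_chunks(duration_s: float, max_chunk_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s] into consecutive (start, end) ranges of at most max_chunk_s."""
    if duration_s <= 0 or max_chunk_s <= 0:
        raise ValueError("duration and chunk length must be positive")
    duration_s, max_chunk_s = float(duration_s), float(max_chunk_s)
    chunks = []
    start = 0.0
    while start < duration_s:
        # The last chunk is shortened so it ends exactly at the video's end.
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


print(split_into_chunks(40, 15))
```

For example, `split_into_chunks(40, 15)` yields `[(0.0, 15.0), (15.0, 30.0), (30.0, 40.0)]`.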
Does Sync Labs support vertical or portrait videos?
Yes. Any aspect ratio is supported, including vertical (9:16), horizontal (16:9), square (1:1), and custom dimensions.
The pipeline extracts the face region at 512×512 for processing, then composites it back into the original frame. The output always matches the input dimensions and orientation.
For best face detection in vertical videos, ensure the speaker’s face is clearly visible and well-lit.
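As a hypothetical illustration of why orientation does not matter, the sketch below computes a square crop around a detected face box and the factor needed to resize it to 512×512. The margin, the clamping strategy, and the function name `square_face_crop` are assumptions; they are not a description of the pipeline's actual face detection or cropping logic.

```python
def square_face_crop(frame_w: int, frame_h: int,
                     face_x: int, face_y: int, face_w: int, face_h: int,
                     margin: float = 0.25, target: int = 512) -> tuple[int, int, int, float]:
    """Return (x, y, side, scale) for a square crop around a face box.

    The crop is enlarged by `margin` on each side, clamped to the frame, and
    `scale` is the resize factor that maps the crop to target×target pixels.
    """
    # A square side covering the larger face dimension plus margin on both sides.
    side = int(max(face_w, face_h) * (1 + 2 * margin))
    # Never larger than the frame's shorter edge, whatever the orientation.
    side = min(side, frame_w, frame_h)
    # Center the square on the face, then shift it back inside the frame.
    cx = face_x + face_w / 2
    cy = face_y + face_h / 2
    x = int(min(max(cx - side / 2, 0), frame_w - side))
    y = int(min(max(cy - side / 2, 0), frame_h - side))
    return x, y, side, target / side


# A vertical 1080×1920 frame with a 200×240 face box.
print(square_face_crop(1080, 1920, 440, 600, 200, 240))
```

Because the crop is always square and is placed back at the same `(x, y)` after processing, the frame's own dimensions and orientation never change.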
Related Resources
- Media Content Tips — best practices for preparing your video and audio content for optimal lip sync results
- Lipsync Model — learn about supported models and their input requirements
- Quickstart — get started with your first Sync Labs generation

