Media Formats Support

Last updated May 22, 2026

Supported Formats

Video

| MIME Type | Extension | Format |
| --- | --- | --- |
| video/mp4 | .mp4 | MP4 |
| video/quicktime | .mov, .qt | QuickTime |
| video/webm | .webm | WebM |
| video/x-msvideo | .avi | AVI |

Audio

Full Support

| MIME Type | Extension | Format |
| --- | --- | --- |
| audio/wav | .wav | WAV |
| audio/mpeg | .mp3 | MP3 |
| audio/ogg | .ogg | OGG |
| audio/flac | .flac | FLAC |
| audio/alac | .alac | ALAC |
| audio/mp4 | .mp4 | MP4 Audio |

Limited Support

The following formats have partial support due to licensing or legal restrictions:

| MIME Type | Extension | Format |
| --- | --- | --- |
| audio/x-ms-wma | .wma | WMA |
| audio/x-m4a | .m4a | M4A |
| audio/x-m3a | .m3a | M3A |
| audio/aac | .aac | AAC |

For best compatibility, use MP4 for video and WAV or MP3 for audio.
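If you validate files on the client before submitting them, the tables above reduce to a simple lookup. The sketch below is a hypothetical helper (`classify_mime` is not part of the Sync Labs SDKs); it only checks the declared MIME type, not the actual contents of the file.

```python
# Hypothetical helper mirroring the supported-format tables above.
VIDEO_TYPES = {"video/mp4", "video/quicktime", "video/webm", "video/x-msvideo"}
AUDIO_FULL = {"audio/wav", "audio/mpeg", "audio/ogg", "audio/flac", "audio/alac", "audio/mp4"}
AUDIO_LIMITED = {"audio/x-ms-wma", "audio/x-m4a", "audio/x-m3a", "audio/aac"}


def classify_mime(mime_type: str) -> str:
    """Return 'video', 'audio-full', 'audio-limited', or 'unsupported'."""
    # MIME types are case-insensitive; parameters such as "; codecs=vp9" are ignored.
    base = mime_type.split(";", 1)[0].strip().lower()
    if base in VIDEO_TYPES:
        return "video"
    if base in AUDIO_FULL:
        return "audio-full"
    if base in AUDIO_LIMITED:
        return "audio-limited"
    return "unsupported"
```

A result of `audio-limited` is a hint to convert the file to WAV or MP3 before submitting it.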

Output Quality

All output is re-encoded to H.264 with libx264 (-crf 17 -preset slow), regardless of the input codec, and frames are processed internally in RGB color space during generation. As a result, the output's bitrate, frame rate, and color grading may differ from the input.

Don’t rely on bitrate for quality preservation: Outputs are re-encoded with constant-quality (CRF) rate control, so the output bitrate may differ from the input bitrate.

HDR is not fully supported: HDR videos are normalized to SDR, which may affect color grading in the output.

Alpha channels are removed: The H.264/RGB pipeline does not support transparency. Alpha channels are replaced with a solid background.

Preserving Color

For color-sensitive H.264 workflows, especially when compositing generated output back onto source footage, use explicit SDR color metadata and 4:4:4 chroma sampling.

  • Tag SDR BT.709 metadata explicitly: Set color_space, color_transfer, color_range, and color_primaries. Untagged or partially tagged files can be interpreted differently across decoders. The pipeline uses ffmpeg 7.1 for color metadata detection.
  • Prefer yuv444p when color accuracy matters: The pipeline operates in RGB. 4:2:0 and 4:2:2 inputs require chroma upsampling during YUV→RGB conversion, which can cause color shifts or compositing seams. 4:4:4 preserves full chroma resolution through that conversion.
  • Export 4:4:4 from your source tool: Converting an existing 4:2:0 file to 4:4:4 cannot restore discarded chroma detail.
  • For lossless or pixel-level workflows, contact support.
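Before uploading, you can confirm that a file carries the tags listed above by reading ffprobe's per-stream JSON. This is a minimal sketch, not an official tool: it assumes limited ("tv") range to match the example command on this page, and the helper names are hypothetical. ffprobe may omit untagged fields or report them as "unknown", so both are treated as missing.

```python
import json
import subprocess

# ffprobe tag values for SDR BT.709 with limited ("tv") range (assumed target).
EXPECTED_TAGS = {
    "color_space": "bt709",
    "color_transfer": "bt709",
    "color_primaries": "bt709",
    "color_range": "tv",
}


def check_color_tags(stream: dict, require_444: bool = False) -> list[str]:
    """Return problems found in one ffprobe video stream entry (empty list = OK)."""
    problems = []
    for key, expected in EXPECTED_TAGS.items():
        value = stream.get(key)
        # A missing key or "unknown" both mean the tag is absent.
        if value in (None, "unknown"):
            problems.append(f"{key} is not tagged")
        elif value != expected:
            problems.append(f"{key} is {value!r}, expected {expected!r}")
    if require_444 and stream.get("pix_fmt") != "yuv444p":
        problems.append(f"pix_fmt is {stream.get('pix_fmt')!r}, expected 'yuv444p'")
    return problems


def probe_first_video_stream(path: str) -> dict:
    """Run ffprobe (must be installed and on PATH) and return the first video stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["streams"][0]
```

For example, `check_color_tags(probe_first_video_stream("output.mp4"), require_444=True)` returns an empty list for a correctly tagged 4:4:4 export.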

Example FFmpeg command for a tagged SDR BT.709 H.264 export:

ffmpeg -i input.mov \
  -c:v libx264 -pix_fmt yuv444p \
  -color_range tv -colorspace bt709 -color_primaries bt709 -color_trc bt709 \
  -crf 17 -preset slow \
  -c:a aac -b:a 192k \
  output.mp4

Recommended Input Properties

Video

| Property | Recommended Value |
| --- | --- |
| Codec | H.264 High Profile for general use; H.264 4:4:4 for color-sensitive work |
| Resolution | 1920×1080 |
| Average Bitrate | ≥ 10 Mbps |
| Frame Rate | 24, 25, or 30 fps (constant) |
| Color Space | 8-bit SDR BT.709 |
| Color Metadata | Explicitly tag range, primaries, transfer, and matrix |
| Chroma Sampling | yuv420p or yuv422p for general use; yuv444p when color preservation is critical |

4K maximum: Videos above 4096×2160 are rejected. Downscale to 4K or below before uploading.
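The table and the 4K limit can be turned into a pre-upload check. This is a hypothetical sketch: it takes the width, height, `avg_frame_rate`, and `bit_rate` values that ffprobe reports, rejects only on the documented size limit, and returns the other deviations as warnings. It assumes the 4096×2160 limit is orientation-independent (portrait frames are compared with their sides swapped); the page does not say how vertical 4K footage is treated.

```python
from fractions import Fraction

MAX_LONG_SIDE, MAX_SHORT_SIDE = 4096, 2160   # documented 4K ceiling
RECOMMENDED_FPS = {24, 25, 30}                # frame rates from the table
MIN_AVG_BITRATE = 10_000_000                  # 10 Mbps, in bits per second


def check_video_properties(width: int, height: int, avg_frame_rate: str,
                           bit_rate: int) -> tuple[bool, list[str]]:
    """Return (accepted, warnings) for one video stream.

    avg_frame_rate uses ffprobe's "num/den" form, e.g. "25/1" or "30000/1001".
    """
    warnings = []
    # Assumption: orientation-independent limit (long side vs 4096, short side vs 2160).
    accepted = (max(width, height) <= MAX_LONG_SIDE
                and min(width, height) <= MAX_SHORT_SIDE)
    if not accepted:
        warnings.append(f"{width}x{height} exceeds 4096x2160; downscale before uploading")

    num, den = (int(part) for part in avg_frame_rate.split("/"))
    # ffprobe reports "0/0" when the average frame rate is unknown.
    if den == 0 or Fraction(num, den) not in RECOMMENDED_FPS:
        warnings.append(f"frame rate {avg_frame_rate} is not 24, 25, or 30 fps")

    if bit_rate < MIN_AVG_BITRATE:
        warnings.append(f"average bitrate {bit_rate / 1e6:.1f} Mbps is below 10 Mbps")
    return accepted, warnings
```

Note that NTSC rates such as 29.97 fps (`30000/1001`) are flagged, since they are not among the recommended values.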

Audio

  • Sample rate: 44.1 kHz or 48 kHz. Higher rates are downsampled to 48 kHz, which may reduce quality.
  • Bit depth: Up to 32-bit float.
  • Channels: Up to 7.1. Spatial audio is not supported.
  • Multiple streams: Only the first audio stream is processed; all others are discarded.
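The audio rules above can be checked before upload from ffprobe's per-stream `sample_rate` and `channels` fields. The function below is a hypothetical sketch: it assumes 7.1 means at most 8 channels, and for sample rates below 44.1 kHz it only flags them as outside the recommended pair, because this page does not say how they are handled.

```python
def check_audio_streams(streams: list[dict]) -> list[str]:
    """Return notes on how the documented audio rules affect these streams.

    Each entry is one ffprobe audio stream, in file order, with
    "sample_rate" (str or int) and "channels" (int).
    """
    if not streams:
        return ["no audio stream found"]
    notes = []
    if len(streams) > 1:
        # Only the first audio stream is processed.
        notes.append(f"{len(streams) - 1} extra audio stream(s) will be discarded")
    first = streams[0]
    rate = int(first["sample_rate"])
    if rate > 48_000:
        notes.append(f"{rate} Hz will be downsampled to 48000 Hz")
    elif rate not in (44_100, 48_000):
        notes.append(f"{rate} Hz is not a recommended rate (44100 or 48000 Hz)")
    if first["channels"] > 8:  # 7.1 layout = 8 channels (assumed upper bound)
        notes.append(f"{first['channels']} channels exceeds the 7.1 maximum")
    return notes
```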

Codec Quality Comparison

All input codecs are transcoded to a standard format before processing, so processing speed does not depend on the input codec. Quality loss, measured with VMAF, does vary by codec:

| Input Codec | Output Quality |
| --- | --- |
| H.264 | Best (least quality loss) |
| MPEG-2 | Good (up to 15% quality loss) |
| H.265 | Good (up to 15% quality loss) |
| VP9 | Fair (up to 20% quality loss) |
| AV1 | Fair (over 20% quality loss) |
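ffprobe reports codecs by short names (`h264`, `mpeg2video`, `hevc`, `vp9`, `av1`) rather than the labels used in the table. The mapping below is a hypothetical convenience for reading ffprobe's `codec_name` field against the table; it is not part of any Sync Labs SDK.

```python
# Hypothetical mapping from ffprobe "codec_name" values to the table rows above.
CODEC_QUALITY = {
    "h264": "Best (least quality loss)",
    "mpeg2video": "Good (up to 15% quality loss)",   # MPEG-2
    "hevc": "Good (up to 15% quality loss)",         # H.265
    "vp9": "Fair (up to 20% quality loss)",
    "av1": "Fair (over 20% quality loss)",
}


def expected_quality(codec_name: str) -> str:
    """Look up the expected quality tier; codecs not in the table return 'not listed'."""
    return CODEC_QUALITY.get(codec_name.strip().lower(), "not listed")
```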

Frequently Asked Questions

What is the maximum file size for uploads?

Direct file uploads are limited to 20 MB. If your file exceeds this limit:

  • Host the file at a publicly accessible URL (S3 bucket, CDN, or any web server).
  • Pass the URL in the url field of your video or audio input instead of uploading directly.

There is no file size limit for URL-based inputs; the file is downloaded from your URL during processing. For production pipelines, URL-based inputs are recommended regardless of file size because they avoid upload timeouts and are more reliable.

Hosted files must be publicly accessible without authentication headers.
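The upload rule above amounts to a size check. This sketch treats "20 MB" as 20,000,000 bytes (the stricter reading; the page does not say whether decimal or binary megabytes are meant), and the returned tuple is a hypothetical descriptor for your own request code: only the `url` field name comes from this page. The helper cannot verify that the URL is publicly reachable without authentication.

```python
from typing import Optional, Tuple

# Assumption: "20 MB" read as decimal megabytes, which is the stricter limit.
DIRECT_UPLOAD_LIMIT_BYTES = 20_000_000


def choose_input(size_bytes: int, public_url: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return ("url", url) or ("direct", None) for a media file.

    Hypothetical helper: the caller maps the result onto its own request code.
    """
    if public_url is not None:
        # URL-based inputs have no size limit and are recommended for production.
        return ("url", public_url)
    if size_bytes > DIRECT_UPLOAD_LIMIT_BYTES:
        raise ValueError(
            f"{size_bytes} bytes exceeds the 20 MB direct-upload limit; "
            "host the file at a public URL and pass it in the url field"
        )
    return ("direct", None)
```

In practice `size_bytes` would come from something like `os.path.getsize(path)`.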

What is the maximum video duration?

Maximum duration depends on your plan:

| Plan | Maximum Duration |
| --- | --- |
| Free | 20 seconds |
| Hobbyist | 1 minute |
| Scale+ | 30 minutes |

Check the pricing page for your plan’s specific limit.

  • react-1 has a hard limit of 15 seconds regardless of plan; it is designed for short-form expressive content.
  • For videos exceeding your plan’s limit, use the Segments API to split and process them in shorter chunks.
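The duration rules combine a per-plan limit with the react-1 cap. A minimal sketch, assuming the limits from the table above: the dictionary keys and the `"react-1"` model string are illustrative labels, not SDK constants, and Scale+ is taken as 30 minutes even though higher plans may allow more (check the pricing page).

```python
from typing import Optional

# Maximum durations in seconds from the plan table; keys are illustrative labels.
PLAN_LIMITS_S = {"free": 20, "hobbyist": 60, "scale+": 30 * 60}
REACT_1_LIMIT_S = 15  # react-1 hard limit, applied regardless of plan


def max_duration_s(plan: str, model: Optional[str] = None) -> int:
    """Longest clip, in seconds, accepted for this plan and model."""
    limit = PLAN_LIMITS_S[plan.lower()]
    if model == "react-1":
        limit = min(limit, REACT_1_LIMIT_S)
    return limit


def exceeds_limit(duration_s: float, plan: str, model: Optional[str] = None) -> bool:
    """True if the clip must be shortened or split before processing."""
    return duration_s > max_duration_s(plan, model)
```

For clips over a plan limit, `exceeds_limit` returning `True` is the signal to split the video with the Segments API.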

Does Sync Labs support vertical or portrait videos?

Yes. Any aspect ratio is supported, including vertical (9:16), horizontal (16:9), square (1:1), and custom dimensions.

The pipeline extracts the face region at 512×512 for processing, then composites it back into the original frame. The output always matches the input dimensions and orientation.

For best face detection in vertical videos, ensure the speaker’s face is clearly visible and well-lit.

Related Resources

  • Media Content Tips — best practices for preparing your video and audio content for optimal lip sync results
  • Lipsync Model — learn about supported models and their input requirements
  • Quickstart — get started with your first Sync Labs generation