The quality of your input media has a direct impact on the final lipsync output. For optimal results, please follow these tips for preparing your video and audio content.
For best performance, avoid full profile (side-view) shots and obstructions covering the face.
The model performs best when the character in the video appears to be talking naturally. It will preserve the speaker’s style during lipsync.
Tip for AI-Generated Video: When creating videos with third-party AI video generation models, include this instruction in the text prompt: "the character should be speaking naturally". The generated AI video will have some random mouth movements, which are necessary to get the best results from our lipsync model.
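As a minimal sketch of the tip above, the helper below appends the recommended instruction to a text prompt before it is sent to a video generation model. The function name and the way the hint is joined to the prompt are assumptions for illustration only; they are not part of any particular generator's API.

```python
# The instruction text is quoted from the tip above.
NATURAL_SPEECH_HINT = "the character should be speaking naturally"


def with_natural_speech_hint(prompt: str) -> str:
    """Return the prompt with the natural-speech instruction appended.

    Hypothetical helper: the prompt is left unchanged (apart from trimming)
    if it already contains the instruction.
    """
    base = prompt.strip()
    if NATURAL_SPEECH_HINT in base.lower():
        return base
    if not base:
        return NATURAL_SPEECH_HINT
    # Drop trailing punctuation so the hint reads as one more clause.
    return f"{base.rstrip(' .,')}, {NATURAL_SPEECH_HINT}"
```

For example, `with_natural_speech_hint("A presenter facing the camera.")` yields a prompt ending in the recommended instruction.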
For best performance, avoid audio with music, background noise, or multiple simultaneous speakers.
When your video and audio have different durations, you can choose how to handle the mismatch using the sync_mode parameter. Here’s a brief overview of each option:
bounce: When video is shorter than audio, the video will reverse playback at the end to match audio duration. Otherwise, video is cropped to match audio.
loop: When video is shorter than audio, the video will loop from the beginning to match audio duration. Otherwise, video is cropped to match audio.
cut_off: When audio is longer than video, the audio will be cut off to match video duration. Otherwise, video is cropped to match audio.
silence: When video is longer than audio, silence will be added to the audio to match video duration. Otherwise, video is cropped to match audio.
remap: The video playback speed will be adjusted (sped up or slowed down) to exactly match the audio duration, preserving all content from both.
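To make the duration handling concrete, here is a rough sketch that estimates the output length for each sync_mode value (bounce, loop, cut_off, silence, remap), read directly from the descriptions above. The `SyncMode` enum and `expected_output_secs` function are hypothetical names used for illustration; they are not part of the API, and the actual service may round or trim durations differently.

```python
from enum import Enum


class SyncMode(str, Enum):
    """The sync_mode values named in this guide."""
    BOUNCE = "bounce"
    LOOP = "loop"
    CUT_OFF = "cut_off"
    SILENCE = "silence"
    REMAP = "remap"


def expected_output_secs(mode: str, video_secs: float, audio_secs: float) -> float:
    """Estimate the output duration implied by the mode descriptions.

    Illustrative only: this encodes one reading of the text, not the
    service's actual implementation.
    """
    if video_secs <= 0 or audio_secs <= 0:
        raise ValueError("durations must be positive")
    sync_mode = SyncMode(mode)  # raises ValueError for unknown modes

    if sync_mode in (SyncMode.BOUNCE, SyncMode.LOOP, SyncMode.REMAP):
        # The video is extended, cropped, or retimed to the audio length.
        return audio_secs
    if sync_mode is SyncMode.CUT_OFF:
        # Longer audio is cut to the video; a longer video is cropped.
        return min(video_secs, audio_secs)
    # silence: audio is padded up to a longer video; otherwise the text
    # says the result matches the audio.
    return max(video_secs, audio_secs)
```

With a 4-second video and 10 seconds of audio, this sketch gives 10 seconds for bounce, loop, and remap, and 4 seconds for cut_off.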
Default Sync Labs Mode: The default depends on your generation type and video/audio durations:
Depending on these factors, the default is either cut_off or bounce. With segments (segments_secs or segments_frames), remap is the default, recommended to avoid abrupt cuts mid-video.
Choosing the Right Sync Labs Mode: Use bounce or loop for short videos with longer audio, cut_off when you want to prioritize video length, silence when you want to preserve the full video, and remap when you need to preserve all content from both video and audio.
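The selection guidance can be sketched as a small helper that suggests a sync_mode value from the two durations and a few preferences. The function name, its flags, and the fallback to cut_off when the guidance does not single out a mode are all assumptions of this sketch, not documented behavior.

```python
def suggest_sync_mode(
    video_secs: float,
    audio_secs: float,
    *,
    keep_all_content: bool = False,
    keep_video_length: bool = False,
    prefer_loop: bool = False,
) -> str:
    """Suggest a sync_mode string based on the guidance in this section.

    Hypothetical helper: the flags are illustrative, not API parameters.
    """
    if keep_all_content:
        # remap preserves all content from both video and audio.
        return "remap"
    if video_secs < audio_secs:
        if keep_video_length:
            # cut_off prioritizes video length by trimming the audio.
            return "cut_off"
        # bounce or loop extends a short video to cover longer audio.
        return "loop" if prefer_loop else "bounce"
    if video_secs > audio_secs and keep_video_length:
        # silence pads the audio so the full video is preserved.
        return "silence"
    # Assumption: fall back to cut_off when no guidance applies.
    return "cut_off"
```

For instance, `suggest_sync_mode(4.0, 10.0)` suggests bounce, while `suggest_sync_mode(10.0, 4.0, keep_video_length=True)` suggests silence.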