lipsync 2.0
most natural lipsyncing model in the world

Quick Overview:
- lipsync-2: the most natural video-to-video lipsyncing model in the world
- Zero-shot: no need to wait for an “actor”, “clone”, or “avatar” to train before you use it
- Learns and generates a speaker’s unique style of speech
- Works across live-action, animated, and AI-generated humans
- Use it to build video translation, word-level editing of video, and character re-animation workflows (including generating realistic AI UGC)
A whole new model
Introducing lipsync-2, the world's first zero-shot lipsyncing model that preserves a speaker's unique style without additional training or fine-tuning.
lipsync-2 is a leap forward in realism, expressiveness, control, quality, and speed across live-action, animated, and AI-generated video.
Features
Introducing zero-shot lipsync: style preservation
lipsync 2.0 learns a representation of how a person speaks by watching how they speak in the input video.
Notice how, even across different languages, we preserve the speaking style of Nicolas Cage. lipsync-2 is the first zero-shot lipsyncing model to achieve this.
Temperature control: Control how expressive the generated lipsync is.
Active speaker detection: Handle long videos with multiple speakers. We built ASD-1, a state-of-the-art active speaker detection pipeline that associates a unique voice with a unique face and applies lipsync only when we detect that person is actively speaking.
Flawless animation: Works across animated characters, from Pixar-level animations to AI-generated characters. Translation is only the beginning. With the power to edit dialogue in any video in post-production, we’re on the cusp of reimagining how we create, edit, and consume videos forever.
Record Once & Edit Dialogue Forever: A world where you only ever have to hit record once. lipsync-2 is the only model that lets you edit dialogue while preserving the original speaker’s style, without needing to train or fine-tune beforehand.
AI Video
In an age where we can generate any video by typing a few lines of text, we don’t have to limit ourselves to what we can capture with a camera.
At sync, we believe AI lipsync is just the beginning.
We live in an extraordinary age.
A high schooler can craft a masterpiece with an iPhone. A studio can produce a movie at a tenth of the cost, 10x faster. Every video can be distributed worldwide in any language, instantly. We make video as malleable as text.
Additional Resources:
Get $5 in credits here
Docs