lipsync 2.0

most natural lipsyncing model in the world

tair

20 Aug 2025 • 2 min read

Quick Overview:

lipsync-2: the most natural video-to-video lipsyncing model in the world
Zero-shot: no need to wait for an “actor,” “clone,” or “avatar” to train before using it
Learns and generates a speaker’s unique style of speech
Works across live-action, animated, and AI-generated humans
Use it to build video translation, word-level editing of video, and character re-animation workflows (including generating realistic AI UGC)

A whole new model

Introducing lipsync-2, the world's first zero-shot lipsyncing model that preserves a speaker's unique style without additional training or fine-tuning.

lipsync-2 is a leap forward in realism, expressiveness, control, quality, and speed across live-action, animated, and AI-generated video.

Features

Introducing zero-shot lipsync: style preservation

lipsync 2.0 learns a representation of how a person speaks by watching how they speak in the input video.

Notice how, even across different languages, we preserve the speaking style of Nicolas Cage. lipsync-2 is the first zero-shot lipsyncing model to achieve this.

Temperature Control: control how expressive the generated lipsync is.

Active speaker detection: Handle long videos with multiple speakers. We built ASD-1, a state-of-the-art active speaker detection pipeline that associates each unique voice with a unique face and applies lipsync only when it detects that person is actively speaking.
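
At its core, gating lipsync by active speaker is an assignment problem: decide which on-screen face each voice belongs to, then restrict lipsync to those face/time windows. ASD-1's internals aren't described here, so what follows is only a minimal sketch under assumed inputs: diarized voice segments and per-face "visibly speaking" segments, each a hypothetical `(track_id, start_seconds, end_seconds)` tuple produced by some upstream detector. It scores every voice/face pair by total temporal overlap and assigns them greedily one-to-one. All function and track names are hypothetical and are not sync's API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical segment type: (track_id, start_seconds, end_seconds)
Segment = Tuple[str, float, float]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def associate_voices_to_faces(voice_segs: List[Segment], face_segs: List[Segment]) -> Dict[str, str]:
    """Map each voice id to the face id whose visible-speaking time overlaps it most."""
    scores: Dict[Tuple[str, str], float] = defaultdict(float)
    for voice, vs, ve in voice_segs:
        for face, fs, fe in face_segs:
            scores[(voice, face)] += overlap(vs, ve, fs, fe)

    mapping: Dict[str, str] = {}
    used_faces = set()
    # Greedy one-to-one assignment: the strongest voice/face overlaps are claimed first.
    for (voice, face), score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score <= 0.0:
            break  # remaining pairs never co-occur, so they cannot be the same person
        if voice in mapping or face in used_faces:
            continue
        mapping[voice] = face
        used_faces.add(face)
    return mapping


def lipsync_windows(voice_segs: List[Segment], face_segs: List[Segment]) -> List[Segment]:
    """Return (face_id, start, end) windows to lipsync; unmatched voices are skipped."""
    mapping = associate_voices_to_faces(voice_segs, face_segs)
    windows = [(mapping[v], s, e) for v, s, e in voice_segs if v in mapping]
    return sorted(windows, key=lambda w: w[1])


if __name__ == "__main__":
    voices = [("spk_A", 0.0, 4.0), ("spk_B", 4.5, 9.0), ("spk_A", 9.5, 12.0),
              ("spk_C", 12.5, 14.0)]  # spk_C: an off-screen narrator
    faces = [("face_1", 0.2, 3.8), ("face_2", 4.6, 8.7), ("face_1", 9.4, 12.1)]
    print(associate_voices_to_faces(voices, faces))
    # {'spk_A': 'face_1', 'spk_B': 'face_2'}
    print(lipsync_windows(voices, faces))
    # [('face_1', 0.0, 4.0), ('face_2', 4.5, 9.0), ('face_1', 9.5, 12.0)]
```

The off-screen narrator overlaps no visibly speaking face, so it gets no mapping and no face is lipsynced while it talks. A production pipeline would more plausibly score pairs with learned audio-visual similarity than with raw overlap. The overlap score here just keeps the associate-then-gate idea visible.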

Flawless animation: Works across animated characters, from Pixar-level animations to AI-generated characters. Translation is only the beginning: with the power to edit dialogue in any video in post-production, we’re on the cusp of reimagining how we create, edit, and consume videos forever.

Record Once & Edit Dialogue Forever: A world where you only ever have to hit record once. lipsync-2 is the only model that lets you edit dialogue while preserving the original speaker’s style, without needing to train or fine-tune beforehand.

AI Video

In an age where we can generate any video by typing a few lines of text, we don’t have to limit ourselves to what we can capture with a camera.

At sync, we believe AI lipsync is just the beginning.

We live in an extraordinary age.

A high schooler can craft a masterpiece with an iPhone. A studio can produce a movie at a tenth of the cost and 10x faster. Every video can be distributed worldwide in any language, instantly. We make video as malleable as text.

Additional Resources:

Get $5 in credits here

Docs