lipsync-1.9-beta: a new quality standard

This is the biggest single jump our lip sync AI has made in one release. lipsync-1.9-beta is the most natural lip-syncing model we’ve ever shipped, and it’s available today.

It’s zero-shot, so you don’t need to provide any training data to use it. Whether you have an hour of footage or three seconds, the model handles it the same way, and it works across live-action, animation, and AI-generated video.

Try it in the playground or via the API.

We rolled early versions out to a small group of users first. Even in that limited release, adoption was unusually fast: within a few days, 1.9 became the most-used model in the group, and it generated hundreds of hours of video before we opened it up to everyone.

The reason it’s better is structural. Our older pipelines moved video through a sequence of stages, and small errors at each stage compounded by the time you reached the final frame. lipsync-1.9 is end-to-end and operates in a single shot, which removes the handoffs where errors used to accumulate. Most of the perceptible quality gain comes from this single change.

It’s also a turning point in how we design models going forward. 1.9 was trained on millions of speakers and tens of thousands of hours of video, and its architecture brings us closer to a future where any video can be made in a single take and rewritten freely afterward.

A few examples of how the model has progressed and what creators are doing with it today:

How far we’ve come since 1.7.1
How we made Lex and Zelensky’s interview feel more human
How you can automatically replace dialogue in any scene

Thanks for sticking with us through every version. Excited to see what gets made with this one 🙌