Introducing lipsync-1.9-beta — a new standard for lipsync quality
The most natural lipsyncing model in the world

We're thrilled to announce the biggest upgrade to our model lineup in our history: lipsync-1.9-beta, the most natural lipsyncing model in the world.
It's zero-shot, which means you don't need any extra training data to use it. Whether you have an hour of footage or just a few seconds, you can seamlessly generate and edit speech across live-action, animated, or even AI-generated video.
It's available now: try it in the playground or via the API.
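If you'd rather script it than click through the playground, the call boils down to a single HTTP request. Here's a minimal sketch in Python; the endpoint URL, payload fields, and auth header are illustrative placeholders rather than the documented API, so check the API reference for the real schema.

```python
import requests

# Minimal sketch only: the endpoint, payload fields, and auth header below are
# illustrative placeholders, not the documented API. See the API reference for
# the exact request schema.
API_KEY = "YOUR_API_KEY"
ENDPOINT = "https://api.example.com/v1/lipsync"      # placeholder endpoint

payload = {
    "model": "lipsync-1.9-beta",                      # the model announced here
    "video_url": "https://example.com/clip.mp4",      # source video (placeholder field)
    "audio_url": "https://example.com/new_line.wav",  # speech to sync (placeholder field)
}

resp = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},   # auth scheme assumed
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # typically a job you can poll for the finished video
```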
We've been slowly rolling out early versions of this model to some of you, and even with a limited release the response has been overwhelming. Across that small segment of users, it has already become the most popular choice, generating hundreds of hours of video in just a few days.
Our old pipelines accumulated errors as the video passed from one stage to the next.
Lipsync-1.9 is an end-to-end monolith that operates in a single shot: with no hand-offs between stages, errors can't compound, and it makes very few mistakes across a wide range of videos.
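To see why the single-shot design matters, here's a back-of-the-envelope illustration (the success rate is an assumption for the sake of the example, not a measurement): if each stage of a staged pipeline handles a clip correctly with probability p, a k-stage pipeline only gets p^k of clips through cleanly, while a single end-to-end pass keeps the full p.

```python
# Illustrative arithmetic only: p is an assumed per-stage success rate, not a
# measured number. Errors compound multiplicatively across pipeline stages.
p = 0.95
for k in (1, 3, 5):
    print(f"{k} stage(s): {p**k:.2%} of clips come through clean")
# 1 stage(s): 95.00% of clips come through clean
# 3 stage(s): 85.74% of clips come through clean
# 5 stage(s): 77.38% of clips come through clean
```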
It marks a profound shift in how we design our models. Trained on millions of speakers and tens of thousands of hours of video, this new approach paves the way to a future where any content can be made in a single take.
Check out some examples showing how we've improved over the last few months, and how thousands of content creators and businesses are using it today:
- How far we've come since 1.7.1
- How we made Lex and Zelensky's interview feel more human
- How you can automatically replace dialogue in any scene
We’re grateful for your support as we’ve learned and grown, and we’re particularly excited about how this new architecture scales.
Looking forward to seeing what you all make 🙌