Segments let you sync different audio clips to different time ranges within a single video in one API call. This enables multi-speaker lip sync by letting you assign different audio inputs to different parts of your video.
To use the segments feature, provide a top-level segments array. Each item in the array defines one video time range and the audio configuration for that range.
Each segment item takes the following properties:
startTime: Segment start time in seconds
endTime: Segment end time in seconds
audioInput: Audio configuration with refId and optional cropping
Each segment requires exactly one audioInput. audioInput takes the following properties:
refId: Reference ID of the audio/text-to-speech input to use for this segment
startTime: Optional start time (in seconds) to crop the referenced audio. When specified, endTime must also be provided
endTime: Optional end time (in seconds) to crop the referenced audio. When specified, startTime must also be provided
The specified audioInput will be used to lipsync the video segment between startTime and endTime.
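As a rough illustration, a request body for a two-speaker video might be assembled as in the sketch below. Only segments, startTime, endTime, audioInput, refId and url come from the description above; the "input" key, the "type" values and the example URLs are assumptions made for this sketch and should be checked against the API reference.

```python
import json

# Hypothetical inputs: the "input" key, the "type" values and the URLs are
# assumptions for illustration, not confirmed API fields.
payload = {
    "input": [
        {"type": "video", "url": "https://example.com/interview.mp4"},
        {"type": "audio", "url": "https://example.com/speaker_a.wav", "refId": "speaker_a"},
        {"type": "audio", "url": "https://example.com/speaker_b.wav", "refId": "speaker_b"},
    ],
    "segments": [
        # Seconds 0-5 of the video use the whole speaker_a clip.
        {"startTime": 0.0, "endTime": 5.0, "audioInput": {"refId": "speaker_a"}},
        # Seconds 5-12 of the video use seconds 2-9 of the speaker_b clip.
        {
            "startTime": 5.0,
            "endTime": 12.0,
            "audioInput": {"refId": "speaker_b", "startTime": 2.0, "endTime": 9.0},
        },
    ],
}

print(json.dumps(payload, indent=2))
```

The second segment shows audio cropping: its audioInput carries both startTime and endTime, since one cannot be given without the other.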
Provide a top-level segments array when using multiple audio or text inputs.
Ensure all audio inputs have valid url or assetId values and that referenced refId values exist in your audio or text inputs.
This error occurs when a segment’s audioInput is missing a refId or the refId is empty. Each segment must reference a valid audio or text input through its refId.
This error occurs when a segment references a refId that doesn’t exist in your audio or text inputs. Ensure all referenced refId values match exactly with those defined in your inputs.
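The two refId errors above can be caught before sending a request. The following is a minimal client-side sketch, not the API's own validation; the function name check_segment_ref_ids and the shape of the inputs list (plain dicts carrying a refId) are assumptions for illustration.

```python
def check_segment_ref_ids(inputs, segments):
    """Return a list of problems; an empty list means every refId resolves."""
    # Collect the non-empty refId values declared on the audio/text inputs.
    known = {item["refId"] for item in inputs if item.get("refId")}
    problems = []
    for i, segment in enumerate(segments):
        ref_id = (segment.get("audioInput") or {}).get("refId")
        if not ref_id:
            problems.append(f"segment {i}: audioInput.refId is missing or empty")
        elif ref_id not in known:
            problems.append(f"segment {i}: refId {ref_id!r} not found in inputs")
    return problems


# Hypothetical example data.
example_inputs = [{"refId": "speaker_a", "url": "https://example.com/a.wav"}]
example_segments = [
    {"startTime": 0, "endTime": 5, "audioInput": {"refId": "speaker_a"}},
    {"startTime": 5, "endTime": 9, "audioInput": {"refId": "speaker_b"}},
    {"startTime": 9, "endTime": 12, "audioInput": {"refId": ""}},
]

for problem in check_segment_ref_ids(example_inputs, example_segments):
    print(problem)
```

Matching is exact and case-sensitive here, in line with the requirement that referenced refId values match those defined in the inputs exactly.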
Each segment’s startTime must be less than or equal to its endTime. Zero-length segments (where startTime equals endTime) are allowed for use cases like zero-duration crop points.
When specifying segment boundaries using frames instead of seconds, startFrame must be strictly less than endFrame. Unlike time-based segments, which allow equal start and end times, frame-based segments require at least one frame of difference.
When cropping audio within a segment, both startTime and endTime must be provided, and startTime must be less than or equal to endTime.
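The three range rules above (time-based, frame-based and audio crop) can be summarized in a small pre-flight check. This is a sketch under stated assumptions: the function name check_segment_ranges is hypothetical, and it assumes startFrame and endFrame sit on the segment item in the same place as startTime and endTime.

```python
def check_segment_ranges(segment):
    """Return a list of range problems for one segment; empty means it passes."""
    problems = []

    # Time-based boundaries: equal start and end (zero length) is allowed.
    if "startTime" in segment and "endTime" in segment:
        if segment["startTime"] > segment["endTime"]:
            problems.append("startTime must be <= endTime")

    # Frame-based boundaries: at least one frame of difference is required.
    if "startFrame" in segment and "endFrame" in segment:
        if segment["startFrame"] >= segment["endFrame"]:
            problems.append("startFrame must be < endFrame")

    # Audio crop: both bounds or neither, and start must not exceed end.
    crop = segment.get("audioInput") or {}
    has_start = "startTime" in crop
    has_end = "endTime" in crop
    if has_start != has_end:
        problems.append("audioInput: startTime and endTime must be provided together")
    elif has_start and crop["startTime"] > crop["endTime"]:
        problems.append("audioInput: startTime must be <= endTime")

    return problems


# A zero-length time-based segment passes; a zero-length frame-based one does not.
print(check_segment_ranges({"startTime": 3.0, "endTime": 3.0, "audioInput": {"refId": "a"}}))
print(check_segment_ranges({"startFrame": 10, "endFrame": 10, "audioInput": {"refId": "a"}}))
```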
Ensure you have at least one audio input or text input with a valid refId when using segments.