Speaker Selection — API
Speaker selection helps you target the right face when a clip contains multiple people. You can either let Sync Labs auto-detect the active speaker, or capture a user-selected point in your UI and forward it in the active_speaker_detection DTO on /v2/generate. To use speaker selection in the web app, see the guide.
For automatic detection, set auto_detect: true and skip the manual fields.
Seek the video to a frame where the target speaker’s face is visible. Keep track of the frame index you show in the UI.
Record the [x, y] coordinates (in the same coordinate system/pixels as your extracted frame) for the clicked point on the speaker’s face. Keep the frame index and coordinates paired.
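The manual steps above can be sketched as follows. This is a minimal illustration, not the official client: the helper names (click_to_frame_pixels, manual_speaker_selection, auto_speaker_selection) are hypothetical, and the scaling assumes your player shows the whole frame with no letterboxing or cropping. Only the field names auto_detect, frame_number, and coordinates come from this page; where the DTO sits inside the /v2/generate request body is not shown here, so check the API reference before sending it.

```python
import json


def click_to_frame_pixels(click_x, click_y, display_w, display_h, frame_w, frame_h):
    """Scale a click from the displayed player size to extracted-frame pixels.

    Assumption: the player stretches the full frame to display_w x display_h
    with no letterboxing, so a plain per-axis scale is enough.
    """
    scale_x = frame_w / display_w
    scale_y = frame_h / display_h
    return [round(click_x * scale_x), round(click_y * scale_y)]


def manual_speaker_selection(frame_number, coordinates):
    """Build the DTO for a user-selected point, keeping frame and point paired."""
    return {
        "auto_detect": False,
        "frame_number": frame_number,
        "coordinates": coordinates,
    }


def auto_speaker_selection():
    """Build the DTO for automatic detection; the manual fields are omitted."""
    return {"auto_detect": True}


# Example: a click at (320, 180) in a 640x360 player showing a 1920x1080 frame.
coords = click_to_frame_pixels(320, 180, 640, 360, 1920, 1080)
dto = manual_speaker_selection(frame_number=42, coordinates=coords)
print(json.dumps({"active_speaker_detection": dto}))
```

If your UI already reports clicks in frame pixels, skip the scaling helper and pass the recorded [x, y] straight through.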
See the full API reference for active_speaker_detection.
auto_detect (boolean, default false): let Sync Labs pick the active speaker automatically.
v3 (boolean, optional): enable ASD v3.
frame_number (number): frame index that corresponds to the provided coordinates.
coordinates ([x, y]): reference point on the speaker’s face in frame_number.
bounding_boxes ((number[] | null)[], optional): per-frame array of bounding boxes across the video. Each entry corresponds to that frame: set it to [x1, y1, x2, y2] (x1,y1 = top-left; x2,y2 = bottom-right) for the detected face, or null if there is no box for that frame. Use this instead of frame_number + coordinates when you already run detection over the clip.
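If you already run face detection over the clip, the per-frame bounding_boxes array could be assembled roughly like this. The function name build_bounding_boxes and the sparse detections mapping are hypothetical, and the corner-order check is an added sanity check rather than a documented API rule. Whether other fields should accompany bounding_boxes is not stated on this page.

```python
import json


def build_bounding_boxes(detections, total_frames):
    """Expand sparse detections into one entry per frame.

    detections: dict mapping frame index -> [x1, y1, x2, y2]
                (x1,y1 = top-left; x2,y2 = bottom-right).
    Frames without a detection get None, which serializes to JSON null.
    """
    boxes = []
    for frame_index in range(total_frames):
        box = detections.get(frame_index)
        if box is not None:
            x1, y1, x2, y2 = box
            # Sanity check (an assumption, not an API rule): corners in order.
            if x2 <= x1 or y2 <= y1:
                raise ValueError(f"frame {frame_index}: invalid box {box}")
            box = [x1, y1, x2, y2]
        boxes.append(box)
    return boxes


# Example: a 4-frame clip with a face detected on frames 0 and 2 only.
detections = {0: [100, 50, 300, 280], 2: [110, 55, 310, 285]}
boxes = build_bounding_boxes(detections, total_frames=4)
print(json.dumps({"bounding_boxes": boxes}))
# prints {"bounding_boxes": [[100, 50, 300, 280], null, [110, 55, 310, 285], null]}
```

The array length matches the frame count, so each entry's position in the array identifies its frame.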