Rethinking the visual cues in audio-visual speaker extraction

06/05/2023
by Junjie Li, et al.

Audio-visual speaker extraction (AVSE) algorithms use a video recording captured in parallel with the audio to exploit two visual cues, speaker identity and synchronization, and thereby outperform audio-only algorithms. However, the visual front-end in AVSE is usually either taken from a pre-trained model or trained end-to-end, which makes it unclear which visual cue contributes more to speaker extraction performance. This raises the question of how to better utilize visual cues. To address it, we propose two training strategies that decouple the learning of the two visual cues. Our experimental results show that both visual cues are useful, with the synchronization cue having the greater impact. We also introduce a more explainable model, the Decoupled Audio-Visual Speaker Extraction (DAVSE) model, which leverages both visual cues.

research
05/22/2023

Target Active Speaker Detection with Audio-visual Cues

In active speaker detection (ASD), we would like to detect whether an on...
research
08/01/2019

Visual cues in estimation of part-to-whole comparison

Pie charts were first published in 1801 by William Playfair and have cau...
research
09/13/2023

PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network

It is common in everyday spoken communication that we look at the turnin...
research
05/07/2022

Timestamp-independent Haptic-Visual Synchronization

The booming haptic data significantly improves the users' immersion durin...
research
10/15/2020

Muse: Multi-modal target speaker extraction with visual cues

Speaker extraction algorithm relies on the speech sample from the target...
research
10/09/2022

VCSE: Time-Domain Visual-Contextual Speaker Extraction Network

Speaker extraction seeks to extract the target speech in a multi-talker ...
research
02/21/2023

A Reinforcement Learning Framework for Online Speaker Diarization

Speaker diarization is a task to label an audio or video recording with ...
