A Light Weight Model for Active Speaker Detection

03/08/2023
by Junhua Liao, et al.

Active speaker detection is a challenging task in audio-visual scenario understanding, which aims to detect who is speaking in scenarios with one or more speakers. This task has received extensive attention as it is crucial in applications such as speaker diarization, speaker tracking, and automatic video editing. Existing studies try to improve performance by feeding in information from multiple candidates and designing complex models. Although these methods achieve outstanding performance, their high consumption of memory and computational power makes them difficult to apply in resource-limited scenarios. Therefore, we construct a lightweight active speaker detection architecture by reducing input candidates, splitting 2D and 3D convolutions for audio-visual feature extraction, and applying a gated recurrent unit (GRU) with low computational complexity for cross-modal modeling. Experimental results on the AVA-ActiveSpeaker dataset show that our framework achieves competitive mAP performance (94.1%) at a much lower resource cost than the state-of-the-art method, especially in model parameters (1.0M vs. 22.5M, about 23x) and FLOPs (0.6G vs. 2.6G, about 4x). In addition, our framework also performs well on the Columbia dataset, showing good robustness. The code and model weights are available at https://github.com/Junhua-Liao/Light-ASD.
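
To illustrate why splitting convolutions reduces model size, the following minimal sketch counts the weights of a full 3D convolution and of one possible split into a spatial convolution followed by a temporal convolution. The factorization shown (1 x k x k, then k x 1 x 1), the helper names, and the channel widths are assumptions chosen for illustration; they are not taken from the paper or its repository.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weights in a full k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k * k * k


def split_conv_params(c_in, c_out, k=3):
    """Weights when the 3D convolution is split into a spatial
    1 x k x k convolution followed by a temporal k x 1 x 1 convolution
    (bias ignored). This factorization is an assumed example."""
    spatial = c_in * c_out * k * k   # 1 x k x k kernel, c_in -> c_out
    temporal = c_out * c_out * k     # k x 1 x 1 kernel, c_out -> c_out
    return spatial + temporal


# Hypothetical channel widths, not taken from the paper.
c_in, c_out = 32, 64
full = conv3d_params(c_in, c_out)
split = split_conv_params(c_in, c_out)
print(f"full 3D conv: {full} weights")
print(f"split conv:   {split} weights")
print(f"reduction:    {full / split:.2f}x")
```

Under these assumed widths the split layer needs fewer weights than the full 3D layer; the actual savings in any real model depend on its channel widths and kernel sizes.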

research
05/20/2020

Active Speakers in Context

Current methods for active speaker detection focus on modeling short-te...
research
05/22/2023

Target Active Speaker Detection with Audio-visual Cues

In active speaker detection (ASD), we would like to detect whether an on...
research
06/21/2022

Rethinking Audio-visual Synchronization for Active Speaker Detection

Active speaker detection (ASD) systems are important modules for analyzi...
research
08/05/2021

UniCon: Unified Context Network for Robust Active Speaker Detection

We introduce a new efficient framework, the Unified Context Network (Uni...
research
03/09/2023

WASD: A Wilder Active Speaker Detection Dataset

Current Active Speaker Detection (ASD) models achieve great results on A...
research
03/04/2022

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

Active speaker detection and speech enhancement have become two increasi...
research
03/28/2022

Training speaker recognition systems with limited data

This work considers training neural networks for speaker recognition wit...
