Joint Speech Activity and Overlap Detection with Multi-Exit Architecture

09/24/2022
by   Ziqing Du, et al.

Overlapped speech detection (OSD) is critical for speech applications in multi-party conversation scenarios. Despite considerable research effort and progress, OSD, compared with speech activity detection (VAD), remains an open challenge, and its overall performance is far from satisfactory. Most prior work formulates OSD as a standard frame-level classification problem, using either binary labels (OSD only) or three-class labels (joint VAD and OSD). In contrast to the mainstream, this study investigates the joint VAD and OSD task from a new perspective. In particular, we propose to extend a traditional classification network with a multi-exit architecture. Such an architecture gives our system the unique capability to identify the class using either low-level features from early exits or high-level features from the last exit. In addition, two training schemes, knowledge distillation and dense connection, are adopted to further boost system performance. Experimental results on benchmark datasets (AMI and DIHARD-III) validate the effectiveness and generality of our proposed system. Our ablations further reveal the complementary contributions of the proposed schemes. With F_1 scores of 0.792 on AMI and 0.625 on DIHARD-III, our proposed system not only outperforms several top-performing models on these datasets but also surpasses the current state of the art by large margins on both. Beyond the performance benefit, our proposed system offers the potential for quality-complexity trade-offs, which is highly desirable for efficient OSD deployment.
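To make the quality-complexity trade-off of a multi-exit classifier concrete, the following is a minimal sketch of early-exit inference for a single frame. The abstract does not say how an exit is selected at inference time, so the confidence-threshold rule used here is an assumption borrowed from the general early-exit literature, not the authors' method. The function names, the `threshold` value, and the label strings are all hypothetical.

```python
import math

# Three-class frame labels for joint VAD and OSD (label strings are illustrative).
LABELS = ("non-speech", "single-speaker", "overlap")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Return (label, exit_index) for one frame.

    exit_logits: per-exit logit vectors, ordered from the earliest exit
        to the last exit. In a real network each vector would be computed
        lazily, so stopping early skips the deeper layers.
    threshold: hypothetical confidence cutoff (an assumption, not a value
        from the paper). Lower values save more computation at the risk
        of more errors.
    """
    if not exit_logits:
        raise ValueError("at least one exit is required")
    last = len(exit_logits) - 1
    for index, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        # Stop at the first sufficiently confident exit; the last exit
        # always answers, whatever its confidence.
        if confidence >= threshold or index == last:
            return LABELS[probs.index(confidence)], index


# Made-up logits: an easy frame resolved early, a hard frame that needs the last exit.
easy_frame = [[4.0, 0.0, 0.0], [5.0, 0.0, 0.0], [6.0, 0.0, 0.0]]
hard_frame = [[0.2, 0.5, 0.4], [0.1, 1.0, 1.2], [0.0, 0.5, 3.5]]
print(early_exit_predict(easy_frame))
print(early_exit_predict(hard_frame))
```

Under this rule, frames that are easy to classify leave through an early exit and only ambiguous frames pay for the full network depth, which is one plausible way a deployment could trade accuracy for compute.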

research
06/22/2021

Statistical Analysis of Perspective Scores on Hate Speech Detection

Hate speech detection has become a hot topic in recent years due to the ...
research
07/24/2023

Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

Voice activity and overlapped speech detection (respectively VAD and OSD...
research
06/03/2023

Efficient Multi-Grained Knowledge Reuse for Class Incremental Segmentation

Class Incremental Semantic Segmentation (CISS) has been a trend recently...
research
04/07/2021

Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network

In this work, we propose an overlapped speech detection system trained a...
research
08/28/2018

All You Need is "Love": Evading Hate-speech Detection

With the spread of social networks and their unfortunate use for hate sp...
research
10/28/2022

SG-VAD: Stochastic Gates Based Speech Activity Detection

We propose a novel voice activity detection (VAD) model in a low-resourc...
research
09/15/2023

One-Class Knowledge Distillation for Spoofing Speech Detection

The detection of spoofing speech generated by unseen algorithms remains ...
