Large-Scale Learning on Overlapped Speech Detection: New Benchmark and New General System

08/11/2023
by Zhaohui Yin, et al.

Overlapped Speech Detection (OSD) is an important component of speech applications that analyze multi-party conversations. However, most existing OSD systems are trained and evaluated on small datasets covering limited application domains. As a result, there is no benchmark for evaluating their robustness, and their accuracy remains inadequate in realistic acoustic environments. To address these problems, we conduct a study of large-scale learning (LSL) for OSD and propose a new general OSD system, CF-OSD with LSL, built on the Conformer network and trained with LSL. As part of this study, we produce a large-scale test set of 151 hours of labeled speech spanning different speaking styles, languages and sound-source distances, and use it as a new benchmark for evaluating the generality of OSD systems. We design rigorous comparative experiments to evaluate the effectiveness of LSL in OSD tasks and to determine the OSD model used in our general system. The experimental results show that LSL significantly improves the accuracy and robustness of OSD systems, and that CF-OSD with LSL significantly outperforms other OSD systems on our proposed benchmark. Moreover, our system also achieves state-of-the-art performance on existing small-dataset benchmarks, reaching 81.6% on the Alimeeting test set and 53.8% on the DIHARD II evaluation set.
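To make the detection target concrete: an OSD system decides, frame by frame, whether two or more speakers are talking at the same time. The following is a minimal, hypothetical sketch (not the authors' labeling pipeline) of one common way to turn speaker-segment annotations into frame-level overlap labels. The function name `overlap_labels`, the `(speaker, start, end)` segment format in seconds, and the 10 ms default frame shift are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical annotation format: (speaker_id, start_seconds, end_seconds).
Segment = Tuple[str, float, float]


def overlap_labels(segments: List[Segment], duration: float,
                   frame_shift: float = 0.01) -> List[int]:
    """Return one label per frame: 1 if two or more distinct speakers are
    active in that frame, else 0 (an assumed labeling rule for illustration)."""
    n_frames = int(round(duration / frame_shift))
    # Track distinct speakers per frame so that overlapping segments from the
    # same speaker are not mistaken for overlapped speech.
    active: List[Set[str]] = [set() for _ in range(n_frames)]
    for speaker, start, end in segments:
        first = max(0, int(round(start / frame_shift)))
        last = min(n_frames, int(round(end / frame_shift)))
        for i in range(first, last):
            active[i].add(speaker)
    return [1 if len(speakers) >= 2 else 0 for speakers in active]


if __name__ == "__main__":
    # Speaker A talks from 0.0 to 1.0 s, speaker B from 0.5 to 1.5 s;
    # with 0.5 s frames only the second frame contains both speakers.
    print(overlap_labels([("A", 0.0, 1.0), ("B", 0.5, 1.5)], 2.0, 0.5))
```

Counting distinct speakers rather than raw segments is a deliberate choice here: duplicated or split annotations for a single speaker should not produce spurious overlap labels.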
