Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network

04/07/2021
by Jee-weon Jung, et al.

In this work, we propose an overlapped speech detection system trained as a three-class classifier. Conventional systems perform binary classification, deciding only whether or not a frame contains overlapped speech. The proposed approach instead assigns each frame to one of three classes: non-speech, single-speaker speech, or overlapped speech. Training the network with this more detailed label definition lets the model learn a better notion of how many speakers are present in a given frame. We explore a convolutional recurrent neural network architecture to benefit from both the convolutional layers' capability to model local patterns and the recurrent layers' ability to model sequential information. The proposed overlapped speech detection model establishes state-of-the-art performance with a precision of 0.6648 and a recall of 0.3222 on the DIHARD II evaluation set, showing 20% higher precision. In addition, we introduce a simple approach for using the proposed overlapped speech detection model in speaker diarization; the resulting system ranked third in Track 1 of the DIHARD III challenge.
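The three-class label definition and the frame-level precision and recall quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-speaker activity layout, and the toy data are all hypothetical, and the scoring shown is plain frame-level counting for the overlap class, which may differ from the official DIHARD scoring protocol.

```python
from typing import List, Sequence, Tuple

# Class indices are an assumption made for this sketch.
NON_SPEECH, SINGLE_SPEAKER, OVERLAP = 0, 1, 2


def three_class_labels(speaker_activity: Sequence[Sequence[int]]) -> List[int]:
    """Map per-speaker frame activity to three-class frame labels.

    speaker_activity[s][t] is 1 if (hypothetical) speaker s is active
    in frame t, else 0. Two or more active speakers count as overlap.
    """
    if not speaker_activity:
        return []
    num_frames = len(speaker_activity[0])
    labels = []
    for t in range(num_frames):
        active = sum(1 for row in speaker_activity if row[t])
        # 0 speakers -> non-speech, 1 -> single speaker, 2+ -> overlap
        labels.append(min(active, OVERLAP))
    return labels


def overlap_precision_recall(
    reference: Sequence[int], hypothesis: Sequence[int]
) -> Tuple[float, float]:
    """Frame-level precision and recall for the overlap class only."""
    pairs = list(zip(reference, hypothesis))
    tp = sum(1 for r, h in pairs if r == OVERLAP and h == OVERLAP)
    fp = sum(1 for r, h in pairs if r != OVERLAP and h == OVERLAP)
    fn = sum(1 for r, h in pairs if r == OVERLAP and h != OVERLAP)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example: two speakers, six frames.
activity = [
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
]
reference = three_class_labels(activity)
hypothesis = [0, 1, 2, 1, 2, 0]  # made-up system output
precision, recall = overlap_precision_recall(reference, hypothesis)
print(reference, precision, recall)
```

A binary overlap detector would collapse the first two classes into one; keeping them separate is what gives the model an explicit non-speech versus single-speaker distinction to learn from.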


research
12/15/2021

Speech frame implementation for speech analysis and recognition

Distinctive features of the created speech frame are: the ability to tak...
research
09/26/2020

Abusive Language Detection and Characterization of Twitter Behavior

In this work, abusive language detection in online content is performed ...
research
06/10/2020

Speaker Diarization: Using Recurrent Neural Networks

Speaker Diarization is the problem of separating speakers in an audio. T...
research
02/11/2020

Phoneme Boundary Detection using Learnable Segmental Features

Phoneme boundary detection plays an essential first step for a variety o...
research
09/24/2022

Joint Speech Activity and Overlap Detection with Multi-Exit Architecture

Overlapped speech detection (OSD) is critical for speech applications in...
research
06/04/2021

Classification of Audio Segments in Call Center Recordings using Convolutional Recurrent Neural Networks

Detailed statistical analysis of call center recordings is critical in t...
research
06/10/2020

Uniphore's submission to Fearless Steps Challenge Phase-2

We propose supervised systems for speech activity detection (SAD) and sp...
