Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network

10/30/2021
by Midia Yousefi, et al.

Most current speech technology systems are designed to operate well even in the presence of multiple active speakers. However, most solutions assume that the number of concurrent speakers is known. Unfortunately, this information might not always be available in real-world applications. In this study, we propose a real-time, single-channel attention-guided Convolutional Neural Network (CNN) to estimate the number of active speakers in overlapping speech. The proposed system extracts higher-level information from the speech spectral content using a CNN model. Next, the attention mechanism summarizes the extracted information into a compact feature vector without losing critical information. Finally, the number of active speakers is classified using a fully connected network. Experiments on simulated overlapping speech using the WSJ corpus show that the attention solution improves performance by almost 3% [...] Attention-guided CNN achieves 76.15% [...] Recall, and 75.80% [...] 200 ms. All the classification metrics exceed 92% [...] model in offline scenarios where the input signal is more than 100 frames long (i.e., 1 s).
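The abstract describes a three-stage pipeline: a CNN produces frame-level features, an attention mechanism condenses them into one compact vector, and a fully connected network classifies the speaker count. The sketch below illustrates only the pooling and classification stages in plain Python. It assumes a generic softmax attention with a single learned scoring vector, which the abstract does not specify; the function names, the scoring vector `w`, and the classifier weights are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_pool(frames):
    """Baseline: unweighted mean of the frame vectors over time."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def attention_pool(frames, w):
    """Weighted mean over time, with weights from softmax(w . frame).

    `w` is a hypothetical scoring vector standing in for whatever
    learned attention parameters the actual model uses.
    """
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


def classify(pooled, class_weights, class_biases):
    """Single linear layer followed by argmax; returns a class index."""
    logits = [
        sum(wi * xi for wi, xi in zip(row, pooled)) + b
        for row, b in zip(class_weights, class_biases)
    ]
    return max(range(len(logits)), key=logits.__getitem__)


if __name__ == "__main__":
    # Three made-up frame vectors standing in for CNN outputs.
    frames = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5]]
    pooled, weights = attention_pool(frames, [1.0, 1.0])
    print("attention weights:", weights)
    print("attention-pooled:", pooled)
    print("average-pooled:  ", average_pool(frames))
```

With an all-zero scoring vector the attention weights become uniform and the result reduces to average pooling, so the attention variant can be read as a learned generalization of a temporal average.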

research
02/08/2021

Extracting the Locus of Attention at a Cocktail Party from Single-Trial EEG using a Joint CNN-LSTM Model

Human brain performs remarkably well in segregating a particular speaker...
research
07/20/2021

A Real-time Speaker Diarization System Based on Spatial Spectrum

In this paper we describe a speaker diarization system that enables loca...
research
12/19/2021

Multi-turn RNN-T for streaming recognition of multi-party speech

Automatic speech recognition (ASR) of single channel far-field recording...
research
08/03/2020

Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

Articulatory-to-acoustic (forward) mapping is a technique to predict spe...
research
06/10/2020

Uniphore's submission to Fearless Steps Challenge Phase-2

We propose supervised systems for speech activity detection (SAD) and sp...
research
03/18/2022

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

Overlapping speech diarization has been traditionally treated as a multi...
