An End-to-end Architecture of Online Multi-channel Speech Separation

09/07/2020
by Jian Wu, et al.

Multi-speaker speech recognition has been one of the key challenges in conversation transcription, as it breaks the single-active-speaker assumption employed by most state-of-the-art speech recognition systems. Speech separation is considered a remedy to this problem. Previously, we introduced a system called unmixing, fixed-beamformer and extraction (UFE) that was shown to be effective in addressing the speech overlap problem in conversation transcription. With UFE, an input mixed signal is processed by fixed beamformers, followed by neural network post-filtering. Although promising results were obtained, the system contains multiple individually developed modules, potentially leading to sub-optimal performance. In this work, we introduce an end-to-end modeling version of UFE. To enable gradient propagation all the way through the system, an attentional selection module is proposed, in which an attentional weight is learnt for each beamformer and spatial feature sampled over space. Experimental results show that the proposed system achieves performance comparable to the original separate processing-based pipeline in an offline evaluation, while producing remarkable improvements in an online evaluation.
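The key idea of the attentional selection module is to replace a hard pick of one beam (an argmax, which blocks gradients) with a softmax-weighted combination over directions sampled in space, so the whole pipeline can be trained end to end. The sketch below illustrates that contrast under stated assumptions; it is not the authors' implementation. The abstract does not say how the attention scores are computed, so the dot-product scoring against a `query` vector, the `(D, T, F)` feature layout, the `temperature` parameter, and the function names `attentional_selection` and `hard_selection` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def attentional_selection(beam_feats, spatial_feats, query, temperature=1.0):
    """Soft, differentiable selection over D sampled look directions.

    beam_feats:    (D, T, F) features of D fixed-beamformer outputs (hypothetical layout)
    spatial_feats: (D, T, F) spatial features computed for the same D directions
    query:         (F,) hypothetical learned vector used to score each direction
    """
    # Score each direction: frame-averaged inner product with the query.
    scores = np.einsum("dtf,f->d", beam_feats, query) / beam_feats.shape[1]
    weights = softmax(scores / temperature)  # (D,), non-negative, sums to 1
    # The same per-direction weight is applied to the beamformer output and
    # the spatial feature of that direction; every direction stays in the
    # computation, so gradients can reach all of them.
    beam_mix = np.einsum("d,dtf->tf", weights, beam_feats)
    spatial_mix = np.einsum("d,dtf->tf", weights, spatial_feats)
    return beam_mix, spatial_mix, weights


def hard_selection(beam_feats, spatial_feats, query):
    """Non-differentiable baseline: keep only the best-scoring direction."""
    scores = np.einsum("dtf,f->d", beam_feats, query)
    best = int(np.argmax(scores))
    return beam_feats[best], spatial_feats[best], best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, T, F = 18, 50, 257  # hypothetical: 18 beams, 50 frames, 257 frequency bins
    beams = rng.standard_normal((D, T, F))
    spatial = rng.standard_normal((D, T, F))
    q = rng.standard_normal(F)

    soft_b, soft_s, w = attentional_selection(beams, spatial, q)
    hard_b, hard_s, idx = hard_selection(beams, spatial, q)
    print(w.shape, soft_b.shape, idx)
```

As `temperature` approaches zero, the softmax weights approach a one-hot vector at the argmax, so the soft module contains hard beam selection as a limiting case while remaining trainable by gradient descent at ordinary temperatures.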



research
12/18/2019

End-to-end training of time domain audio separation and recognition

The rising interest in single-channel multi-speaker speech separation sp...
research
02/23/2021

End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend

Recently, the end-to-end approach has been successfully applied to multi...
research
07/05/2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

Speech separation has been successfully applied as a frontend processing...
research
04/08/2021

AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario

In this paper, we present AISHELL-4, a sizable real-recorded Mandarin sp...
research
10/12/2021

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

Continuous speech separation using a microphone array was shown to be pr...
research
06/15/2022

On the Design and Training Strategies for RNN-based Online Neural Speech Separation Systems

While the performance of offline neural speech separation systems has be...
