Investigation of Practical Aspects of Single Channel Speech Separation for ASR

07/05/2021
by   Jian Wu, et al.

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR). However, a speech separation model often introduces target speech distortion, resulting in a suboptimal word error rate (WER). In this paper, we describe our efforts to improve the performance of a single channel speech separation system. Specifically, we investigate a two-stage training scheme that first applies a feature-level optimization criterion for pretraining, followed by an ASR-oriented optimization criterion using an end-to-end (E2E) speech recognition model. Meanwhile, to keep the model lightweight, we introduce a modified teacher-student learning technique for model compression. By combining those approaches, we achieve an absolute average WER improvement of 2.70 [...] parameters compared with the previous state-of-the-art results on the LibriCSS dataset for utterance-wise evaluation and continuous evaluation, respectively.
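
The abstract reports its gains as absolute WER improvements. As a minimal sketch of what that metric measures, the snippet below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and an absolute improvement as the plain difference between two WER values. The function names `word_error_rate` and `absolute_improvement` and the example sentences are illustrative assumptions, not taken from the paper, and the scoring tool actually used for LibriCSS is not stated in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                         # deletion
                curr[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_word != hyp_word) # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, new_wer: float) -> float:
    """Absolute (not relative) WER improvement: baseline minus new."""
    return baseline_wer - new_wer


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    reference = "the cat sat on the mat"
    before = "the cat sat on mat today"
    after = "the cat sat on the mat"
    gain = absolute_improvement(
        word_error_rate(reference, before),
        word_error_rate(reference, after),
    )
    print(f"absolute WER improvement: {gain:.2%}")
```

An absolute improvement is a difference in percentage points, so it should not be read as a relative reduction of the baseline error.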


research
11/22/2021

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature

Automatic speech recognition (ASR) of multi-channel multi-speaker overla...
research
07/19/2017

Single-Channel Multi-talker Speech Recognition with Permutation Invariant Training

Although great progress has been made in automatic speech recognition...
research
07/21/2017

Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition

Unsupervised single-channel overlapped speech recognition is one of the ...
research
04/27/2022

Ultra Fast Speech Separation Model with Teacher Student Learning

Transformer has been successfully applied to speech separation recently ...
research
09/07/2020

An End-to-end Architecture of Online Multi-channel Speech Separation

Multi-speaker speech recognition has been one of the key challenges in co...
research
11/11/2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts

Several trade-offs need to be balanced when employing monaural speech se...
research
10/22/2018

Investigation of Monaural Front-End Processing for Robust ASR without Retraining or Joint-Training

In recent years, monaural speech separation has been formulated as a sup...
