Mask scalar prediction for improving robust automatic speech recognition

04/26/2022
by Arun Narayanan, et al.

Using neural-network-based acoustic frontends to improve the robustness of streaming automatic speech recognition (ASR) systems is challenging because of causality constraints and the resulting distortion that frontend processing introduces into speech. Time-frequency masking-based approaches have been shown to work well, but they need additional hyper-parameters to scale the mask and limit speech distortion. Such mask scalars are typically hand-tuned and chosen conservatively. In this work, we present a technique to predict mask scalars end-to-end using an ASR-based loss, with minimal increase in overall model size and complexity. We evaluate the approach on two robust ASR tasks: multichannel enhancement in the presence of speech and non-speech noise, and acoustic echo cancellation (AEC). Results show that, without any additional tuning, the presented algorithm consistently improves word error rate (WER) over strong baselines that use hand-tuned hyper-parameters: up to 16 and up to 7
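The abstract does not state the exact form of the mask scaling, so the following is only a minimal NumPy sketch under stated assumptions: a mask exponent `alpha` and a mask floor `beta` temper a time-frequency mask, and a small hypothetical head predicts both scalars from pooled input features instead of having them hand-tuned. The function names, the exponent/floor parameterization, and the value ranges are illustrative choices, not the paper's method. Only the forward pass is shown; in the approach described above, the scalar predictor would be trained end-to-end through an ASR-based loss, which requires an autodiff framework.

```python
import numpy as np


def apply_scaled_mask(noisy_mag, mask, alpha, beta):
    """Apply a time-frequency mask tempered by two (hypothetical) mask scalars.

    noisy_mag: (T, F) magnitude spectrogram of the noisy input.
    mask:      (T, F) mask estimated by an enhancement network.
    alpha:     exponent in (0, 1]; values below 1 soften the mask.
    beta:      floor in [0, 1); keeps part of the input in every bin,
               which limits speech distortion.
    """
    mask = np.clip(mask, 0.0, 1.0)
    # Softening plus flooring keeps the scaled mask within [beta, 1].
    scaled = np.maximum(mask ** alpha, beta)
    return scaled * noisy_mag


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict_mask_scalars(features, w, b):
    """Hypothetical scalar predictor: mean-pool frame features, then a
    linear layer with two outputs squashed into assumed valid ranges.

    features: (T, D) frame-level features; w: (D, 2); b: (2,).
    """
    pooled = features.mean(axis=0)            # (D,)
    logits = pooled @ w + b                   # (2,)
    alpha = 0.1 + 0.9 * sigmoid(logits[0])    # exponent in (0.1, 1.0)
    beta = 0.5 * sigmoid(logits[1])           # floor in (0.0, 0.5)
    return alpha, beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.random((100, 257))            # toy spectrogram
    mask = rng.random((100, 257))             # toy mask
    feats = rng.standard_normal((100, 64))    # toy features
    alpha, beta = predict_mask_scalars(
        feats, 0.1 * rng.standard_normal((64, 2)), np.zeros(2))
    enhanced = apply_scaled_mask(noisy, mask, alpha, beta)
    print(alpha, beta, enhanced.shape)
```

With `alpha=1` and `beta=0` the sketch reduces to plain masking; smaller exponents and larger floors trade residual noise for less distortion, which is the trade-off that conservatively hand-tuned mask scalars control.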


