Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition

09/05/2023
by Patrick Eickhoff, et al.

Large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have recently reported state-of-the-art performance on various speech-processing benchmarks. These systems intrinsically learn to handle and remove noise from speech. Previous research has shown that it is possible to extract the denoising capabilities of these models into a preprocessor network, which can be used as a frontend for downstream ASR models. However, the proposed methods were limited to specific fully convolutional architectures. In this work, we propose a novel extraction method that can be applied to any encoder-decoder architecture. We propose the Cleancoder preprocessor architecture, which extracts hidden activations from the Conformer ASR model and feeds them to a decoder to predict denoised spectrograms. We train our preprocessor on the Noisy Speech Database (NSD) to reconstruct denoised spectrograms from noisy inputs. We then evaluate our model as a frontend to a pretrained Conformer ASR model as well as a frontend for training smaller Conformer ASR models from scratch. We show that the Cleancoder is able to filter noise from speech and that it improves the total Word Error Rate (WER) of the downstream model in noisy conditions for both applications.
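
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. The abstract does not describe the authors' scoring setup, and a "total" WER is typically aggregated over a whole test set rather than computed per utterance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only (hypothetical name); tokenization is a plain
    whitespace split, which real scoring pipelines usually refine.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under these assumptions, a frontend such as the one described would be judged by comparing the downstream model's WER on noisy inputs with and without the preprocessor in place.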

research
07/04/2023

Boosting Norwegian Automatic Speech Recognition

In this paper, we present several baselines for automatic speech recogni...
research
07/26/2019

Correlation Distance Skip Connection Denoising Autoencoder (CDSK-DAE) for Speech Feature Enhancement

Performance of learning based Automatic Speech Recognition (ASR) is susc...
research
03/22/2020

Training for Speech Recognition on Coprocessors

Automatic Speech Recognition (ASR) has increased in popularity in recent...
research
04/12/2021

Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures

Recent publications on automatic-speech-recognition (ASR) have a strong ...
research
09/14/2016

An Adaptive Psychoacoustic Model for Automatic Speech Recognition

Compared with automatic speech recognition (ASR), the human auditory sys...
research
10/16/2019

Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition

The transcriptions used to train an Automatic Speech Recognition (ASR) s...
