Does Speech enhancement of publicly available data help build robust Speech Recognition Systems?

10/29/2019
by   Bhavya Ghai, et al.

Automatic speech recognition (ASR) systems play a key role in many commercial products, including voice assistants. They typically require large amounts of clean speech data for training, which gives an undue advantage to large organizations that hold large amounts of private data. In this paper, we first curate a fairly large dataset from publicly available data sources. We then investigate whether publicly available noisy data can be used to train robust ASR systems. We use speech enhancement to clean the noisy data and then train ASR systems on the noisy data together with its enhanced version. We find that using speech enhancement yields a 9.5% better word error rate than training on noisy data alone and 9% better than training on clean data alone. Its performance is also comparable to the ideal-case scenario of training on the noisy data together with its clean version.
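For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` and the example sentences are hypothetical and are not taken from the paper, whose exact scoring setup is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Illustrative sketch; splits on whitespace and does no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein distance, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

A lower WER is better, and because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.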


