Frustratingly Easy Noise-aware Training of Acoustic Models

11/04/2020
by Desh Raj, et al.

Environmental noise and reverberation have a detrimental effect on the performance of automatic speech recognition (ASR) systems. Multi-condition training of neural network-based acoustic models is used to deal with this problem, but it requires many-fold data augmentation, which increases training time. In this paper, we propose utterance-level noise vectors for noise-aware training of acoustic models in hybrid ASR. Our noise vectors are obtained by combining the means of the speech frames and the silence frames in the utterance, where the speech/silence labels may be obtained from a GMM-HMM model trained for ASR alignments, so that no extra computation is required beyond averaging feature vectors. We show through experiments on AMI and Aurora-4 that this simple adaptation technique can yield a 6-7% relative word error rate (WER) improvement. We implement several embedding-based adaptation baselines proposed in the literature and show that our method outperforms them on both datasets. Finally, we extend our method to the online ASR setting by using frame-level maximum likelihood for mean estimation.
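The computation described above is cheap enough to sketch in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the abstract says the speech and silence means are "combined" without specifying how, so this sketch assumes concatenation ([silence mean ; speech mean]). The per-frame speech/silence labels stand in for labels taken from a GMM-HMM alignment. The online variant is also an assumption: it keeps posterior-weighted running sums, which gives the maximum-likelihood mean under soft speech/silence assignments, and the `p_speech` input is a hypothetical per-frame speech posterior. All function and class names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]


def mean_frames(frames: List[Frame], dim: int) -> List[float]:
    """Average a list of feature vectors; returns zeros if the list is empty."""
    if not frames:
        return [0.0] * dim
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def noise_vector(features: List[Frame], is_speech: List[bool]) -> List[float]:
    """Utterance-level noise vector (offline).

    Assumption: the two means are combined by concatenation,
    giving [mean of silence frames ; mean of speech frames].
    """
    if not features:
        raise ValueError("utterance has no frames")
    if len(features) != len(is_speech):
        raise ValueError("one speech/silence label per frame is required")
    dim = len(features[0])
    speech = [f for f, s in zip(features, is_speech) if s]
    silence = [f for f, s in zip(features, is_speech) if not s]
    return mean_frames(silence, dim) + mean_frames(speech, dim)


class OnlineNoiseVector:
    """Running speech/silence means, updated one frame at a time.

    Each frame is weighted by a (hypothetical) speech posterior p_speech in [0, 1]:
    it adds p_speech * frame to the speech sum and (1 - p_speech) * frame to the
    silence sum. With hard 0/1 posteriors this equals the offline means over the
    frames seen so far.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.speech_sum = [0.0] * dim
        self.silence_sum = [0.0] * dim
        self.speech_weight = 0.0
        self.silence_weight = 0.0

    def update(self, frame: Frame, p_speech: float) -> None:
        for d in range(self.dim):
            self.speech_sum[d] += p_speech * frame[d]
            self.silence_sum[d] += (1.0 - p_speech) * frame[d]
        self.speech_weight += p_speech
        self.silence_weight += 1.0 - p_speech

    def vector(self) -> List[float]:
        def safe_mean(total: List[float], weight: float) -> List[float]:
            # Weighted mean, or zeros if no frame has contributed yet.
            return [x / weight for x in total] if weight > 0 else [0.0] * self.dim

        return (safe_mean(self.silence_sum, self.silence_weight)
                + safe_mean(self.speech_sum, self.speech_weight))


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    labels = [False, True, True, True]
    print(noise_vector(feats, labels))  # [1.0, 2.0, 5.0, 6.0]
```

In a hybrid acoustic model, a fixed-size utterance vector like this would typically be appended to every frame's input features, in the same way i-vectors are commonly used for adaptation; whether the paper feeds it in exactly this way is not stated in the abstract.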

research
01/20/2021

The JHU ASR System for VOiCES from a Distance Challenge 2019

This paper describes the system developed by the JHU team for automatic ...
research
12/07/2020

Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems

Inspired by SpecAugment – a data augmentation method for end-to-end ASR ...
research
03/01/2022

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

This study addresses robust automatic speech recognition (ASR) by introd...
research
02/08/2022

Enhancing ASR for Stuttered Speech with Limited Data Using Detect and Pass

It is estimated that around 70 million people worldwide are affected by ...
research
09/08/2022

Goodness of Pronunciation Pipelines for OOV Problem

In the following report we propose pipelines for Goodness of Pronunciati...
research
11/27/2016

Invariant Representations for Noisy Speech Recognition

Modern automatic speech recognition (ASR) systems need to be robust unde...
research
10/27/2017

Acoustic Landmarks Contain More Information About the Phone String than Other Frames

Most mainstream Automatic Speech Recognition (ASR) systems consider all ...
