Learning Noise-Invariant Representations for Robust Speech Recognition

07/17/2018
by Davis Liang, et al.

Despite rapid advances in speech recognition, current models remain brittle to superficial perturbations to their inputs. Small amounts of noise can destroy the performance of an otherwise state-of-the-art model. To harden models against background noise, practitioners often perform data augmentation, adding artificially-noised examples to the training set and carrying over the original label. In this paper, we hypothesize that a clean example and its superficially perturbed counterparts shouldn't merely map to the same class --- they should map to the same representation. We propose invariant-representation-learning (IRL): at each training iteration, for each training example, we sample a noisy counterpart. We then apply a penalty term to coerce matched representations at each layer above some chosen layer. Our key results, demonstrated on the LibriSpeech dataset, are the following: (i) IRL significantly reduces character error rates (CER) on both the 'clean' (3.3% vs. 6.5%) and 'other' test sets; (ii) on noise settings different from those seen during training, IRL's benefits are even more pronounced. Careful ablations confirm that our results are not simply due to shrinking activations at the chosen layers.
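The abstract describes the IRL objective only in words, so here is a minimal PyTorch-style sketch (our illustration, not the authors' released code) of how such a penalty could be wired into a training loss. The model interface, the start_layer cutoff, and the alpha weighting are assumptions; cross-entropy stands in for the CTC loss an ASR model would actually use.

import torch.nn.functional as F

def irl_loss(model, clean_batch, noisy_batch, targets, start_layer=2, alpha=1.0):
    # Hypothetical interface: model(x) returns (logits, hidden_states),
    # where hidden_states is a list of per-layer activations.
    clean_logits, clean_hidden = model(clean_batch)
    noisy_logits, noisy_hidden = model(noisy_batch)

    # Task loss on both the clean example and its noisy counterpart.
    # Cross-entropy is a stand-in; the paper's ASR setting uses CTC.
    task_loss = F.cross_entropy(clean_logits, targets) \
              + F.cross_entropy(noisy_logits, targets)

    # Penalty term coercing matched representations at every layer
    # above the chosen layer, as described in the abstract.
    invariance_penalty = sum(
        F.mse_loss(h_noisy, h_clean)
        for h_clean, h_noisy in zip(clean_hidden[start_layer:],
                                    noisy_hidden[start_layer:])
    )
    return task_loss + alpha * invariance_penalty

In this sketch the penalty is a plain L2 (MSE) distance between matched activations, and alpha trades off task accuracy against invariance; both choices are ours for illustration.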


Related research

01/22/2022
A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition
Wav2vec2.0 is a popular self-supervised pre-training framework for learn...

05/03/2022
On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training
In this paper, we explore an improved framework to train a monoaural neu...

07/02/2021
Supervised Contrastive Learning for Accented Speech Recognition
Neural network based speech recognition systems suffer from performance ...

11/02/2020
SapAugment: Learning A Sample Adaptive Policy for Data Augmentation
Data augmentation methods usually apply the same augmentation (or a mix ...

08/23/2023
KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
Despite recent availability of large transcribed Kinyarwanda speech data...

07/24/2023
Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training
Developing a practically-robust automatic speech recognition (ASR) is ch...

05/03/2021
Quantifying and Maximizing the Benefits of Back-End Noise Adaption on Attention-Based Speech Recognition Models
This work analyzes how attention-based Bidirectional Long Short-Term Mem...
