Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training

07/24/2023
by   Gege Qi, et al.

Developing a practically robust automatic speech recognition (ASR) system is challenging because the model must not only maintain its original performance on clean samples but also perform consistently under small volume perturbations and large domain shifts. To address this problem, we propose a novel WavAugment Guided Phoneme Adversarial Training (WAPAT). WAPAT uses adversarial examples in phoneme space as augmentation to make the model invariant to minor fluctuations in phoneme representation while preserving performance on clean samples. In addition, WAPAT utilizes the phoneme representation of augmented samples to guide the generation of adversaries, which helps to find more stable and diverse gradient directions, resulting in improved generalization. Extensive experiments demonstrate the effectiveness of WAPAT on the End-to-end Speech Challenge Benchmark (ESB). Notably, SpeechLM-WAPAT outperforms the original model by 6.28% WER reduction on ESB, achieving a new state of the art.
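
The idea in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the linear model, the squared-error loss, the single signed-gradient step, and the names `guided_adversary`, `eps`, and `guide` are all assumptions made for illustration. A small vector stands in for a phoneme representation, and the perturbation direction mixes the loss gradient at the clean representation with the loss gradient at an augmented one.

```python
# Toy sketch only: a linear model stands in for the ASR network, and a short
# list of floats stands in for a phoneme representation. All names and
# hyperparameters here are hypothetical, not taken from the paper.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def loss(weights, embedding, target):
    """Squared error of the toy linear model (stand-in for the ASR loss)."""
    return (dot(weights, embedding) - target) ** 2

def loss_grad(weights, embedding, target):
    """Gradient of the toy loss with respect to the embedding."""
    residual = dot(weights, embedding) - target
    return [2.0 * residual * w for w in weights]

def sign(x):
    return (x > 0) - (x < 0)

def guided_adversary(weights, clean_emb, aug_emb, target, eps=0.1, guide=0.5):
    """Return clean_emb plus one signed step of size eps per coordinate.

    Assumption: "guidance" is modelled by adding `guide` times the gradient
    evaluated at the augmented embedding to the gradient at the clean one.
    """
    g_clean = loss_grad(weights, clean_emb, target)
    g_aug = loss_grad(weights, aug_emb, target)
    mixed = [gc + guide * ga for gc, ga in zip(g_clean, g_aug)]
    return [e + eps * sign(m) for e, m in zip(clean_emb, mixed)]

def training_loss(weights, clean_emb, aug_emb, target, eps=0.1, guide=0.5):
    """Clean loss plus loss on the adversarial embedding (used as augmentation)."""
    adv = guided_adversary(weights, clean_emb, aug_emb, target, eps, guide)
    return loss(weights, clean_emb, target) + loss(weights, adv, target)

weights = [0.5, -1.0, 2.0]
clean = [1.0, 0.0, 0.5]      # stand-in for a clean phoneme representation
augmented = [1.05, 0.02, 0.45]  # stand-in for its augmented counterpart
target = 1.0

adv = guided_adversary(weights, clean, augmented, target)
total = training_loss(weights, clean, augmented, target)
```

In this toy setting the adversarial embedding stays within `eps` of the clean one in every coordinate and yields a higher loss, so training on the sum of both terms penalizes sensitivity to small changes in the representation while still fitting the clean sample.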

research
07/21/2020

Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

Recent advances in Automatic Speech Recognition (ASR) demonstrated how e...
research
11/02/2018

Training Neural Speech Recognition Systems with Synthetic Speech Augmentation

Building an accurate automatic speech recognition (ASR) system requires ...
research
11/02/2018

Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model

In this paper we proposed a novel Adversarial Training (AT) approach for...
research
03/31/2020

Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

Recent studies have highlighted adversarial examples as ubiquitous threa...
research
11/27/2016

Invariant Representations for Noisy Speech Recognition

Modern automatic speech recognition (ASR) systems need to be robust unde...
research
01/10/2020

Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training

Dysarthria is a motor speech impairment affecting millions of people. Dy...
research
07/17/2018

Learning Noise-Invariant Representations for Robust Speech Recognition

Despite rapid advances in speech recognition, current models remain brit...
