AIPNet: Generative Adversarial Pre-training of Accent-invariant Networks for End-to-end Speech Recognition

11/27/2019
by   Yi-Chen Chen, et al.

As one of the major sources of speech variability, accents pose a grand challenge to the robustness of speech recognition systems. In this paper, our goal is to build a unified end-to-end speech recognition system that generalizes well across accents. For this purpose, we propose AIPNet (Accent Invariant Pre-training Networks), a novel pre-training framework based on generative adversarial nets (GAN) for accent-invariant representation learning. We pre-train AIPNet to disentangle accent-invariant and accent-specific characteristics from acoustic features through adversarial training on accented data for which transcriptions are not necessarily available. We then fine-tune AIPNet by connecting the accent-invariant module with an attention-based encoder-decoder model for multi-accent speech recognition. In the experiments, our approach is compared against four baselines, including both accent-dependent and accent-independent models. Experimental results on 9 English accents show that the proposed approach outperforms all the baselines by 2.3% to 4.5% relative reduction in WER when transcriptions are available in all accents, and by 1.6% to 6.1% relative reduction when transcriptions are available only in the US accent.
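The results above are reported as relative reductions in word error rate (WER). As a rough illustration of that arithmetic, and not the authors' evaluation code, the sketch below computes WER as word-level edit distance divided by reference length, then the relative reduction between a baseline and an improved system. The function names `word_error_rate` and `relative_reduction` and the example sentences and WER values are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the paper.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")

    # Hypothetical WER values, not results from the paper.
    print(f"Relative reduction: {relative_reduction(0.100, 0.095):.1%}")
```

Under this definition, a relative reduction of 4.5% means the new WER is 95.5% of the baseline WER, which is a different quantity from a 4.5-point drop in absolute WER.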

research
11/29/2022

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

In this paper, we propose a novel multi-modal multi-task encoder-decoder...
research
03/31/2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

This paper studies a novel pre-training technique with unpaired speech d...
research
10/28/2019

Unsupervised pre-training for sequence to sequence speech recognition

This paper proposes a novel approach to pre-train encoder-decoder sequen...
research
07/07/2019

NIESR: Nuisance Invariant End-to-end Speech Recognition

Deep neural network models for speech recognition have achieved great su...
research
11/05/2017

Robust Speech Recognition Using Generative Adversarial Networks

This paper describes a general, scalable, end-to-end framework that uses...
research
05/05/2020

End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Layer-wise Transfer Learning

Whispering is an important mode of human speech, but no end-to-end recog...
research
10/22/2019

Improving Transformer-based Speech Recognition Using Unsupervised Pre-training

Speech recognition technologies are gaining enormous popularity in vario...
