Unsupervised pre-training for sequence to sequence speech recognition

10/28/2019
by Zhiyun Fan et al.

This paper proposes a novel approach to pre-training an encoder-decoder sequence-to-sequence (seq2seq) model with unpaired speech and transcripts. Our pre-training method is divided into two stages, named acoustic pre-training and linguistic pre-training. In the acoustic pre-training stage, we use a large amount of speech to pre-train the encoder by predicting masked speech feature chunks from their context. In the linguistic pre-training stage, we generate synthesized speech from a large number of transcripts using a single-speaker text-to-speech (TTS) system, and use the synthesized paired data to pre-train the decoder. This two-stage pre-training method integrates rich acoustic and linguistic knowledge into the seq2seq model, which benefits downstream automatic speech recognition (ASR) tasks. The unsupervised pre-training is performed on the AISHELL-2 dataset, and we apply the pre-trained model to multiple paired-data ratios of AISHELL-1 and HKUST. We obtain relative character error rate reductions (CERR) of up to 38.24% on AISHELL-1 and 12.00% on HKUST. We further apply our pre-trained model to a cross-lingual case with the CALLHOME dataset; for all six languages in CALLHOME, our pre-training method makes the model consistently outperform the baseline.
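
The abstract only outlines the two pre-training stages, so here is a minimal sketch of the first, acoustic stage: mask a chunk of speech features and train the encoder to predict it from the surrounding context. This is not the authors' code; the BiLSTM encoder, 80-dimensional features, 10-frame chunk length, and L1 reconstruction loss are all illustrative assumptions.

import torch
import torch.nn as nn

FEAT_DIM = 80   # log-Mel feature dimension (illustrative assumption)
CHUNK = 10      # masked chunk length in frames (illustrative assumption)

class MaskedChunkPretrainer(nn.Module):
    """Pre-trains an encoder by reconstructing masked speech-feature chunks."""
    def __init__(self, feat_dim=FEAT_DIM, hidden=256):
        super().__init__()
        # Stand-in encoder; the paper's seq2seq ASR encoder would go here.
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        # Projects encoder states back to feature space for reconstruction.
        self.head = nn.Linear(2 * hidden, feat_dim)

    def forward(self, feats):
        # feats: (batch, time, feat_dim) features from unpaired speech
        B, T, _ = feats.shape
        masked = feats.clone()
        mask = torch.zeros(B, T, dtype=torch.bool)
        for b in range(B):
            # Zero out one random chunk per utterance.
            start = int(torch.randint(0, T - CHUNK, (1,)))
            masked[b, start:start + CHUNK] = 0.0
            mask[b, start:start + CHUNK] = True
        states, _ = self.encoder(masked)
        recon = self.head(states)
        # Loss only on the masked frames: predict them from context.
        return nn.functional.l1_loss(recon[mask], feats[mask])

# Usage: one optimization step on a random batch standing in for speech.
model = MaskedChunkPretrainer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
opt.zero_grad()
loss = model(torch.randn(4, 200, FEAT_DIM))
loss.backward()
opt.step()
print(f"masked-chunk reconstruction loss: {loss.item():.3f}")

In the paper's second, linguistic stage, the decoder would then be pre-trained on pairs of TTS-synthesized speech and transcripts before fine-tuning on real paired data.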

Related research:

- Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (03/31/2022)
- Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit (12/07/2022)
- Speech Pre-training with Acoustic Piece (04/07/2022)
- SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint (12/09/2020)
- Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages (08/11/2020)
- wav2vec: Unsupervised Pre-training for Speech Recognition (04/11/2019)
