Transformer based unsupervised pre-training for acoustic representation learning

07/29/2020
by   Ruixiong Zhang, et al.

Computational audio analysis has become a central issue in related areas of research, and a variety of associated applications have arisen. However, for many acoustic tasks the amount of labeled data may be limited. To handle this problem, we propose an unsupervised pre-training method using a Transformer based encoder to learn a general and robust high-level representation for all acoustic tasks. Experiments have been conducted on three kinds of acoustic tasks: speech translation, speech emotion recognition and sound event detection. All the experiments show that pre-training on a task's own training data makes the model converge significantly faster and improves performance. With larger pre-training data combining the MuST-C, Librispeech and ESC-US datasets, the BLEU score for speech translation further improves relatively by 12.2 on the En-De dataset and by 8.4; for speech emotion recognition the score further improves absolutely by 1.7 and 2.4; and for sound event detection it improves absolutely by 4.3.
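The abstract describes the pre-training idea only at a high level. A minimal sketch of the masked-reconstruction style of unsupervised acoustic pre-training it refers to is given below; the mask ratio, feature dimensions, and the stand-in "encoder" are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_frames(features, mask_ratio=0.15):
    """Zero out a random subset of time frames; return the masked input and the mask."""
    num_frames = features.shape[0]
    mask = rng.random(num_frames) < mask_ratio
    masked = features.copy()
    masked[mask] = 0.0
    return masked, mask

def reconstruction_loss(pred, target, mask):
    """L1 loss computed only on the masked frames (the pre-training signal)."""
    if not mask.any():
        return 0.0
    return float(np.abs(pred[mask] - target[mask]).mean())

# Toy example: 100 frames of 80-dimensional log-mel features.
feats = rng.standard_normal((100, 80))
masked, mask = mask_frames(feats, mask_ratio=0.15)

# Stand-in for the Transformer encoder: here just the identity, so the
# "prediction" is the masked input itself. A real encoder would be trained
# to fill the zeroed frames back in from the surrounding context, driving
# this loss toward zero and learning a general acoustic representation.
pred = masked
loss = reconstruction_loss(pred, feats, mask)
```

After pre-training with an objective of this shape, the encoder is fine-tuned on the downstream task (translation, emotion recognition, or event detection), which is where the convergence and accuracy gains reported above come from.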


Related research:

- Cross-task pre-training for acoustic scene classification (10/22/2019)
- Improving Transformer-based Speech Recognition Using Unsupervised Pre-training (10/22/2019)
- MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation (10/22/2020)
- wav2vec: Unsupervised Pre-training for Speech Recognition (04/11/2019)
- Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach (10/25/2022)
- Pre-training for Speech Translation: CTC Meets Optimal Transport (01/27/2023)
- Speech Pre-training with Acoustic Piece (04/07/2022)
