On Using SpecAugment for End-to-End Speech Translation

11/20/2019
by Parnia Bahar, et al.

This work investigates SpecAugment, a simple data augmentation technique, for end-to-end speech translation. SpecAugment is a low-cost method applied directly to the audio input features; it consists of masking blocks of frequency channels and/or time steps. We apply SpecAugment to end-to-end speech translation tasks and achieve up to +2.2% BLEU on LibriSpeech Audiobooks En->Fr and +1.2% on IWSLT TED-talks En->De by alleviating overfitting to some extent. We also examine the effectiveness of the method in a variety of data scenarios and show that it leads to significant improvements irrespective of the amount of training data.
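
The masking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `spec_augment`, its parameters, the list-of-frames layout, and the choice to fill masked regions with zeros are all assumptions made for illustration.

```python
import random


def spec_augment(features, num_freq_masks=1, max_freq_width=2,
                 num_time_masks=1, max_time_width=3, rng=None):
    """Return a masked copy of `features`.

    `features` is a list of frames (time steps); each frame is a list of
    per-channel values. All names and defaults here are hypothetical.
    """
    rng = rng or random.Random()
    masked = [list(frame) for frame in features]  # copy; input stays untouched
    num_frames = len(masked)
    num_channels = len(masked[0]) if masked else 0

    # Frequency masking: zero a block of consecutive channels in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_channels))
        start = rng.randint(0, num_channels - width)
        for frame in masked:
            for channel in range(start, start + width):
                frame[channel] = 0.0  # assumption: masked values are set to zero

    # Time masking: zero a block of consecutive frames across all channels.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            masked[t] = [0.0] * num_channels

    return masked
```

A call such as `spec_augment(frames, rng=random.Random(0))` would typically be applied to each training utterance on the fly, so that the model sees a differently masked version of the same audio in every epoch.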

04/18/2019

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

We present SpecAugment, a simple data augmentation method for speech rec...
04/10/2020

Joint translation and unit conversion for end-to-end localization

A variety of natural language tasks require processing of textual data w...
03/16/2022

Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation

End-to-end speech translation relies on data that pair source-language s...
11/20/2019

A Comparative Study on End-to-end Speech to Text Translation

Recent advances in deep learning show that end-to-end speech to text tra...
11/09/2022

Efficient Speech Translation with Pre-trained Models

When building state-of-the-art speech translation models, the need for l...
06/04/2019

Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation

Previous work on end-to-end translation from speech has primarily used f...
10/23/2019

Instance-Based Model Adaptation For Direct Speech Translation

Despite recent technology advancements, the effectiveness of neural appr...