A Comparative Study on End-to-end Speech to Text Translation

11/20/2019
by Parnia Bahar, et al.

Recent advances in deep learning show that end-to-end speech-to-text translation models are a promising approach to direct speech translation. In this work, we provide an overview of different end-to-end architectures, as well as the use of an auxiliary connectionist temporal classification (CTC) loss for better convergence. We also investigate pre-training variants, such as initializing different components of a model from pre-trained models, and their impact on final performance, which gives boosts of up to 4% in BLEU and 5% in TER. Our experiments are performed on 270h IWSLT TED-talks En->De and 100h LibriSpeech Audiobooks En->Fr. We also show improvements over the current end-to-end state-of-the-art systems on both tasks.
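
To illustrate the auxiliary CTC loss mentioned above, the following is a minimal sketch in plain Python, not the authors' implementation. It computes CTC with the standard forward algorithm and adds it to a decoder cross-entropy term. The function names, the additive combination, and the `ctc_weight` value are assumptions made for illustration; the abstract does not specify how the two losses are weighted.

```python
import math


def log_sum_exp(*values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """CTC loss via the forward algorithm.

    log_probs: list of T >= 1 frames, each a list of log-probabilities
               over the vocabulary (including the blank symbol).
    labels:    target label indices, without blanks.
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    extended = [blank]
    for label in labels:
        extended += [label, blank]
    size = len(extended)

    # alpha[s]: log-probability of all partial alignments ending at extended[s]
    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][extended[0]]
    if size > 1:
        alpha[1] = log_probs[0][extended[1]]

    for frame in log_probs[1:]:
        previous = alpha
        alpha = [-math.inf] * size
        for s in range(size):
            candidates = [previous[s]]
            if s >= 1:
                candidates.append(previous[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                candidates.append(previous[s - 2])
            alpha[s] = log_sum_exp(*candidates) + frame[extended[s]]

    # Valid alignments end on the last label or on the trailing blank.
    total = log_sum_exp(alpha[-1], alpha[-2]) if size > 1 else alpha[-1]
    return -total


def cross_entropy(log_probs, targets):
    """Summed negative log-likelihood of the reference tokens."""
    return -sum(step[token] for step, token in zip(log_probs, targets))


def multitask_loss(decoder_log_probs, target_tokens,
                   encoder_log_probs, ctc_labels, ctc_weight=0.5):
    """Decoder cross-entropy plus a weighted auxiliary CTC term.

    ctc_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    main = cross_entropy(decoder_log_probs, target_tokens)
    auxiliary = ctc_neg_log_likelihood(encoder_log_probs, ctc_labels)
    return main + ctc_weight * auxiliary
```

In this sketch the CTC term is computed on encoder outputs and the cross-entropy term on decoder outputs, so the auxiliary loss adds a training signal directly to the encoder. In practice a framework implementation such as a built-in CTC loss would replace the pure-Python forward pass.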
