TED: Triple Supervision Decouples End-to-end Speech-to-text Translation

09/21/2020
by Qianqian Dong, et al.

An end-to-end speech-to-text translation (ST) system takes audio in a source language and outputs text in a target language. Inspired by the neuroscience observation that humans use separate perception and cognitive systems to process different information, we propose TED (Transducer-Encoder-Decoder), a unified framework with triple supervision that decouples the end-to-end speech-to-text translation task. In addition to the target sentence translation loss, TED includes two auxiliary supervising signals: one guides the acoustic transducer, which extracts acoustic features from the input, and the other guides the semantic encoder, which extracts semantic features relevant to the source transcription text. Our method achieves state-of-the-art performance on both English-French and English-German speech translation benchmarks.
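As a rough illustration of what triple supervision could look like at training time, the sketch below combines the translation loss with the two auxiliary losses into a single objective. The abstract does not say how the three signals are combined or weighted, so the weighted sum, the names `TripleLossWeights` and `triple_supervision_loss`, and the default weights are all assumptions made for illustration, not the authors' formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TripleLossWeights:
    """Hypothetical mixing weights; the abstract gives no values."""
    translation: float = 1.0  # target sentence translation loss
    acoustic: float = 1.0     # auxiliary signal for the acoustic transducer
    semantic: float = 1.0     # auxiliary signal for the semantic encoder


def triple_supervision_loss(
    translation_loss: float,
    acoustic_loss: float,
    semantic_loss: float,
    weights: TripleLossWeights = TripleLossWeights(),
) -> float:
    """Combine the three supervision signals as a weighted sum.

    Assumption: a plain weighted sum. The source only states that two
    auxiliary signals are used in addition to the translation loss.
    """
    return (
        weights.translation * translation_loss
        + weights.acoustic * acoustic_loss
        + weights.semantic * semantic_loss
    )


if __name__ == "__main__":
    # Example per-batch loss values (made up for the demonstration).
    total = triple_supervision_loss(2.0, 1.0, 0.5)
    print(f"total loss: {total}")
```

In a real training loop the three inputs would be tensors produced by the decoder, the acoustic transducer, and the semantic encoder respectively; plain floats are used here only to keep the sketch dependency-free.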


research
10/28/2020

Bridging the Modality Gap for Speech-to-Text Translation

End-to-end speech translation aims to translate speech in one language i...
research
09/14/2023

Direct Text to Speech Translation System using Acoustic Units

This paper proposes a direct text to speech translation system using dis...
research
04/04/2022

Into-TTS : Intonation Template based Prosody Control System

Intonations take an important role in delivering the intention of the sp...
research
07/26/2023

Exploring the Interactions between Target Positive and Negative Information for Acoustic Echo Cancellation

Acoustic echo cancellation (AEC) aims to remove interference signals whi...
research
10/13/2016

A Survey of Voice Translation Methodologies - Acoustic Dialect Decoder

Speech Translation has always been about giving source text or audio inp...
research
11/04/2018

Investigating context features hidden in End-to-End TTS

Recent studies have introduced end-to-end TTS, which integrates the prod...
research
06/09/2022

Revisiting End-to-End Speech-to-Text Translation From Scratch

End-to-end (E2E) speech-to-text translation (ST) often depends on pretra...
