CTC Alignments Improve Autoregressive Translation

10/11/2022
by   Brian Yan, et al.
0

Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment. However for translation, CTC exhibits clear limitations due to the contextual and non-monotonic nature of the task and thus lags behind attentional decoder approaches in terms of translation quality. In this work, we argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework wherein CTC's core properties can counteract several key weaknesses of pure-attention models during training and decoding. To validate this conjecture, we modify the Hybrid CTC/Attention model originally proposed for ASR to support text-to-text translation (MT) and speech-to-text translation (ST). Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.

READ FULL TEXT

page 4

page 7

page 14

page 16

research
07/13/2021

The IWSLT 2021 BUT Speech Translation Systems

The paper describes BUT's English to German offline speech translation(S...
research
05/04/2023

Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

Transducer and Attention based Encoder-Decoder (AED) are two widely used...
research
03/26/2021

Mutually-Constrained Monotonic Multihead Attention for Online ASR

Despite the feature of real-time decoding, Monotonic Multihead Attention...
research
04/11/2022

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

Neural transducers have been widely used in automatic speech recognition...
research
10/21/2020

A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks

Attention-based sequence-to-sequence modeling provides a powerful and el...
research
11/11/2022

Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

The black-box nature of end-to-end speech translation (E2E ST) systems m...
research
08/19/2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Automatic speech recognition (ASR) based on transducers is widely used. ...

Please sign up or login with your details

Forgot password? Click here to reset