Attention as a guide for Simultaneous Speech Translation

12/15/2022
by   Sara Papi, et al.

The study of the attention mechanism has sparked interest in many fields, such as language modeling and machine translation. Although its patterns have been exploited for tasks ranging from neural network understanding to textual alignment, no previous work has either analysed the encoder-decoder attention behavior in speech translation (ST) or used it to improve ST on a specific task. In this paper, we fill this gap by proposing an attention-based policy (EDAtt) for simultaneous ST (SimulST), motivated by an analysis of the attention relations between audio input and textual output. Its goal is to leverage the encoder-decoder attention scores to guide inference in real time. Results on en→de and en→es show that the EDAtt policy achieves overall better results than the SimulST state of the art, especially in terms of computational-aware latency.
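The abstract does not spell out the decision rule, so the following is only a minimal sketch of how encoder-decoder attention scores could guide simultaneous inference. It assumes that a candidate token is emitted only when little of its attention mass falls on the most recently received audio frames, and that emission stops at the first token that fails this check. The function name `emit_prefix`, the parameters `last_frames` and `threshold`, and the toy attention matrix are hypothetical and not taken from the paper.

```python
from typing import Sequence


def emit_prefix(
    attention: Sequence[Sequence[float]],
    last_frames: int = 2,
    threshold: float = 0.5,
) -> int:
    """Return how many leading candidate tokens to emit.

    attention[i][j] is the (hypothetical) encoder-decoder attention weight
    that candidate token i places on audio frame j. Each row is assumed to
    be normalised so that it sums to 1.
    """
    emitted = 0
    for row in attention:
        # Attention mass on the most recently received frames.
        recent_mass = sum(row[-last_frames:])
        if recent_mass >= threshold:
            # The token relies on the newest audio, which may still be
            # incomplete: stop here and wait for more input.
            break
        emitted += 1
    return emitted


# Toy example: 4 candidate tokens attending over 4 audio frames.
toy_attention = [
    [0.90, 0.00, 0.05, 0.05],  # mostly on old frames
    [0.10, 0.50, 0.30, 0.10],  # still mostly on old frames
    [0.05, 0.05, 0.30, 0.60],  # mostly on the newest frames
    [0.60, 0.20, 0.10, 0.10],  # never reached: emission stops at token 3
]
print(emit_prefix(toy_attention, last_frames=2, threshold=0.5))  # → 2
```

In this sketch a smaller `threshold` or a larger `last_frames` makes the policy more conservative: it waits for more audio before emitting, which trades latency for stability.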

research
01/13/2016

Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model

Neural machine translation has shown very promising results lately. Most...
research
05/19/2023

AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation

Attention is the core mechanism of today's most used architectures for n...
research
09/06/2016

Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling

Attention-based encoder-decoder neural network models have recently show...
research
04/25/2023

State Spaces Aren't Enough: Machine Translation Needs Attention

Structured State Spaces for Sequences (S4) is a recently proposed sequen...
research
09/26/2019

Monotonic Multihead Attention

Simultaneous machine translation models start generating a target sequen...
research
02/11/2018

Tree-to-tree Neural Networks for Program Translation

Program translation is an important tool to migrate legacy code in one l...
research
06/04/2020

Attention and Encoder-Decoder based models for transforming articulatory movements at different speaking rates

While speaking at different rates, articulators (like tongue, lips) tend...
