AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation

05/19/2023
by   Sara Papi, et al.
Attention is the core mechanism of today's most widely used architectures for natural language processing and has been analyzed from many perspectives, including its effectiveness for machine translation-related tasks. These studies have shown that attention is a useful source of information about word alignment, even when the input text is replaced with audio segments, as in the speech translation (ST) task. In this paper, we propose AlignAtt, a novel policy for simultaneous ST (SimulST) that exploits attention information to generate source-target alignments that guide the model during inference. Through experiments on the 8 language pairs of MuST-C v1.0, we show that AlignAtt outperforms previous state-of-the-art SimulST policies applied to offline-trained models, with BLEU gains of 2 points and latency reductions ranging from 0.5s to 0.8s across the 8 languages.
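
As a rough illustration of how attention-derived alignments could guide emission during inference, the sketch below aligns each candidate target token to its most-attended audio frame and stops emitting as soon as a token is aligned to one of the most recent frames. The abstract does not spell out the decision rule, so this rule is an assumption here. The function name `attention_guided_emit`, the parameter `num_recent_frames`, and the toy attention values are hypothetical.

```python
from typing import List, Sequence


def attention_guided_emit(
    attn: Sequence[Sequence[float]], num_recent_frames: int
) -> int:
    """Return how many candidate tokens to emit (illustrative rule, not the paper's code).

    attn: one row per candidate target token; each row holds that token's
          cross-attention weights over the audio frames received so far.
    num_recent_frames: hypothetical threshold; a token whose most-attended
          frame is among the last `num_recent_frames` frames is assumed to
          depend on audio that is still incomplete.
    """
    emitted = 0
    for row in attn:
        num_frames = len(row)
        # Align the token to the frame that receives the highest attention.
        aligned_frame = max(range(num_frames), key=lambda i: row[i])
        # Stop at the first token aligned to the most recent frames.
        if aligned_frame >= num_frames - num_recent_frames:
            break
        emitted += 1
    return emitted


# Toy example: 4 candidate tokens, 4 audio frames received so far.
toy_attn: List[List[float]] = [
    [0.70, 0.20, 0.05, 0.05],  # most-attended frame: 0
    [0.10, 0.60, 0.20, 0.10],  # most-attended frame: 1
    [0.05, 0.05, 0.20, 0.70],  # most-attended frame: 3 (recent audio)
    [0.60, 0.20, 0.10, 0.10],  # never reached: emission stops above
]
print(attention_guided_emit(toy_attn, num_recent_frames=2))  # → 2
```

Under this assumed rule, a larger `num_recent_frames` makes the policy more conservative (it waits for more audio, so latency is higher), while a smaller value emits earlier.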

research
12/15/2022

Attention as a guide for Simultaneous Speech Translation

The study of the attention mechanism has sparked interest in many fields...
research
07/01/2021

The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021

This paper describes USTC-NELSLIP's submissions to the IWSLT2021 Simulta...
research
03/17/2022

Gaussian Multi-head Attention for Simultaneous Machine Translation

Simultaneous machine translation (SiMT) outputs translation while receiv...
research
07/01/2021

ESPnet-ST IWSLT 2021 Offline Speech Translation System

This paper describes the ESPnet-ST group's IWSLT 2021 submission in the ...
research
09/26/2019

Monotonic Multihead Attention

Simultaneous machine translation models start generating a target sequen...
research
09/13/2023

Simultaneous Machine Translation with Large Language Models

Large language models (LLM) have demonstrated their abilities to solve v...
research
09/09/2021

Speechformer: Reducing Information Loss in Direct Speech Translation

Transformer-based models have gained increasing popularity achieving sta...
