Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation

07/03/2023
by Matthew Raffel, et al.

Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation. However, such models create a context mismatch between training and inference environments, hindering translation accuracy. We solve this issue by proposing Shiftable Context, a simple yet effective scheme to ensure that consistent segment and context sizes are maintained throughout training and inference, even in the presence of partially filled segments caused by the streaming nature of simultaneous translation. Shiftable Context is also broadly applicable to segment-based transformers for streaming tasks. Our experiments on the English-German, English-French, and English-Spanish language pairs from the MuST-C dataset demonstrate that, when applied to the Augmented Memory Transformer, a state-of-the-art model for simultaneous speech translation, the proposed scheme achieves average BLEU score increases of 2.09, 1.83, and 1.95 across wait-k values for the three language pairs, respectively, with minimal impact on computation-aware Average Lagging.
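
The abstract describes Shiftable Context only at a high level, so the sketch below is one plausible reading of the core idea rather than the paper's actual implementation: when a streaming segment arrives partially filled, the boundary between left context and segment is shifted so that the model attends over the same total window size it saw during training. The function name shiftable_window and its parameters are hypothetical.

```python
import torch

def shiftable_window(history: torch.Tensor, new_frames: torch.Tensor,
                     segment_size: int, context_size: int) -> torch.Tensor:
    """Build the attention window for the newest streaming segment.

    history:    (H, d) frames already processed.
    new_frames: (S, d) the current segment, where S <= segment_size
                because streaming can deliver a partially filled segment.

    A fixed left context of context_size frames plus a partial segment
    yields a smaller window than the model ever saw in training. Here the
    context absorbs the shortfall instead, so the window spans
    context_size + segment_size frames whenever enough history exists.
    """
    shortfall = segment_size - new_frames.size(0)   # 0 for a full segment
    ctx_len = min(history.size(0), context_size + shortfall)
    context = history[history.size(0) - ctx_len:]
    return torch.cat([context, new_frames], dim=0)

# Example: 64-frame context, 64-frame segments, but only 40 new frames
# have arrived; the window still spans 64 + 64 = 128 frames.
history = torch.randn(500, 256)
partial = torch.randn(40, 256)
window = shiftable_window(history, partial, segment_size=64, context_size=64)
assert window.size(0) == 128
```

Under this reading, the same window construction can be used at training time (by randomly truncating segments) and at inference time, which is what removes the train-inference context mismatch.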

Related research

07/03/2023
Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
Simultaneous speech translation is an essential communication task diffi...

10/30/2020
Streaming Simultaneous Speech Translation with Augmented Memory Transformer
Transformer-based models have achieved state-of-the-art performance on s...

04/19/2022
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation
Although Transformers have gained success in several speech processing t...

12/23/2020
Future-Guided Incremental Transformer for Simultaneous Translation
Simultaneous translation (ST) starts translations synchronously while re...

10/16/2022
RedApt: An Adaptor for wav2vec 2 Encoding Faster and Smaller Speech Translation without Quality Compromise
Pre-trained speech Transformers in speech translation (ST) have facilita...

03/14/2023
Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference
A popular approach to streaming speech translation is to employ a single...

10/28/2022
Efficient Speech Translation with Dynamic Latent Perceivers
Transformers have been the dominant architecture for Speech Translation ...
