Future-Guided Incremental Transformer for Simultaneous Translation

12/23/2020
by Shaolei Zhang, et al.

Simultaneous translation (ST) begins translating while still reading the source sentence and is used in many online scenarios. The previous wait-k policy is concise and achieves good results in ST. However, the wait-k policy has two weaknesses: low training speed, caused by recalculating hidden states for every source prefix, and a lack of future source information to guide training. To address the low training speed, we propose an incremental Transformer with an average embedding layer (AEL) that accelerates the computation of hidden states during training. For future-guided training, we use a conventional full-sentence Transformer as the teacher of the incremental Transformer and implicitly embed future source information into the model through knowledge distillation. We conducted experiments on Chinese-English and German-English simultaneous translation tasks, comparing against the wait-k policy. Our method increases training speed by about 28 times on average across different values of k and implicitly endows the model with some predictive ability, achieving better translation quality than the wait-k baseline.
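For context, the wait-k policy first reads k source tokens and then alternates between writing one target token and reading one more source token, so the t-th target token is predicted from roughly the first k+t source tokens. A minimal greedy-decoding sketch of this schedule follows; the predict_next interface is a hypothetical stand-in for one decoder forward pass, not an API from the paper:

    def wait_k_decode(model, source_tokens, k, max_len=100):
        """Greedy decoding under the wait-k policy: the t-th target token
        (0-indexed) is predicted from the first min(k + t, |x|) source tokens."""
        target = []
        for t in range(max_len):
            # Source prefix visible at this step; once the source is
            # exhausted, the whole sentence is visible.
            visible = source_tokens[: min(k + t, len(source_tokens))]
            y = model.predict_next(visible, target)  # hypothetical model call
            if y == "<eos>":
                break
            target.append(y)
        return target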

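The training-speed problem arises because wait-k training recomputes hidden states for each source prefix. The abstract does not spell out the exact construction of the average embedding layer; purely as an illustration of the underlying trick, the sketch below maintains a running average of the source embeddings read so far in O(1) per new token instead of re-averaging the whole prefix each time:

    import torch

    class RunningAverageEmbedding:
        """Illustrative only: keep the average of all source embeddings read
        so far, updated in constant time per step. This mirrors the spirit of
        an average embedding layer; it is not the paper's exact AEL design."""

        def __init__(self, dim):
            self.sum = torch.zeros(dim)
            self.count = 0

        def update(self, new_embedding):
            # Incorporate the embedding of the newly read source token.
            self.sum += new_embedding
            self.count += 1
            return self.sum / self.count  # average over the current prefix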
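Future-guided training, as described above, uses a full-sentence Transformer as a teacher so that the incremental student absorbs future source information it cannot see at inference time. A hedged, PyTorch-style sketch of a word-level knowledge-distillation loss in that spirit; the weighting alpha, temperature tau, and function name are our assumptions, not the paper's exact objective:

    import torch
    import torch.nn.functional as F

    def future_guided_loss(student_logits, teacher_logits, gold_ids,
                           alpha=0.5, tau=1.0):
        """Combine cross-entropy on gold targets with a KL term pulling the
        incremental student's distribution toward the full-sentence teacher's.
        student_logits, teacher_logits: (batch, tgt_len, vocab);
        gold_ids: (batch, tgt_len), dtype long."""
        ce = F.cross_entropy(student_logits.transpose(1, 2), gold_ids)
        # The teacher sees the full source, so its output distribution
        # implicitly carries future information.
        kd = F.kl_div(
            F.log_softmax(student_logits / tau, dim=-1),
            F.softmax(teacher_logits.detach() / tau, dim=-1),
            reduction="batchmean",
        ) * tau * tau
        return (1 - alpha) * ce + alpha * kd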
