Simultaneous Machine Translation with Visual Context

09/15/2020
by Ozan Caglayan, et al.

Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible. Translation therefore has to start from an incomplete source text that is read progressively, which creates the need for anticipation. In this paper, we seek to understand whether the addition of visual information can compensate for the missing source context. To this end, we analyse the impact of different multimodal approaches and visual features on state-of-the-art SiMT frameworks. Our results show that visual context is helpful and that visually-grounded models based on explicit object region information perform much better than commonly used global features, with improvements of up to 3 BLEU points in low-latency scenarios. Our qualitative analysis illustrates cases where only the multimodal systems translate correctly from English into gender-marked languages and handle differences in word order, such as adjective-noun placement between English and French.

research
02/22/2021

Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation

This paper addresses the problem of simultaneous machine translation (Si...
research
10/26/2021

Simultaneous Neural Machine Translation with Constituent Label Prediction

Simultaneous translation is a task in which translation begins before th...
research
01/23/2022

Supervised Visual Attention for Simultaneous Multimodal Machine Translation

Recently, there has been a surge in research in multimodal machine trans...
research
03/20/2019

Probing the Need for Visual Context in Multimodal Machine Translation

Current work on multimodal machine translation (MMT) has suggested that ...
research
09/04/2020

Dynamic Context-guided Capsule Network for Multimodal Machine Translation

Multimodal machine translation (MMT), which mainly focuses on enhancing ...
research
07/04/2017

Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation

In Multimodal Neural Machine Translation (MNMT), a neural model generate...
research
06/18/2019

Distilling Translations with Visual Awareness

Previous work on multimodal machine translation has shown that visual in...
