Probing the Need for Visual Context in Multimodal Machine Translation

03/20/2019
by Ozan Caglayan et al.

Current work on multimodal machine translation (MMT) has suggested that the visual modality is either unnecessary or only marginally beneficial. We posit that this is a consequence of the very simple, short and repetitive sentences used in the only available dataset for the task (Multi30K), rendering the source text sufficient as context. In the general case, however, we believe that it is possible to combine visual and textual information in order to ground translations. In this paper we probe the contribution of the visual modality to state-of-the-art MMT models by conducting a systematic analysis where we partially deprive the models of source-side textual context. Our results show that under limited textual context, models are capable of leveraging the visual input to generate better translations. This contradicts the current belief that MMT models disregard the visual modality because of either the quality of the image features or the way they are integrated into the model.
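One way to partially deprive a model of source-side textual context is a progressive masking scheme: keep only the first k source tokens and replace the rest with a placeholder token. The sketch below is illustrative (the function name, the placeholder string "[v]", and the example sentence are assumptions, not taken from the paper's code):

```python
def progressively_mask(tokens, k, mask_token="[v]"):
    """Keep the first k source tokens; replace the remainder with a mask token.

    This simulates limited textual context: as k shrinks, the model must
    rely more on the visual modality to produce a correct translation.
    """
    return tokens[:k] + [mask_token] * (len(tokens) - k)


src = "a man is riding a bicycle down the street".split()
print(progressively_mask(src, 3))
# ['a', 'man', 'is', '[v]', '[v]', '[v]', '[v]', '[v]', '[v]']
```

Running the same MMT model on increasingly masked sources, and comparing translation quality with and without image features, isolates how much the visual input actually contributes.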

Related research

06/01/2021 · ViTA: Visual-Linguistic Translation by Aligning Object Tags
06/18/2019 · Distilling Translations with Visual Awareness
10/07/2019 · On Leveraging the Visual Modality for Neural Machine Translation
09/08/2021 · Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models
08/05/2019 · Predicting Actions to Help Predict Translations
05/20/2023 · Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination
09/15/2020 · Simultaneous Machine Translation with Visual Context