Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models

09/08/2021
by Jiaoda Li, et al.

Multimodal machine translation (MMT) systems have been shown to outperform their text-only neural machine translation (NMT) counterparts when visual context is available. However, recent studies have also shown that the performance of MMT models is only marginally affected when the associated image is replaced with an unrelated image or with noise, which suggests that the models might not exploit the visual context at all. We hypothesize that this might be caused by the nature of the commonly used evaluation benchmark, Multi30K, in which the translations of image captions were prepared without actually showing the images to the human translators. In this paper, we present a qualitative study that examines the role of datasets in encouraging models to use the visual modality, and we propose methods that highlight the importance of visual signals in the datasets; these methods increase the models' reliance on the source images. Our findings suggest that research on effective MMT architectures is currently impaired by the lack of suitable datasets and that future MMT datasets must be created with careful consideration, for which we also provide useful insights.
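The image-replacement check mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: `visual_sensitivity`, `derange`, and the `translate` and `score` callables are hypothetical names introduced here, and the sketch assumes a model exposed as a function of a source sentence and an image, plus a corpus-level metric where higher is better. It scores the model once with the correct images and once with every image swapped for one from a different example, then reports the difference.

```python
import random
from typing import Any, Callable, List, Sequence


def derange(n: int, rng: random.Random) -> List[int]:
    """Return a permutation of range(n) in which no index maps to itself."""
    if n < 2:
        raise ValueError("need at least two examples to mismatch images")
    while True:
        perm = list(range(n))
        rng.shuffle(perm)
        if all(i != p for i, p in enumerate(perm)):
            return perm


def visual_sensitivity(
    translate: Callable[[str, Any], str],            # hypothetical model interface
    score: Callable[[Sequence[str], Sequence[str]], float],  # hypothetical metric
    sources: Sequence[str],
    images: Sequence[Any],
    references: Sequence[str],
    seed: int = 0,
) -> float:
    """Score with correct images minus score with mismatched images."""
    perm = derange(len(images), random.Random(seed))
    congruent = [translate(s, im) for s, im in zip(sources, images)]
    incongruent = [translate(s, images[p]) for s, p in zip(sources, perm)]
    return score(congruent, references) - score(incongruent, references)
```

A gap close to zero means the outputs barely change when the image is wrong, which is the behaviour the abstract describes for existing MMT models; a clearly positive gap indicates that the model relies on the image. Replacing the mismatched images with noise instead would follow the same pattern.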


research
03/20/2019

Probing the Need for Visual Context in Multimodal Machine Translation

Current work on multimodal machine translation (MMT) has suggested that ...
research
10/07/2019

On Leveraging the Visual Modality for Neural Machine Translation

Leveraging the visual modality effectively for Neural Machine Translatio...
research
02/16/2023

Generalization algorithm of multimodal pre-training model based on graph-text self-supervised training

Recently, a large number of studies have shown that the introduction of ...
research
01/15/2016

Multimodal Pivots for Image Caption Translation

We present an approach to improve statistical machine translation of ima...
research
04/07/2020

Towards Multimodal Simultaneous Neural Machine Translation

Simultaneous translation involves translating a sentence before the spea...
research
07/21/2019

Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation

Visual Genome is a dataset connecting structured image information with ...
