On Leveraging the Visual Modality for Neural Machine Translation

10/07/2019
by   Vikas Raunak, et al.

Leveraging the visual modality effectively for Neural Machine Translation (NMT) remains an open problem in computational linguistics. Recently, Caglayan et al. posited that the observed gains are limited mainly because the sentences of the Multi30k dataset (the only multimodal MT dataset available at the time) are very simple, short, and repetitive, which makes the source text alone sufficient context. In this work, we further investigate this hypothesis on a new large-scale multimodal machine translation (MMT) dataset, How2, whose mean sentence length is 1.57 times that of Multi30k and which contains no repetition. We propose and evaluate three novel fusion techniques, each designed to ensure that visual context is utilized at a different stage of the sequence-to-sequence transduction pipeline, even under full linguistic context. However, we still obtain only marginal gains under full linguistic context, and we posit that visual embeddings extracted from deep vision models (ResNet for Multi30k, ResNeXt for How2) do not lend themselves to increasing the discriminativeness between vocabulary elements at token-level prediction in NMT. We demonstrate this qualitatively by analyzing attention distributions and quantitatively through Principal Component Analysis, and conclude that it is the quality of the visual embeddings, rather than the length of the sentences, that needs to be improved in existing MMT datasets.
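To illustrate the kind of PCA-based quantitative check the abstract describes, the sketch below (not the paper's code; the data is synthetic and the function name is my own) compares how quickly the variance of a set of embeddings concentrates in a few principal components. Embeddings whose variance is captured by very few components carry little discriminative information per dimension.

```python
import numpy as np

def explained_variance_ratio(embeddings: np.ndarray) -> np.ndarray:
    """Return the fraction of total variance captured by each principal
    component, in descending order."""
    centered = embeddings - embeddings.mean(axis=0)
    # Singular values of the centered matrix give the PCA spectrum.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()

# Hypothetical data standing in for visual features (e.g. pooled ResNet
# vectors) versus well-spread token embeddings from a trained NMT model.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 256))  # ~4 effective dims
full_rank = rng.normal(size=(500, 256))                           # spread-out variance

ratio_low = explained_variance_ratio(low_rank)
ratio_full = explained_variance_ratio(full_rank)

# Variance of the low-rank "visual" features concentrates in the top few
# components, mirroring the paper's finding of low discriminativeness.
print(ratio_low[:4].sum())   # close to 1.0
print(ratio_full[:4].sum())  # well under 0.1
```

Under this reading, a visual-embedding matrix whose spectrum collapses onto a handful of components adds little token-level discriminative signal, regardless of how long the source sentences are.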


Related research:

- Probing the Need for Visual Context in Multimodal Machine Translation (03/20/2019)
- Semantic Neural Machine Translation using AMR (02/19/2019)
- Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models (09/08/2021)
- Multimodal Attention for Neural Machine Translation (09/13/2016)
- Examining Structure of Word Embeddings with PCA (05/31/2019)
- On Compositionality in Neural Machine Translation (11/04/2019)
- Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection (02/15/2019)
