On Vision Features in Multimodal Machine Translation

03/17/2022
by   Bei Li, et al.

Previous work on multimodal machine translation (MMT) has focused on how to incorporate vision features into translation, but has paid little attention to the quality of vision models. In this work, we investigate the impact of vision models on MMT. Given that Transformers are becoming popular in computer vision, we experiment with various strong models (such as Vision Transformer) and enhanced features (such as object detection and image captioning). We develop a selective attention model to study the patch-level contribution of an image in MMT. On detailed probing tasks, we find that stronger vision models are helpful for learning translation from the visual modality. Our results also suggest the need to carefully examine MMT models, especially when current benchmarks are small-scale and biased. Our code is available at <https://github.com/libeineu/fairseq_mmt>.

research
07/30/2018

Doubly Attentive Transformer Machine Translation

In this paper a doubly attentive transformer machine translation model (...
research
10/09/2018

Image Captioning as Neural Machine Translation Task in SOCKEYE

Image captioning is an interdisciplinary research problem that stands be...
research
05/07/2018

Multimodal Machine Translation with Reinforcement Learning

Multimodal machine translation is one of the applications that integrate...
research
06/12/2023

A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation

Large language models such as BERT and the GPT series started a paradigm...
research
09/27/2021

GANiry: Bald-to-Hairy Translation Using CycleGAN

This work presents our computer vision course project called bald men-to...
research
08/29/2023

CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation

There has been a growing interest in developing multimodal machine trans...
research
12/20/2022

Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation

Multimodal machine translation (MMT) aims to improve translation quality...
