Multimodal Pivots for Image Caption Translation

01/15/2016
by Julian Hitschler, et al.

We present an approach to improving statistical machine translation of image descriptions by means of multimodal pivots defined in visual space. The key idea is to perform image retrieval over a database of images captioned in the target language, and to use the captions of the most similar images for cross-lingual reranking of translation outputs. Our approach does not depend on large amounts of in-domain parallel data; it relies only on large, readily available datasets of monolingually captioned images and on state-of-the-art convolutional neural networks to compute image similarities. Our experimental evaluation shows improvements of 1 BLEU point over strong baselines.
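The retrieve-then-rerank idea in the abstract can be pictured with a minimal Python sketch. It assumes CNN feature vectors have already been extracted for the source image and for every image in the captioned database; the names `rerank` and `token_f1`, the interpolation weight `alpha`, and max-over-captions scoring are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two CNN feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def token_f1(hyp, ref):
    """Crude bag-of-words F1 between a hypothesis and a retrieved caption
    (a stand-in for whatever string similarity the reranker actually uses)."""
    h, r = set(hyp.lower().split()), set(ref.lower().split())
    if not h or not r:
        return 0.0
    overlap = len(h & r)
    p, q = overlap / len(h), overlap / len(r)
    return 2 * p * q / (p + q) if p + q else 0.0

def rerank(hypotheses, src_feat, db_feats, db_captions, k=5, alpha=0.5):
    """Rerank MT hypotheses using the captions of the k most visually
    similar target-language-captioned images (the multimodal pivots).

    hypotheses  -- list of (translation, mt_model_score) pairs
    src_feat    -- CNN feature vector of the image being described
    db_feats    -- CNN feature vectors of the captioned image database
    db_captions -- target-language captions aligned with db_feats
    """
    # Image retrieval: k nearest neighbours in visual feature space.
    sims = np.array([cosine(src_feat, f) for f in db_feats])
    pivots = [db_captions[i] for i in sims.argsort()[::-1][:k]]
    # Cross-lingual reranking: interpolate the MT model score with the
    # best match against any pivot caption (alpha is a hypothetical weight).
    scored = [(alpha * mt + (1 - alpha) * max(token_f1(hyp, c) for c in pivots),
               hyp)
              for hyp, mt in hypotheses]
    return [hyp for _, hyp in sorted(scored, reverse=True)]
```

Taking the maximum over pivot captions rewards a hypothesis that closely matches any one retrieved caption; averaging instead would favor hypotheses that hedge across all of them. Either choice is a design decision, not something fixed by the abstract.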

Related research:
07/26/2022

Multimodal Neural Machine Translation with Search Engine Based Image Retrieval

Recently, a number of works have shown that the performance of neural machine ...
05/30/2016

Does Multimodality Help Human and Machine for Translation and Image Captioning?

This paper presents the systems developed by LIUM and CVC for the WMT16 ...
09/08/2021

Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models

Multimodal machine translation (MMT) systems have been shown to outperform ...
12/20/2022

Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation

One of the major challenges of machine translation (MT) is ambiguity, which ...
07/04/2017

Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation

In Multimodal Neural Machine Translation (MNMT), a neural model generates ...
07/14/2017

CUNI System for the WMT17 Multimodal Translation Task

In this paper, we describe our submissions to the WMT17 Multimodal Translation ...
