Imagination improves Multimodal Translation

by   Desmond Elliott, et al.
Tilburg University

We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.




1 Introduction

Multimodal machine translation is the task of translating sentences in context, such as images paired with a parallel text [Specia et al.2016]. This is an emerging task in the area of multilingual multimodal natural language processing. Progress on this task may prove useful for translating the captions of the images illustrating online news articles, and for multilingual closed captioning in international television and cinema.

Initial efforts have not convincingly demonstrated that visual context can improve translation quality. In the results of the First Multimodal Translation Shared Task, only three systems outperformed an off-the-shelf text-only phrase-based machine translation model, and the best performing system was equally effective with or without the visual features [Specia et al.2016]. There remains an open question about how translation models should take advantage of visual context.

Figure 1: The Imagination model learns visually grounded representations by sharing the encoder network between the translation decoder and the image prediction (imaginet) decoder.

We present a multitask learning model that decomposes multimodal translation into learning a translation model and learning visually grounded representations. This decomposition means that our model can be trained over external datasets of parallel text or described images, making it possible to take advantage of existing resources. Figure 1 presents an overview of our model, Imagination, in which source language representations are shared between tasks through the Shared Encoder. The translation decoder is an attention-based neural machine translation model [Bahdanau et al.2015], and the image prediction decoder is trained to predict a global feature vector of an image that is associated with a sentence [Chrupała et al.2015, imaginet]. This decomposition encourages grounded learning in the shared encoder because the imaginet decoder is trained to imagine the image associated with a sentence. It has been shown that grounded representations are qualitatively different from their text-only counterparts [Kádár et al.2016] and correlate better with human similarity judgements [Chrupała et al.2015]. We assess the success of the grounded learning by evaluating the image prediction model on an image–sentence ranking task to determine if the shared representations are useful for image retrieval [Hodosh et al.2013]. In contrast with most previous work, our model does not take images as input at translation time; rather, it learns grounded representations in the shared encoder.

We evaluate Imagination on the Multi30K dataset [Elliott et al.2016] using a combination of in-domain and out-of-domain data. In the in-domain experiments, we find that multitasking translation with image prediction is competitive with the state of the art. Our model achieves 55.8 Meteor as a single model trained on multimodal in-domain data, and 57.6 Meteor as an ensemble.

In the experiments with out-of-domain resources, we find that the improvement in translation quality holds when training the imaginet decoder on the MS COCO dataset of described images [Chen et al.2015]. Furthermore, if we significantly improve our text-only baseline using out-of-domain parallel text from the News Commentary corpus [Tiedemann2012], we still find improvements in translation quality from the auxiliary image prediction task. Finally, we report a state-of-the-art result of 59.3 Meteor on the Multi30K corpus when ensembling models trained on in- and out-of-domain resources.[1]

[1] The implementation will be available upon publication.

The main contributions of this paper are:

  • We show how to apply multitask learning to multimodal translation. This makes it possible to train models for this task using external resources alongside the expensive triple-aligned source-target-image data.

  • We decompose multimodal translation into two tasks: learning to translate and learning grounded representations. We show that each task can be trained on large-scale external resources, e.g. parallel news text or images described in a single language.

  • We present a model that achieves state of the art results without using images as an input. Instead, our model learns visually grounded source language representations using an auxiliary image prediction objective. Our model does not need any additional parameters to translate unseen sentences.

2 Problem Formulation

Multimodal translation is the task of producing a target language translation y, given the source language sentence x and additional context, such as an image v [Specia et al.2016]. Let x be a source language sentence consisting of N tokens: x_1, x_2, …, x_N, and let y be a target language sentence consisting of M tokens: y_1, y_2, …, y_M. The training data consists of tuples (x, y, v), where x is a description of image v, and y is a translation of x.

Multimodal translation has previously been framed as minimising the negative log-likelihood of a translation model that is additionally conditioned on the image, i.e. J(θ) = −log p(y | x, v). Here, we decompose the problem into learning to translate and learning visually grounded representations. The decomposition is based on sharing parameters θ between these two tasks, and learning task-specific parameters φ. We learn the parameters in a multitask model with shared parameters θ in the source language encoder. The translation model has task-specific parameters φ^T in the attention-based decoder, which are optimised through the translation loss J_T(θ, φ^T). Grounded representations are learned through an image prediction model with task-specific parameters φ^V in the image-prediction decoder by minimising J_V(θ, φ^V). The joint objective is given by mixing the translation and image prediction tasks with the parameter w:

    J(θ, φ^T, φ^V) = w · J_T(θ, φ^T) + (1 − w) · J_V(θ, φ^V)

Our decomposition of the problem makes it straightforward to optimise this objective without paired tuples, e.g. where we have an external dataset of described images D_image or an external parallel corpus D_text.

We train our multitask model following the approach of Luong2016. We define a primary task and an auxiliary task, and a set of parameters θ to be shared between the tasks. A minibatch of updates is performed for the primary task with probability w, and for the auxiliary task with probability 1 − w. The primary task is trained until convergence, and the weight w determines the frequency of parameter updates for the auxiliary task.
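This sampling scheme can be sketched as follows; the helper name `schedule_updates` and the value of w are illustrative, not from the released implementation:

```python
import random

def schedule_updates(n_batches, w, seed=0):
    """Simulate the multitask schedule: each minibatch trains the
    primary task with probability w, otherwise the auxiliary task."""
    rng = random.Random(seed)
    counts = {"primary": 0, "auxiliary": 0}
    for _ in range(n_batches):
        task = "primary" if rng.random() < w else "auxiliary"
        counts[task] += 1
    return counts

counts = schedule_updates(10_000, w=0.75)
# primary receives roughly 75% of the updates, auxiliary roughly 25%
```

In practice each "update" would be one gradient step on the corresponding loss; only the shared encoder parameters θ are touched by both tasks.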

3 Imagination Model

3.1 Shared Encoder

The encoder network of our model learns a representation of a sequence of tokens x_1, …, x_N in the source language with a bidirectional recurrent neural network [Schuster and Paliwal1997]. This representation is shared between the different tasks. Each token is represented by a one-hot vector x_i, which is mapped into an embedding e_i through a learned matrix E_x:

    e_i = E_x · x_i

A sentence is processed by a pair of recurrent neural networks, where one captures the sequence left-to-right (forward), and the other captures the sequence right-to-left (backward). The initial state of each network is a learned parameter:

    h_i^f = GRU_f(h_{i−1}^f, e_i),    h_i^b = GRU_b(h_{i+1}^b, e_i)

Each token in the source language input sequence is represented by a concatenation of the forward and backward hidden state vectors:

    h_i = [h_i^f ; h_i^b]


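As an illustration, the encoder's forward and backward passes can be sketched in numpy, with simple tanh RNN cells standing in for the GRUs and zero initial states in place of the learned ones; all names and dimensions here are illustrative:

```python
import numpy as np

def bidirectional_encode(embeddings, Wf, Uf, Wb, Ub):
    """Encode a sentence with a pair of simple tanh RNNs (stand-ins
    for the paper's GRUs): one left-to-right, one right-to-left.
    embeddings: (N, d_emb); returns (N, 2*d_hid) annotations h_i."""
    n = embeddings.shape[0]
    d_hid = Uf.shape[0]
    fwd = np.zeros((n, d_hid))
    bwd = np.zeros((n, d_hid))
    h = np.zeros(d_hid)           # learned initial state in the paper
    for i in range(n):            # forward pass
        h = np.tanh(embeddings[i] @ Wf + h @ Uf)
        fwd[i] = h
    h = np.zeros(d_hid)
    for i in reversed(range(n)):  # backward pass
        h = np.tanh(embeddings[i] @ Wb + h @ Ub)
        bwd[i] = h
    # h_i = [h_i^f ; h_i^b]: concatenate per-token annotations
    return np.concatenate([fwd, bwd], axis=1)
```

The concatenated annotations are what both the translation decoder (via attention) and the imaginet decoder (via mean pooling) consume.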
3.2 Neural Machine Translation Decoder

The translation model decoder is an attention-based recurrent neural network [Bahdanau et al.2015]. Tokens in the decoder are represented by a one-hot vector y_j, which is mapped into an embedding e_j through a learned matrix E_y:

    e_j = E_y · y_j
The inputs to the decoder are the previously predicted token y_{j−1}, the previous decoder state d_{j−1}, and a timestep-dependent context vector c_j calculated over the encoder hidden states:

    d_j = GRU(d_{j−1}, e_{j−1}, c_j)

The initial state of the decoder is a nonlinear transform of the mean of the encoder states, where W_init is a learned parameter:

    d_0 = tanh(W_init · (1/N) Σ_i h_i)        (8)

The context vector c_j is a weighted sum over the encoder hidden states, where N denotes the length of the source sentence:

    c_j = Σ_{i=1}^{N} α_{ji} · h_i
The α_{ji} values are the proportion with which the encoder hidden state vectors contribute to the decoder hidden state when producing the j-th token in the translation. They are computed by a feed-forward neural network, where v_a, W_a and U_a are learned parameters:

    e_{ji} = v_a · tanh(W_a · d_{j−1} + U_a · h_i),    α_{ji} = softmax(e_{ji})
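As a concrete sketch of the attention step, the following numpy function scores each annotation against the previous decoder state, normalises the scores, and returns the context vector; the function and parameter names (`attention_context`, `Wa`, `Ua`, `va`) are illustrative:

```python
import numpy as np

def attention_context(prev_state, annotations, Wa, Ua, va):
    """Bahdanau-style attention: feed-forward scores over the encoder
    annotations, softmax-normalised, then a weighted-sum context vector.
    prev_state: (d_dec,); annotations: (N, d_enc)."""
    # e_ji = va . tanh(Wa d_{j-1} + Ua h_i), broadcast over the N tokens
    scores = np.tanh(prev_state @ Wa + annotations @ Ua) @ va   # (N,)
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()            # attention weights sum to 1
    context = alphas @ annotations    # c_j: weighted sum of annotations
    return context, alphas
```

Subtracting the maximum score before exponentiating is a standard numerical-stability trick; it does not change the softmax output.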
From the hidden state d_j the network predicts the conditional distribution of the next token y_j, given a target language embedding e_{j−1} of the previous token, the current hidden state d_j, and the calculated context vector c_j. Note that at training time, y_{j−1} is the true observed token; whereas for unseen data we use the inferred token sampled from the output of the softmax:

    p(y_j | y_{j−1}, x) = softmax(W_o · tanh(W_d · d_j + W_e · e_{j−1} + W_c · c_j))
The translation model is trained to minimise the negative log likelihood of predicting the target language output:

    J_T(θ, φ^T) = − Σ_j log p(y_j | y_{j−1}, x)
3.3 Imaginet Decoder

The image prediction decoder is trained to predict the visual feature vector of the image associated with a sentence [Chrupała et al.2015]. It encourages the shared encoder to learn grounded representations for the source language.

We represent a sentence by first producing the sequence of concatenated hidden state vectors in the encoder (Section 3.1). Then, in the same way as we initialise the hidden state of the decoder (Eqn. 8), we represent a source language sentence as the mean of the encoder states. This is the input to a feed-forward neural network with parameters W_vis that predicts the visual feature vector v̂ associated with the sentence:

    v̂ = tanh(W_vis · (1/N) Σ_i h_i)
This decoder is trained to predict the true image vector v with a margin-based objective, parameterised by the minimum margin α and the cosine distance d(·, ·). The contrastive examples v′ are drawn from the other instances in a minibatch:

    J_V(θ, φ^V) = Σ_{v′ ≠ v} max{0, α + d(v̂, v) − d(v̂, v′)}        (15)

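A minimal numpy sketch of this contrastive objective, using the other images in the minibatch as negatives; the function names are mine, not from the released implementation:

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance d(a, b) = 1 - cos(a, b)."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def imaginet_loss(predicted, true, margin=0.1):
    """Margin-based image prediction objective: for each sentence k,
    contrast the distance to its own image against the distances to
    every other image in the minibatch (illustrative margin value)."""
    loss = 0.0
    for k in range(len(predicted)):
        pos = cosine_distance(predicted[k], true[k])
        for kp in range(len(true)):
            if kp == k:
                continue
            neg = cosine_distance(predicted[k], true[kp])
            # hinge: zero once the contrastive image is at least
            # `margin` further away than the true image
            loss += max(0.0, margin + pos - neg)
    return loss
```

When every prediction is closer to its own image than to any other image by at least the margin, the loss is exactly zero, so the gradient comes only from confusable pairs.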
4 Data

                                            Size     Tokens   Types   Images
Multi30K: parallel text with images
  En                                        31K      377K     10K     31K
  De                                                 368K     16K
MS COCO: external described images
  En                                        414K     4.3M     24K     83K
News Commentary: external parallel text
  En                                        240K     8.31M    17K
  De                                                 8.95M
Table 1: The datasets used in our experiments.

We evaluate our model using the benchmark Multi30K dataset [Elliott et al.2016], which is the largest collection of images paired with sentences in multiple languages. This dataset contains 31,014 images paired with an English language sentence and a German language translation: 29,000 instances are reserved for training, 1,014 for development, and 1,000 for evaluation.[2]

[2] Multi30K also contains 155K independently collected descriptions for German and English. We do not use this data.

The English and German sentences are preprocessed by normalising the punctuation, lowercasing and tokenizing the text using the Moses toolkit. We additionally decompound the German text using Zmorge [Sennrich and Kunz2014]. This results in vocabulary sizes of 10,214 types for English and 16,022 for German.

We also use two external datasets to evaluate our model: the MS COCO dataset of English described images [Chen et al.2015], and the English-German News Commentary parallel corpus [Tiedemann2012]. When we perform experiments with the News Commentary corpus, we first calculate a 17,597 sub-word vocabulary using SentencePiece [Schuster and Nakajima2012] over the concatenation of the Multi30K and News Commentary datasets. This gives us a shared vocabulary for the external data that reduces the number of out-of-vocabulary tokens.

Images are represented by 2048D vectors extracted from the ‘pool5/7x7_s1’ layer of the GoogLeNet v3 CNN [Szegedy et al.2015].

5 Experiments

We evaluate our multitasking approach with in- and out-of-domain resources. We start by reporting results of models trained using only the Multi30K dataset. We also report the results of training the imaginet decoder with the COCO dataset. Finally, we report results on incorporating the external News Commentary parallel text into our model. Throughout, we report performance of the En→De translation using Meteor [Denkowski and Lavie2014] and BLEU [Papineni et al.2002] against lowercased tokenized references.

5.1 Hyperparameters

The encoder is a 1000D Gated Recurrent Unit bidirectional recurrent neural network [Cho et al.2014, GRU] with 620D embeddings. We share all of the encoder parameters between the primary and auxiliary task. The translation decoder is a 1000D GRU recurrent neural network, with a 2000D context vector over the encoder states, and 620D word embeddings [Sennrich et al.2017]. The Imaginet decoder is a single-layer feed-forward network, where we learn the parameters W_vis to predict the true image vector using the Imaginet objective (Equation 15). The models are trained using the Adam optimiser with the default hyperparameters [Kingma and Ba2015] in minibatches of 80 instances. The translation task is defined as the primary task, and convergence is reached when BLEU has not increased for five epochs on the validation data. Gradients are clipped when their norm exceeds 1.0. Dropout is set to 0.2 for the embeddings and the recurrent connections in both tasks [Gal and Ghahramani2016]. Translations are decoded using beam search with 12 hypotheses.

5.2 In-domain experiments

                  Meteor       BLEU
NMT               54.0 ± 0.6   35.5 ± 0.8
Calixto2017c      55.0         36.5
Calixto2017b      55.1         37.3
Imagination       55.8 ± 0.4   36.8 ± 0.8
toyama2016neural  56.0         36.5
Hitschler2016     56.1         34.3
Moses             56.9         36.9
Table 2: En→De translation results on the Multi30K dataset. Our Imagination model is competitive with the state of the art when it is trained on in-domain data. We report the mean and standard deviation of three random initialisations.

We start by presenting the results of our multitask model trained using only the Multi30K dataset. We compare against state-of-the-art approaches and text-only baselines. Moses is the phrase-based machine translation model [Koehn et al.2007] reported in [Specia et al.2016]. NMT is a text-only neural machine translation model. Calixto2017c is a double-attention model over the source language and the image. Calixto2017b is a multimodal translation model that conditions the decoder on a semantic image vector extracted from the VGG-19 CNN. Hitschler2016 uses visual features in a target-side retrieval model for translation. toyama2016neural is most comparable to our approach: it is a multimodal variational NMT model that infers latent variables to represent the source language semantics from the image and linguistic data.

Table 2 shows the results of this experiment. We can see that the combination of the attention-based translation model and the image prediction model is a 1.8 Meteor point improvement over the NMT baseline, but it is 1.1 Meteor points worse than the strong Moses baseline. Our approach is competitive with previous approaches that use visual features as inputs to the decoder and with the target-side reranking model. It is also competitive with toyama2016neural, which also only uses images for training. These results confirm that our multitasking approach uses the image prediction task to improve the encoder of the translation model.

                    Meteor       BLEU
Imagination         55.8 ± 0.4   36.8 ± 0.8
Imagination (COCO)  55.6 ± 0.5   36.4 ± 1.2
Table 3: Translation results when using out-of-domain described images. Our approach is still effective when the image prediction model is trained over the COCO dataset.

5.3 External described image data

                      Meteor       BLEU
NMT                   52.8 ± 0.6   33.4 ± 0.6
+ NC                  56.7 ± 0.3   37.2 ± 0.7
+ Imagination         56.7 ± 0.1   37.4 ± 0.3
+ Imagination (COCO)  57.1 ± 0.2   37.8 ± 0.7
Calixto2017c          56.8         39.0
Table 4: Translation results with out-of-domain parallel text and described images. We find further improvements when we multitask with the News Commentary (NC) and COCO datasets.

Recall from Section 2 that we are interested in scenarios where x, y, and v are drawn from different sources. We now experiment with separating the translation data from the described image data, using D_image: the MS COCO dataset of 83K described images[3] and D_text: the Multi30K parallel text.

[3] Due to differences in the vocabularies of the respective datasets, we do not train on examples where more than 10% of the tokens are out-of-vocabulary in the Multi30K dataset.

Table 3 shows the results of this experiment. We find that there is no significant difference between training the imaginet decoder on in-domain (Multi30K) or out-of-domain data (COCO). This result confirms that we can separate the parallel text from the described images.

5.4 External parallel text data

Parallel text                  Described images
Multi30K   News Commentary     Multi30K   COCO      Meteor   BLEU
Zmorge:
   ✓                                                56.2     37.8
   ✓                              ✓                 57.6     39.0
Sub-word:
   ✓                                                54.4     35.0
   ✓           ✓                                    58.6     39.4
   ✓           ✓                  ✓                 59.0     39.5
   ✓           ✓                             ✓      59.3     40.2
Table 5: Ensemble decoding results. Zmorge denotes models trained with decompounded German words; Sub-word denotes joint SentencePiece word splitting (see Section 4 for more details).

We now experiment with training our model on a combination of the Multi30K and the News Commentary English-German data. In these experiments, we concatenate the Multi30K and News Commentary datasets into a single training dataset, similar to Freitag2016. We compare our model against Calixto2017c, who pre-train their model on the WMT’15 English-German parallel text and back-translate [Sennrich et al.2016] additional sentences from the bilingual independent descriptions in the Multi30K dataset (Footnote 2).

Table 4 presents the results. The text-only NMT model using sub-words is 1.2 Meteor points lower than decompounding the German text. Nevertheless, the model trained over a concatenation of the parallel texts is a 2.7 Meteor point improvement over this baseline (+ NC) and matches the performance of our multitasking model that uses only in-domain data (Section 5.2). We do not see an additive improvement for the multitasking model with the concatenated parallel text and the in-domain data (+ Imagination), using a training objective interpolation w set to the ratio of the training dataset sizes. This may be because we are essentially learning a translation model and the updates from the imaginet decoder are forgotten. Therefore, we experiment with multitasking the concatenated parallel text and the COCO dataset. We find that balancing the datasets improves over the concatenated text model by 0.4 Meteor (+ Imagination (COCO)). Our multitasking approach improves upon Calixto et al. by 0.3 Meteor points. Our model can be trained in 48 hours using 240K parallel sentences and 414K described images from out-of-domain datasets. Furthermore, recall that our model does not use images as an input for translating unseen data, which results in 6.2% fewer parameters compared to using the 2048D Inception-V3 visual features to initialise the hidden state of the decoder.

Source: two children on their stomachs lay on the ground under a pipe
NMT: zwei kinder auf ihren gesichtern liegen unter dem boden auf dem boden
Ours: zwei kinder liegen bäuchlings auf dem boden unter einer schaukel
Source: small dog in costume stands on hind legs to reach dangling flowers
NMT: ein kleiner hund steht auf dem hinterbeinen und läuft , nach links von blumen zu sehen
Ours: ein kleiner hund in einem kostüm steht auf den hinterbeinen , um die blumen zu erreichen
Source: a bird flies across the water
NMT: ein vogel fliegt über das wasser
Ours: ein vogel fliegt durch das wasser
Table 6: Examples where our model improves or worsens the translation compared to the NMT baseline. Top: NMT translates the wrong body part; both models skip “pipe”. Middle: NMT incorrectly translates the verb and misses several nouns. Bottom: Our model incorrectly translates the preposition.

5.5 Ensemble results

Table 5 presents the results of ensembling different randomly initialised models. We achieve a state-of-the-art result of 57.6 Meteor for a model trained on only in-domain data. The improvements are more pronounced for the models trained using sub-words and out-of-domain data. An ensemble of baselines trained on sub-words is initially worse than an ensemble trained on Zmorge-decompounded words. However, we always see an improvement from ensembling models trained on in- and out-of-domain data. Our best ensemble is trained on the Multi30K parallel text, the News Commentary parallel text, and the COCO descriptions, setting a new state-of-the-art result of 59.3 Meteor.

5.6 Qualitative Examples

Table 6 shows examples of where the multitasking model improves or worsens translation performance compared to the baseline model.[4] The first example shows that the baseline model makes a significant error in translating the pose of the children, translating "on their stomachs" as "on their faces". The middle example demonstrates that the baseline model translates the dog as walking ("läuft") and then makes grammatical and sense errors after the clause marker. Both models neglect to translate the word "dangling", which is a low-frequency word in the training data. There are instances where the baseline produces better translations than the multitask model: in the bottom example, our model translates a bird flying through the water ("durch") instead of over the water.

[4] We used MT-ComparEval [Klejch et al.2015].

6 Discussion

6.1 Does the model learn grounded representations?

A natural question to ask is whether the multitask model is actually learning representations that are relevant to the images. We answer this question by evaluating the Imaginet decoder in an image–sentence ranking task. Here the input is a source language sentence, from which we predict its image vector v̂. The predicted vector can be compared against the true image vectors in the evaluation data using the cosine distance to produce a ranked order of the images. Our model returns a median rank of 11.0 for the true image compared to the predicted image vector. Figure 2 shows examples of the nearest neighbours of the images predicted by our multitask model. We can see that the combination of the multitask source language representations and the imaginet decoder leads to the prediction of relevant images. This confirms that the shared encoder is indeed learning visually grounded representations.
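The ranking evaluation can be sketched as follows; `median_rank` is an illustrative helper, not the authors' evaluation code:

```python
import numpy as np

def median_rank(predicted, true):
    """Rank all true image vectors by cosine distance to each predicted
    vector; return the median rank of the correct image (1 = best)."""
    pred = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    imgs = true / np.linalg.norm(true, axis=1, keepdims=True)
    dist = 1.0 - pred @ imgs.T          # (n_sentences, n_images)
    order = np.argsort(dist, axis=1)    # nearest image first per row
    ranks = [int(np.where(order[i] == i)[0][0]) + 1
             for i in range(len(pred))]
    return float(np.median(ranks))
```

A perfect predictor would score a median rank of 1.0; the paper's reported 11.0 means the true image is typically among the nearest dozen of the 1,014 development images.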

6.2 The effect of visual feature vectors

(a) Nearest neighbours for “a native woman is working on a craft project .”
(b) Nearest neighbours for “there is a cafe on the street corner with an oval painting on the side of the building .”
Figure 2: We can interpret the imaginet Decoder by visualising the predictions made by our model.

We now study the effect of varying the convolutional neural network used to extract the visual features predicted by the Imaginet decoder. It has previously been shown that the choice of visual features can affect the performance of vision and language models [Jabri et al.2016, Kiela et al.2016]. We compare the effect of training the imaginet decoder to predict different types of image features, namely: 4096D features extracted from the 'fc7' layer of the VGG-19 model [Simonyan and Zisserman2015], 2048D features extracted from the 'pool5/7x7_s1' layer of InceptionNet V3 [Szegedy et al.2015], and 2048D features extracted from the 'avg_pool' layer of ResNet-50 [He et al.2016]. Table 7 shows the results of this experiment. There is a clear difference between predicting the 2048D vectors (Inception-V3 and ResNet-50) and predicting the 4096D vector from VGG-19. This difference is reflected in both the translation Meteor score and the median rank of the images in the validation dataset. This is likely because it is easier to learn the image prediction model with fewer parameters (8.192 million for VGG-19 vs. 4.096 million for Inception-V3 and ResNet-50). However, it is not clear why there is such a pronounced difference between the Inception-V3 and ResNet-50 models.[5] Further work is needed to understand the difference in these results.

[5] We used pre-trained CNNs, which claim equal ILSVRC object recognition performance for both models: 7.8% top-5 error with a single model and a single crop.

              Meteor       Median Rank
Inception-V3  56.0 ± 0.1   11.0 ± 0.0
ResNet-50     54.7 ± 0.4   11.7 ± 0.5
VGG-19        53.6 ± 1.8   13.0 ± 0.0
Table 7: The type of visual features predicted by the imaginet Decoder has a strong impact on the Multitask model performance.

7 Related work

Initial work on multimodal translation has focused on approaches that use either semantic or spatially-preserving image features as inputs to a translation model. Semantic image features are typically extracted from the final pooling layer of a pre-trained object recognition CNN, e.g. 'pool5/7x7_s1' in GoogLeNet [Szegedy et al.2015]. This type of feature vector has been used as a conditioning input to the encoder [Elliott et al.2015, Huang et al.2016], in the decoder [Libovický et al.2016], or as additional features in a phrase-based translation model [Shah et al.2016, Hitschler et al.2016]. Spatially-preserving image features are extracted from deeper inside a CNN, where the position of a feature in the tensor is related to its position in the image. These features have been used in "double-attention models", which calculate independent context vectors for the source language hidden states and a convolutional image feature map [Calixto et al.2016, Caglayan et al.2016, Calixto et al.2017a]. Similar to most of these approaches, we use an attention-based translation model, but our multitask model does not use images for translation.

More related to our work are the recent papers by toyama2016neural, Saha2016, and Nakayama2016. toyama2016neural extend the Variational Neural Machine Translation model [Zhang et al.2016] by inferring latent variables to explicitly model the semantics of source sentences from both image and linguistic information. Their model does not condition on images for translation. They report improvements on the Multi30K data set when using multimodal information, however, their model adds additional parameters in the form of “neural inferrer” modules. In our multitask model, the grounded semantics are represented implicitly in the hidden states of the shared encoder. Furthermore, they assume Source-Target-Image aligned training data; whereas our approach achieves equally good results if we train on separate Source-Image and Source-Target datasets.

Saha2016 study cross-lingual image description, where the task is to generate a sentence in language L2 given the image, using only Image–L1 and L1–L2 parallel corpora. They propose a Correlational Encoder-Decoder to model both the Image–L1 and L1–L2 data. Their model learns correlated representations for paired Image–L1 data and decodes L2 from this joint representation. Similarly to our work, the encoder is trained by minimising two loss functions: the Image–L1 correlation loss, and the L2 decoding cross-entropy loss.

Nakayama2016 consider a zero-resource problem, where the task is to translate from L1 to L2 with only Image–L1 and Image–L2 corpora available. Their model embeds the image, the L1 sentence, and the L2 sentence in a joint multimodal space learned by minimising a multitask ranking loss between both pairs of examples. The main difference between this approach and our model is that we focus on enriching source language representations with visual information, rather than addressing the zero-resource issue.

Multitask Learning improves the generalisability of a model by requiring it to be useful for more than one task [Caruana1997]. This approach has recently been used to improve the performance of sentence compression using eye gaze as an auxiliary task [Klerke et al.2016], and to improve shallow parsing accuracy through the auxiliary task of predicting keystrokes in an out-of-domain corpus [Plank2016]. These works hypothesise a relationship between specific biometric measurements and specific NLP tasks motivated by cognitive-linguistic theories. More recently, Bingel2017 analysed the beneficial relationships between primary and auxiliary sequential prediction tasks. In the translation literature, multitask learning has been used to learn a one-to-many languages translation model [Dong et al.2015], a multi-lingual translation model with a single attention mechanism shared across multiple languages [Firat et al.2016], and in multitask sequence-to-sequence learning without an attention-based decoder [Luong et al.2016]. In our multitask framework we explore the benefits of grounded learning in the specific case of multimodal translation. Our model combines sequence prediction with continuous (image) vector prediction, compared to previous work on multitask learning for sequence prediction tasks.

Visual representation prediction has been tackled as an unsupervised and supervised problem. In an unsupervised setting, Srivastava2015 propose an LSTM Autoencoder to predict video frames as a reconstruction task or as a future prediction task. In a supervised setting, Lin2015 use a conditional random field to imagine the composition of a clip-art scene for visual paraphrasing and fill-in-the-blank tasks. Chrupala2015 predict the image vector associated with a sentence using an L2 loss; they found this approach improves multi-modal word similarity compared to text-only baselines. Gelderloos2016 predict the visual feature vector associated with a sequence of phonemes using a max-margin loss, similar to our image prediction objective. Collell2017 learn to predict the visual feature vector associated with a word for word similarity and relatedness tasks; their best approach combines a word representation with the predicted image representation, which allows zero-shot learning for unseen words. Pasunuru2017 propose a multi-task learning model for video description that combines unsupervised video frame reconstruction, lexical entailment, and video description in a single framework. They find improvements from using out-of-domain resources for the entailment and video frame prediction tasks, similar to the improvements we find from using out-of-domain parallel text and described images.

8 Conclusion

We decompose multimodal translation into two sub-problems: learning to translate and learning visually grounded representations. In a multitask learning framework, we show how these sub-problems can be addressed by sharing an encoder between a translation model and an image prediction model. Our approach achieves state-of-the-art results on the Multi30K dataset without using images for translation. We show that training on separate parallel text and described image datasets does not hurt performance, encouraging future research on multitasking with diverse sources of data. Furthermore, we still find improvements from image prediction when we improve our text-only baseline with the out-of-domain parallel text. Future work includes adapting our decomposition to other NLP tasks that may benefit from out-of-domain resources, such as semantic role labelling, dependency parsing, and question-answering; exploring methods for inputting the (predicted) image into the translation model; experimenting with different image prediction architectures; multitasking different translation languages into a single shared encoder; and multitasking in both the encoder and decoder(s).


Acknowledgments

We thank Joost Bastings for sharing his multitasking Nematus model, Wilker Aziz for discussions about formulating the problem, Stella Frank for finding and explaining the qualitative examples to us, and Afra Alishahi, Grzegorz Chrupała, and Philip Schulz for feedback on earlier drafts of the paper. DE acknowledges the support of NWO Vici grant nr. 277-89-002 awarded to K. Sima’an, an Amazon Academic Research Award, and a hardware grant from the NVIDIA Corporation.


References

  • [Bahdanau et al.2015] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations.
  • [Bingel and Søgaard2017] Joachim Bingel and Anders Søgaard. 2017. Identifying beneficial task relations for multi-task learning in deep neural networks. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 164–169.
  • [Caglayan et al.2016] Ozan Caglayan, Loïc Barrault, and Fethi Bougares. 2016. Multimodal attention for neural machine translation. CoRR, abs/1609.03976.
  • [Calixto et al.2016] Iacer Calixto, Desmond Elliott, and Stella Frank. 2016. DCU-UvA Multimodal MT System Report. In Proceedings of the First Conference on Machine Translation, pages 634–638.
  • [Calixto et al.2017a] Iacer Calixto, Qun Liu, and Nick Campbell. 2017a. Doubly-Attentive Decoder for Multi-modal Neural Machine Translation. CoRR, abs/1702.01287.
  • [Calixto et al.2017b] Iacer Calixto, Qun Liu, and Nick Campbell. 2017b. Incorporating global visual features into attention-based neural machine translation. CoRR, abs/1701.06521.
  • [Caruana1997] Rich Caruana. 1997. Multitask learning. Machine Learning, 28(1):41–75.
  • [Chen et al.2015] Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C. Lawrence Zitnick. 2015. Microsoft COCO captions: Data collection and evaluation server. CoRR, abs/1504.00325.
  • [Cho et al.2014] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1724–1734.
  • [Chrupała et al.2015] Grzegorz Chrupała, Ákos Kádár, and Afra Alishahi. 2015. Learning language through pictures. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 112–118.
  • [Collell et al.2017] Guillem Collell, Teddy Zhang, and Marie-Francine Moens. 2017. Imagined visual representations as multimodal embeddings. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), pages 4378–4384.
  • [Denkowski and Lavie2014] Michael Denkowski and Alon Lavie. 2014. Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the EACL 2014 Workshop on Statistical Machine Translation.
  • [Dong et al.2015] Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. 2015. Multi-task learning for multiple language translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 1723–1732.
  • [Elliott et al.2015] Desmond Elliott, Stella Frank, and Eva Hasler. 2015. Multi-language image description with neural sequence models. CoRR, abs/1510.04709.
  • [Elliott et al.2016] Desmond Elliott, Stella Frank, Khalil Sima’an, and Lucia Specia. 2016. Multi30K: Multilingual English-German image descriptions. In Proceedings of the 5th Workshop on Vision and Language.
  • [Firat et al.2016] Orhan Firat, Kyunghyun Cho, and Yoshua Bengio. 2016. Multi-way, multilingual neural machine translation with a shared attention mechanism. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 866–875.
  • [Freitag and Al-Onaizan2016] Markus Freitag and Yaser Al-Onaizan. 2016. Fast domain adaptation for neural machine translation. CoRR, abs/1612.06897.
  • [Gal and Ghahramani2016] Yarin Gal and Zoubin Ghahramani. 2016. A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems 29, pages 1019–1027.
  • [Gelderloos and Chrupała2016] Lieke Gelderloos and Grzegorz Chrupała. 2016. From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics, pages 1309–1319.
  • [He et al.2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778.
  • [Hitschler et al.2016] Julian Hitschler, Shigehiko Schamoni, and Stefan Riezler. 2016. Multimodal Pivots for Image Caption Translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 2399–2409.
  • [Hodosh et al.2013] Micah Hodosh, Peter Young, and Julia Hockenmaier. 2013. Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research, 47:853–899.
  • [Huang et al.2016] Po-Yao Huang, Frederick Liu, Sz-Rung Shiang, Jean Oh, and Chris Dyer. 2016. Attention-based multimodal neural machine translation. In Proceedings of the First Conference on Machine Translation, pages 639–645.
  • [Jabri et al.2016] Allan Jabri, Armand Joulin, and Laurens van der Maaten. 2016. Revisiting visual question answering baselines. CoRR, abs/1606.08390.
  • [Kádár et al.2016] Ákos Kádár, Grzegorz Chrupała, and Afra Alishahi. 2016. Representation of linguistic form and function in recurrent neural networks. CoRR, abs/1602.08952.
  • [Kiela et al.2016] Douwe Kiela, Anita L. Verő, and Stephen Clark. 2016. Comparing Data Sources and Architectures for Deep Visual Representation Learning in Semantics. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-16), pages 447–456.
  • [Kingma and Ba2015] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. International Conference on Learning Representations.
  • [Klejch et al.2015] Ondřej Klejch, Eleftherios Avramidis, Aljoscha Burchardt, and Martin Popel. 2015. MT-ComparEval: Graphical evaluation interface for machine translation development. The Prague Bulletin of Mathematical Linguistics, 104(1):63–74.
  • [Klerke et al.2016] Sigrid Klerke, Yoav Goldberg, and Anders Søgaard. 2016. Improving sentence compression by learning to predict gaze. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1528–1533.
  • [Koehn et al.2007] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 177–180.
  • [Libovický et al.2016] Jindřich Libovický, Jindřich Helcl, Marek Tlustý, Ondřej Bojar, and Pavel Pecina. 2016. CUNI system for WMT16 automatic post-editing and multimodal translation tasks. In Proceedings of the First Conference on Machine Translation, pages 646–654.
  • [Lin and Parikh2015] Xiao Lin and Devi Parikh. 2015. Don’t just listen, use your imagination: Leveraging visual common sense for non-visual tasks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2984–2993.
  • [Luong et al.2016] Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. 2016. Multi-task sequence to sequence learning. In ICLR.
  • [Nakayama and Nishida2016] Hideki Nakayama and Noriki Nishida. 2016. Zero-resource machine translation by multimodal encoder-decoder network with multimedia pivot. CoRR, abs/1611.04503.
  • [Papineni et al.2002] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318.
  • [Pasunuru and Bansal2017] Ramakanth Pasunuru and Mohit Bansal. 2017. Multi-task video captioning with video and entailment generation. CoRR, abs/1704.07489.
  • [Plank2016] Barbara Plank. 2016. Keystroke dynamics as signal for shallow syntactic parsing. In 26th International Conference on Computational Linguistics, pages 609–619.
  • [Saha et al.2016] Amrita Saha, Mitesh M. Khapra, Sarath Chandar, Janarthanan Rajendran, and Kyunghyun Cho. 2016. A correlational encoder decoder architecture for pivot based sequence generation. In 26th International Conference on Computational Linguistics: Technical Papers, pages 109–118.
  • [Schuster and Nakajima2012] Mike Schuster and Kaisuke Nakajima. 2012. Japanese and Korean voice search. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5149–5152.
  • [Schuster and Paliwal1997] Mike Schuster and Kuldip K Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681.
  • [Sennrich and Kunz2014] Rico Sennrich and Beat Kunz. 2014. Zmorge: A German morphological lexicon extracted from Wiktionary. In Language Resources and Evaluation Conference, pages 1063–1067.
  • [Sennrich et al.2016] Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 86–96.
  • [Sennrich et al.2017] Rico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry, and Maria Nădejde. 2017. Nematus: a toolkit for neural machine translation. CoRR, abs/1703.04357.
  • [Shah et al.2016] Kashif Shah, Josiah Wang, and Lucia Specia. 2016. Shef-multimodal: Grounding machine translation on images. In Proceedings of the First Conference on Machine Translation, pages 660–665.
  • [Simonyan and Zisserman2015] Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations.
  • [Specia et al.2016] Lucia Specia, Stella Frank, Khalil Sima’an, and Desmond Elliott. 2016. A shared task on multimodal machine translation and crosslingual image description. In Proceedings of the First Conference on Machine Translation, pages 543–553.
  • [Srivastava et al.2015] Nitish Srivastava, Elman Mansimov, and Ruslan Salakhutdinov. 2015. Unsupervised learning of video representations using LSTMs. In International Conference on Machine Learning, pages 843–852.
  • [Szegedy et al.2015] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2015. Rethinking the Inception architecture for computer vision. CoRR, abs/1512.00567.
  • [Tiedemann2012] Jörg Tiedemann. 2012. Parallel data, tools and interfaces in OPUS. In Eighth International Conference on Language Resources and Evaluation (LREC’12).
  • [Toyama et al.2016] Joji Toyama, Masanori Misono, Masahiro Suzuki, Kotaro Nakayama, and Yutaka Matsuo. 2016. Neural machine translation with latent semantic of image and text. CoRR, abs/1611.08459.
  • [Zhang et al.2016] Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, and Min Zhang. 2016. Variational neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 521–530.