
Imagination improves Multimodal Translation

05/11/2017
by Desmond Elliott, et al.
Tilburg University

We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.
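
As an illustration of the decomposition described above, the sketch below pairs a shared source-language encoder with (a) an attention-based translation decoder and (b) an "imagination" head that predicts an image feature vector for the source sentence, training both with an interpolated loss. This is not the authors' released code: all module names, dimensions, and the image-side loss (a simple cosine-distance stand-in) are assumptions for illustration only.

```python
# Minimal multitask sketch (illustrative, not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoder(nn.Module):
    """Source-language encoder shared by both sub-tasks."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, src):                      # src: (batch, src_len)
        states, _ = self.rnn(self.embed(src))    # (batch, src_len, 2 * hid_dim)
        return states

class AttentionDecoder(nn.Module):
    """Greatly simplified attention-based decoder for the translation sub-task."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, enc_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim + enc_dim, hid_dim, batch_first=True)
        self.attn = nn.Linear(hid_dim + enc_dim, 1)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, trg, enc_states):          # trg: (batch, trg_len)
        emb = self.embed(trg)
        hidden = torch.zeros(1, trg.size(0), self.rnn.hidden_size, device=trg.device)
        logits = []
        for t in range(trg.size(1)):
            # Additive attention over encoder states, conditioned on the previous decoder state.
            query = hidden[-1].unsqueeze(1).expand(-1, enc_states.size(1), -1)
            scores = self.attn(torch.cat([query, enc_states], dim=-1)).squeeze(-1)
            context = (F.softmax(scores, dim=-1).unsqueeze(-1) * enc_states).sum(1)
            step_in = torch.cat([emb[:, t], context], dim=-1).unsqueeze(1)
            output, hidden = self.rnn(step_in, hidden)
            logits.append(self.out(output.squeeze(1)))
        return torch.stack(logits, dim=1)         # (batch, trg_len, vocab)

class ImaginationHead(nn.Module):
    """Predicts an image feature vector from the mean encoder state."""
    def __init__(self, enc_dim=512, img_dim=2048):
        super().__init__()
        self.proj = nn.Linear(enc_dim, img_dim)

    def forward(self, enc_states):
        return torch.tanh(self.proj(enc_states.mean(dim=1)))   # (batch, img_dim)

def multitask_loss(trans_logits, trg_gold, pred_img, gold_img, alpha=0.5):
    """Interpolate the translation loss and the visual grounding loss.
    The grounding loss here is a cosine-distance stand-in; the paper's exact
    objective may differ. Assumes padding index 0 for target tokens."""
    mt = F.cross_entropy(trans_logits.reshape(-1, trans_logits.size(-1)),
                         trg_gold.reshape(-1), ignore_index=0)
    img = 1.0 - F.cosine_similarity(pred_img, gold_img).mean()
    return alpha * mt + (1.0 - alpha) * img
```

In this setup the grounding signal reaches the translation sub-task only through the shared encoder, which is what allows the image prediction task to be trained on a separate image-caption dataset such as MS COCO while the translation model is trained on parallel text.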
