
Imagination improves Multimodal Translation

by Desmond Elliott, et al.
Tilburg University

We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.
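The multitask objective described above can be sketched as an interpolation of two losses: a standard cross-entropy translation loss from the decoder, and a grounding loss that scores how well the encoder's prediction matches the true image feature vector. The sketch below is a minimal NumPy illustration, not the paper's implementation: the margin-based ranking form of `imagination_loss`, the function names, and the interpolation weight `w` are all assumptions made for clarity.

```python
import numpy as np

def translation_loss(logits, target_ids):
    # Standard decoder objective: cross-entropy over target tokens.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return -np.mean(np.log(probs[np.arange(len(target_ids)), target_ids]))

def imagination_loss(pred_vec, image_vec, distractors, margin=0.1):
    # Hypothetical grounding loss: the predicted image vector should be
    # closer (by cosine similarity) to the true image feature than to
    # distractor image features, by at least `margin`.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    pos = cos(pred_vec, image_vec)
    penalties = [max(0.0, margin - pos + cos(pred_vec, d)) for d in distractors]
    return sum(penalties) / len(penalties)

def multitask_loss(logits, target_ids, pred_vec, image_vec, distractors, w=0.5):
    # Joint objective: interpolate the translation and grounding losses.
    # `w` is an assumed task-weighting hyperparameter.
    return (w * translation_loss(logits, target_ids)
            + (1 - w) * imagination_loss(pred_vec, image_vec, distractors))
```

Because the two tasks share only the encoder, the image prediction branch can be trained on image-caption data (e.g. MS COCO) while the decoder is trained on parallel text, which is what makes the external-data experiments in the abstract possible.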
