Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning

11/09/2019
by   Akos Kadar, et al.
0

Recent work has highlighted the advantage of jointly learning grounded sentence representations from multiple languages. However, the data used in these studies has been limited to an aligned scenario: the same images annotated with sentences in multiple languages. We focus on the more realistic disjoint scenario in which there is no overlap between the images in multilingual image–caption datasets. We confirm that training with aligned data results in better grounded sentence representations than training with disjoint data, as measured by image–sentence retrieval performance. In order to close this gap in performance, we propose a pseudopairing method to generate synthetically aligned English–German–image triplets from the disjoint sets. The method works by first training a model on the disjoint data, and then creating new triples across datasets using sentence similarity under the learned model. Experiments show that pseudopairs improve image–sentence retrieval performance compared to disjoint training, despite requiring no external data or models. However, we do find that using an external machine translation model to generate the synthetic data sets results in better performance.

READ FULL TEXT
research
09/20/2018

Lessons learned in multilingual grounded language learning

Recent work has shown how to learn better visual-semantic embeddings by ...
research
07/24/2017

Image Pivoting for Learning Multilingual Multimodal Representations

In this paper we propose a model to learn multimodal multilingual repres...
research
10/07/2017

OSU Multimodal Machine Translation System Report

This paper describes Oregon State University's submissions to the shared...
research
04/13/2017

Learning Joint Multilingual Sentence Representations with Neural Machine Translation

In this paper, we use the framework of neural machine translation to lea...
research
09/30/2019

Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations

With the aim of promoting and understanding the multilingual version of ...
research
12/28/2020

Universal Sentence Representation Learning with Conditional Masked Language Model

This paper presents a novel training method, Conditional Masked Language...
research
09/08/2019

MULE: Multimodal Universal Language Embedding

Existing vision-language methods typically support two languages at a ti...

Please sign up or login with your details

Forgot password? Click here to reset