Multitask Multilingual Multimodal Pre-training
This paper presents a Multitask Multilingual Multimodal Pre-trained model (M3P) that combines multilingual-monomodal pre-training and monolingual-multimodal pre-training into a unified framework via multitask learning and weight sharing. The model learns universal representations that can map objects occurring in different modalities or expressed in different languages to vectors in a common semantic space. To verify the generalization capability of M3P, we fine-tune the pre-trained model on different types of downstream tasks: multilingual image-text retrieval, multilingual image captioning, multimodal machine translation, multilingual natural language inference and multilingual text generation. Evaluation shows that M3P can (i) achieve comparable results on multilingual tasks and English multimodal tasks, compared to the state-of-the-art models pre-trained for these two types of tasks separately, and (ii) obtain new state-of-the-art results on non-English multimodal tasks in the zero-shot or few-shot setting. We also build a new Multilingual Image-Language Dataset (MILD) by collecting large amounts of (text-query, image, context) triplets in 8 languages from the logs of a commercial search engine.
Recently, we have witnessed the rise of a new paradigm of natural language processing (NLP), where general knowledge is learned from raw texts by self-supervised pre-training and then applied to downstream tasks by task-specific fine-tuning. These state-of-the-art monolingual pre-trained language models, such as BERT, have since been extended to multilingual scenarios, such as Multilingual BERT, XLM/XLM-R [2, 3] and Unicoder, and to multimodal scenarios, such as ViLBERT, Unicoder-VL, UNITER, VLP and Oscar. However, it is still challenging to extend these pre-trained models to multilingual-multimodal scenarios due to the lack of large amounts of aligned multimodal corpora in multiple languages for multilingual-multimodal pre-training. As a result, many multilingual pre-trained models cannot handle vision data (e.g. images and videos), whereas many multimodal pre-trained models, which are trained on texts mainly in English, cannot handle multiple languages.
To address this challenge, this paper presents a Multitask Multilingual Multimodal Pre-trained model (M3P), which aims to learn universal representations that can map objects occurring in different modalities or expressed in different languages to vectors in a common semantic space. This goal is achieved by (i) learning to represent multilingual data using multilingual corpora (i.e. sentences from Wikipedia covering 100 languages) via multilingual-monomodal pre-training, (ii) learning to represent multimodal data using multimodal corpora (i.e. image-caption pairs labeled in English) via monolingual-multimodal pre-training, and (iii) generalizing these representations to multilingual-multimodal tasks via multitask learning and weight sharing.
To verify the generalization capability of M3P, we fine-tune the pre-trained model on different types of downstream tasks: multilingual image-text retrieval, multilingual image captioning, multimodal machine translation, multilingual natural language inference and multilingual text generation. Evaluation shows that M3P (i) achieves comparable results on multilingual tasks and English multimodal tasks, compared to the state-of-the-art models pre-trained for these two types of tasks separately, and (ii) obtains new state-of-the-art results on non-English multimodal tasks in the zero-shot or few-shot setting. To further evaluate the learned multilingual multimodal representations in more languages, we also build a new Multilingual Image-Language Dataset (MILD), which includes (text-query, image, context) triplets in 8 languages, collected from the logs of a commercial search engine. Different from other widely-used image-language datasets such as MSCOCO and Flickr30K, the texts in MILD are shorter and contain more entities, which makes image-language tasks defined on this dataset (such as image-text retrieval) much more challenging. We will release MILD as a new benchmark to facilitate multilingual multimodal research.
Multilingual BERT (M-BERT)  demonstrates that by performing masked language modeling on a multilingual corpus with shared vocabulary and weights for 102 languages, surprisingly good results can be achieved on the cross-lingual natural language inference (XNLI)  task in 15 languages. XLM  and Unicoder  further improve the multilingual BERT by introducing new pre-training tasks based on a bilingual corpus. XLM-R  shows that by performing masked language modeling on a large-scale multilingual corpus, new state-of-the-art results on XNLI, MLQA and NER can be obtained. mBART  and Unicoder described in XGLUE  extend the multilingual models to multilingual text generation tasks based on the encoder-decoder framework and use different denoising auto-encoding pre-training tasks. However, all such models work for NLP tasks only, and cannot be applied to multimodal tasks such as image captioning.
Recently, a large number of multimodal pre-trained models, such as ViLBERT , Unicoder-VL , UNITER , VLP  and Oscar , are developed for vision-language tasks using multi-layer Transformer as the backbone. These models are pre-trained using similar visual-linguistic tasks and achieve comparable results on many vision-language tasks, such as visual question answering, visual commonsense reasoning, image-text retrieval and image captioning. However, as it is not easy to collect well-aligned visual-linguistic training data in multiple languages, all these models are pre-trained for English only based on monolingual multimodal corpora, such as Conceptual Captions , SBU Captions , Visual Genome  and MSCOCO , and cannot be applied to multimodal tasks with non-English inputs.
Multimodal machine translation is a task that involves multilingual and multimodal factors at the same time. Prior work proposes a multitask-learning-based method that learns a multimodal translation model while linking visual semantics with the corresponding textual semantics. Another line of work proposes a multimodal simultaneous neural machine translation method, which leverages visual information as an additional input and verifies its importance for simultaneous translation. However, due to the low-resource issue, these models are usually trained on very small amounts of (image, source caption, target caption translation) triples.
This section describes how to train M3P using a multilingual-monomodal corpus (e.g. sentences extracted from Wikipedia) and a monolingual-multimodal corpus (e.g. English image-caption pairs). M3P uses the model architecture of BERT for understanding tasks and a BERT-based encoder-decoder architecture for generation tasks. We pre-train M3P via multitask learning over a set of understanding and generation tasks, as shown in Figure 1.
Given an input image, we obtain its image region sequence $v = \{v_1, \ldots, v_M\}$ using Faster-RCNN, where $v_i$ denotes the $i$-th image region and $M$ denotes the length of $v$. The region embedding of $v_i$ is the visual feature output by Faster-RCNN. The spatial embedding of $v_i$ is a 5-D vector based on its normalized top-left and bottom-right coordinates and the fraction of the image area covered. We project these two embeddings into the text embedding space using two fully-connected (FC) layers. The final input representation of each image region is obtained by summing its projected region embedding and spatial embedding. We also keep the most probable object category of each image region predicted by Faster-RCNN, which will be used in the pre-training procedure.
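As an illustration, the 5-D spatial embedding described above can be computed from a region's bounding box as follows (a minimal sketch; the function name and box format are ours, not from the paper):

```python
import numpy as np

def spatial_embedding(box, img_w, img_h):
    """Build the 5-D spatial feature for one detected region.

    `box` is (x1, y1, x2, y2) in pixels; the output is the normalized
    top-left and bottom-right coordinates plus the fraction of the
    image area the region covers, as described in the text.
    """
    x1, y1, x2, y2 = box
    area_frac = ((x2 - x1) * (y2 - y1)) / (img_w * img_h)
    return np.array([x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h, area_frac])
```

This 5-D vector is then projected into the text embedding space by an FC layer and summed with the projected region embedding.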
Given an input text in a language $l$ from the language set $L$, we obtain its BPE token sequence $w = \{w_1, \ldots, w_N\}$ using SentencePiece, where $w_i$ denotes the $i$-th BPE token and $N$ denotes the length of $w$. The final input representation of each BPE token is obtained by summing its token embedding and position embedding. Moreover, a language embedding is added to each input token to indicate different languages during generation. We use the same vocabulary as XLM-R, which includes 250K BPE tokens and covers 100 languages.
Multilingual Masked Language Modeling (xMLM). This task performs masked language modeling on the multilingual corpus. At each iteration, a batch is composed of sentences sampled from different languages. The sampling probability of a language $l$ is defined as $p_l \propto q_l^{\alpha}$, where $q_l$ is the percentage of $l$ in the entire multilingual corpus and the smoothing factor $\alpha$ is set to 0.3. For each batch, we randomly sample 15% of the words and (i) replace them with a special symbol [MASK] with probability 80%, (ii) replace them with a random token with probability 10%, or (iii) keep them unchanged with probability 10%. A bilingual corpus can be used to further improve multilingual pre-training [2, 4], but this paper uses a multilingual corpus only, as it is nontrivial to collect a bilingual corpus for 100 languages.
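The smoothed language-sampling distribution can be sketched as follows (a hypothetical helper; only the exponent $\alpha = 0.3$ comes from the text):

```python
def sampling_probs(lang_counts, alpha=0.3):
    """Exponentiated-and-renormalized language sampling distribution.

    p_l is proportional to q_l ** alpha, where q_l is the share of
    language l in the corpus; alpha < 1 up-weights low-resource
    languages relative to their raw frequency.
    """
    total = sum(lang_counts.values())
    q = {lang: count / total for lang, count in lang_counts.items()}
    z = sum(share ** alpha for share in q.values())
    return {lang: (share ** alpha) / z for lang, share in q.items()}
```

For example, with a 90/10 split between a high- and a low-resource language, the low-resource language receives well over 10% of the sampled sentences.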
Multimodal Masked Language Modeling (MMLM). We follow the same masking strategy used in xMLM to mask tokens in the input caption. The loss function is defined as:

$$\mathcal{L}_{MMLM} = -\mathbb{E}_{(w^{en}, v) \sim D}\, \log P\big(w_m \mid w_{\setminus m}, v\big),$$

where $D$ denotes the set of image-caption pairs, the superscript $en$ denotes that the input caption is in English, and $w_m$ and $w_{\setminus m}$ denote the masked tokens and the remaining tokens, respectively.
Masked Region Modeling (MRM). This task aims to reconstruct each masked image region based on the remaining regions $v_{\setminus m}$ and all the caption tokens $w$. We randomly mask image regions with a probability of 15%. The input representation of each masked image region is set to zeros or keeps its original values with probability 90% and 10%, respectively. The loss function is defined as:

$$\mathcal{L}_{MRM} = \mathbb{E}_{(w^{en}, v) \sim D} \sum_{i \in m} \Big[ \mathcal{L}_{regr}\big(h_{v_i}, f_{v_i}\big) + \mathcal{L}_{cls}\big(h_{v_i}, c_{v_i}\big) \Big],$$

where $m$ enumerates the indices of all masked image regions. $\mathcal{L}_{regr}$ denotes the mean-square-error loss that tries to regress the Transformer output $h_{v_i}$ of each masked region $v_i$ to its visual feature $f_{v_i}$; we apply an FC layer to convert $h_{v_i}$ into a vector of the same dimension as $f_{v_i}$. $\mathcal{L}_{cls}$ denotes the cross-entropy loss that tries to predict the object category of each masked region $v_i$; we apply another FC layer to convert $h_{v_i}$ into scores over the object classes, which further go through a softmax function to form a normalized distribution. We take the predicted object category with the highest confidence score output by Faster-RCNN as the ground-truth label of $v_i$, and convert it into a one-hot vector $c_{v_i}$. Because the top-1 category predicted by Faster-RCNN is not always correct, we leave minimizing the KL divergence between the two distributions for future work.
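The region-masking scheme above (15% of regions selected; a selected region's features zeroed with probability 90% and kept with probability 10%) can be sketched as follows (an illustrative helper, not the paper's implementation):

```python
import numpy as np

def mask_regions(region_feats, mask_prob=0.15, zero_prob=0.9, rng=None):
    """Randomly mask image-region features for masked region modeling.

    Each region is selected with probability `mask_prob`; a selected
    region's feature vector is zeroed with probability `zero_prob` and
    kept unchanged otherwise. Returns the corrupted copy of the
    features and the boolean mask of selected regions.
    """
    rng = rng or np.random.default_rng(0)
    feats = region_feats.copy()
    masked = rng.random(len(feats)) < mask_prob
    for i in np.where(masked)[0]:
        if rng.random() < zero_prob:
            feats[i] = 0.0  # zero out the whole feature vector
    return feats, masked
```

The returned mask identifies which regions the model must reconstruct.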
Visual-Linguistic Matching (VLM). This task aims to learn the instance-level alignment between texts and images. An FC layer is applied to the Transformer output of [CLS] to predict whether the input image $v$ and the input text $w$ are semantically matched. Negative image-caption pairs are created by replacing the image or text in a matched sample with a randomly selected image or text from other samples. The loss function is defined as:

$$\mathcal{L}_{VLM} = -\mathbb{E}_{(w^{en}, v) \sim D}\, \log P\big(y \mid v, w^{en}\big),$$

where $y \in \{0, 1\}$ indicates whether the input image-text pair is matched or not.
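Negative-pair construction for VLM can be sketched as follows (a minimal illustration; the helper name and the number of negatives per positive are our choices):

```python
import random

def make_vlm_pairs(images, captions, num_neg=1, rng=None):
    """Build matched and mismatched (image, caption, label) examples.

    Positives pair each image with its own caption (label 1); negatives
    swap in a randomly chosen caption from another sample (label 0).
    Assumes images[i] and captions[i] are aligned.
    """
    rng = rng or random.Random(0)
    pairs = []
    n = len(images)
    for i in range(n):
        pairs.append((images[i], captions[i], 1))
        for _ in range(num_neg):
            j = rng.choice([k for k in range(n) if k != i])
            pairs.append((images[i], captions[j], 0))
    return pairs
```

Symmetric negatives (replacing the image instead of the caption) can be generated the same way.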
Multilingual Denoising Auto-Encoding (xDAE). This task aims to predict the original BPE token sequence $w$ based on its corrupted form $g(w)$, where $g$ is a noising function that corrupts $w$ by performing the following three operations sequentially: (1) shuffle $w$ by adding a noise to the input indices and then re-ordering tokens based on the rank of the noised indices; (2) drop words with a probability of 30%; (3) sample a number of token spans from $w$ with span lengths drawn from a Poisson distribution, and then replace each token span with a single [MASK] token. Here, 0-length spans correspond to the insertion of [MASK] tokens. The loss function is defined as:

$$\mathcal{L}_{xDAE} = -\mathbb{E}_{w \sim C} \sum_{t} \log P\big(w_t \mid w_{<t}, g(w)\big),$$

where $C$ denotes the multilingual corpus and $w_{<t}$ denotes the token sequence already generated before time step $t$.
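The three-step noising function $g$ can be sketched as follows (the shuffle-noise magnitude and Poisson rate are assumed values, as the text does not specify them, and a single masked span stands in for the general multi-span case):

```python
import numpy as np

def noise_tokens(tokens, shuffle_noise=3.0, drop_prob=0.3, span_lam=3.0, seed=0):
    """Corrupt a token sequence with the three operations from the text:
    local shuffle, word dropout, and span masking.
    """
    rng = np.random.default_rng(seed)
    # 1) shuffle: sort by original index plus uniform noise
    keys = np.arange(len(tokens)) + rng.uniform(0, shuffle_noise, len(tokens))
    toks = [tokens[i] for i in np.argsort(keys)]
    # 2) word dropout with probability 30%
    toks = [t for t in toks if rng.random() >= drop_prob]
    # 3) replace one Poisson-length span with a single [MASK]
    #    (a 0-length span corresponds to inserting a [MASK] token)
    span = int(min(rng.poisson(span_lam), len(toks)))
    start = int(rng.integers(0, len(toks) - span + 1))
    return toks[:start] + ["[MASK]"] + toks[start + span:]
```

The decoder is then trained to reconstruct the original sequence from this corrupted input.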
Image Captioning (IC). This task aims to generate the caption $w^{en}$ based on the image region sequence $v$ detected from the input image. The loss function is defined as:

$$\mathcal{L}_{IC} = -\mathbb{E}_{(w^{en}, v) \sim D} \sum_{t} \log P\big(w^{en}_t \mid w^{en}_{<t}, v\big).$$
Denoising Image Captioning (DIC). Given the image region sequence $v$ detected from an input image, this task aims to generate the caption of the input image based on $g_v(v)$, where $g_v$ is a noising function that corrupts $v$ by sampling n-gram regions from $v$ and then replacing each n-gram region with a zero-initialized vector. The span lengths are drawn from a Poisson distribution. The loss function is defined as:

$$\mathcal{L}_{DIC} = -\mathbb{E}_{(w^{en}, v) \sim D} \sum_{t} \log P\big(w^{en}_t \mid w^{en}_{<t}, g_v(v)\big),$$

where $w^{en}_{<t}$ denotes the token sequence already generated before the time step $t$.
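The region-corruption function $g_v$ can be sketched similarly (the Poisson rate and the number of corrupted spans are assumed values, not from the paper):

```python
import numpy as np

def corrupt_regions(region_feats, span_lam=3.0, num_spans=1, seed=0):
    """Zero out contiguous n-gram region spans for denoising image captioning.

    Samples `num_spans` spans with Poisson-distributed lengths and
    replaces each span of region features with zero-initialized vectors.
    Returns a corrupted copy; the input array is left unchanged.
    """
    rng = np.random.default_rng(seed)
    feats = region_feats.copy()
    n = len(feats)
    for _ in range(num_spans):
        span = int(min(rng.poisson(span_lam), n))
        if span == 0:
            continue  # 0-length span: nothing to zero out
        start = int(rng.integers(0, n - span + 1))
        feats[start:start + span] = 0.0
    return feats
```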
To further evaluate the learned multilingual multimodal representations in more languages, we build MILD as a new multilingual image-text dataset covering 8 languages: English (en), German (de), French (fr), Portuguese (pt), Spanish (es), Italian (it), Japanese (ja) and Chinese (zh). The dataset also includes a context for each image, which allows us to evaluate our model both with and without the context present. We construct the dataset in 5 steps.
Step-1: We collect billions of image-text pairs from the logs of a commercial image search engine. Each text is a user query in one of the eight languages (en, de, fr, pt, es, it, ja, zh). Each image was clicked in response to a user query.
Step-2: We perform image-based filtering by (i) discarding low-quality images whose width or height is smaller than 300 pixels; (ii) discarding sensitive images with pornographic or racy content; (iii) applying a binary classifier to filter out images whose features cannot be reliably extracted.
Step-3: We perform text-based filtering by (i) discarding sensitive queries with pornographic or racy intent; (ii) using heuristic rules to remove queries with noisy words or numbers; (iii) discarding short queries with fewer than 5 words.
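The text-filtering step can be sketched as follows (the heuristics shown are illustrative stand-ins for the in-house rules; only the 5-word minimum comes from the text):

```python
import re

def keep_query(query, min_words=5):
    """Heuristic query filter in the spirit of Step-3.

    Drops queries shorter than `min_words` words and queries that are
    dominated by purely numeric or symbolic tokens.
    """
    words = query.split()
    if len(words) < min_words:
        return False
    # count tokens containing no letter at all (numbers, punctuation)
    noisy = sum(1 for w in words if not re.search(r"[^\W\d_]", w))
    return noisy / len(words) < 0.5
```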
Step-4: We use an in-house image-text semantic model to predict a relevance score for each query-image pair. This semantic model is trained on millions of human-labeled instances using text features, image features and image-text similarity features. Based on the relevance scores, we keep at most 5 queries for each image, following MSCOCO and Flickr30k. We also include the original title of each image as its context information, which is extracted from the HTML of the web page where the image comes from.
Step-5: We sample a portion of (query (Q), image (I), context (C)) triples generated in Step-4 to form MILD. Table 1 shows the statistics of MILD.
MILD differs from existing image-text benchmarks in three aspects: (1) The average query length in MILD is 5.8, shorter than 10.6 in MSCOCO and 12.3 in Flickr30K. This makes the image-text retrieval task on MILD harder, as the text caption is often too brief to describe all elements occurring in the ground-truth image; (2) A portion of the captions in MILD contain named entities such as person, location and organization names. For example, 39.2% of English queries contain entities (PER, LOC, ORG, DATE, PROD, EVENT or ZIP), and the number for English contexts is 54.6%. This leaves substantial room for future models to improve on this dataset by introducing new mechanisms to handle these entities; (3) Each image has an additional context text, extracted from the web page from which the image comes. Based on human evaluation on sampled image-query pairs, 80% of the pairs in MILD are matched pairs, in that the query is a plausible caption of its paired image. Figure 2 gives some examples from MILD.
We use raw sentences extracted from the Wikipedia dump as the multilingual corpus for multilingual monomodal pre-training. It includes 101G sentences covering 100 languages. We use Conceptual Captions  as the multimodal corpus for monolingual multimodal pre-training. It contains 3.3 million English image-caption pairs harvested from the Web.
For understanding tasks, we set the hyper-parameters as follows: 768 hidden units, 12 heads, GELU activation, a dropout rate of 0.1, a maximum input length of 128, and 12 encoder layers. In the pre-training stage, we initialize M3P with XLM-R and continue pre-training with xMLM, MMLM, MRM and VLM. We use the Adam optimizer with a linear warm-up and set the learning rate to 1e-4. The total batch size is 1,024 after gradient accumulation. The pre-training stage takes about 4 days to converge on 8x V100 GPUs. In the fine-tuning stage, the batch size is set to 512 and we sample 3 negative cases in VLM, using the Adam optimizer with a learning rate of 5e-5.
For generation tasks, we use the encoder-decoder architecture with 768 hidden units, 8 heads, GELU activation, a dropout rate of 0.1, a maximum input length of 128, and 12 layers in both the encoder and the decoder. The Transformer parameters are shared between the encoder and decoder, including the embedding and self-attention modules. In the pre-training stage, we train M3P with xDAE, IC and DIC. The batch size is 1,536 with gradient accumulation, and the initial learning rate is 1e-4 with a linear warm-up. In the fine-tuning stage, we reduce the learning rate to 5e-5 with a total batch size of 512. We feed the same language ID into the encoder and decoder, except for multimodal machine translation. We set the beam size to 10 for caption inference.
Table 2: mean Recall (mR) on Multi30K (en, de, fr, cs) and MSCOCO (en, ja, zh).

| Model | en (Multi30K) | de | fr | cs | en (MSCOCO) | ja | zh |
|---|---|---|---|---|---|---|---|
| *Results without pre-training* | | | | | | | |
| PAR. EmbN | 69.0 | 62.6 | 60.6 | 54.1 | 78.3 | 76.0 | 74.8 |
| *Results with monolingual multimodal pre-training* | | | | | | | |
| Unicoder-VL (w/o fine-tune) | 72.0 | - | - | - | 63.7 | - | - |
| Unicoder-VL (w/ fine-tune on en) | 88.1 | - | - | - | 89.2 | - | - |
| M3P (w/o fine-tune) | 61.1 | 35.7 | 24.7 | 26.4 | 62.1 | 32.1 | 33.3 |
| M3P (w/ fine-tune on en) | 86.0 | 48.8 | 39.4 | 38.8 | 87.4 | 54.4 | 55.8 |
| M3P (w/ fine-tune on each) | 86.0 | 80.2 | 67.1 | 66.2 | 87.4 | 83.9 | 77.4 |
| M3P (w/ fine-tune on all) | 86.7 | 82.0 | 73.5 | 70.2 | 88.0 | 86.8 | 81.8 |
Table 3: mean Recall (mR) on MILD.

| Model | en | de | fr | pt | es | it | ja | zh | avg |
|---|---|---|---|---|---|---|---|---|---|
| *Results based on <Q,I> pairs* | | | | | | | | | |
| M3P (w/ fine-tune on en) | 19.0 | 6.1 | 5.7 | 5.3 | 4.5 | 5.0 | 13.5 | 3.3 | 7.8 |
| M3P (w/ fine-tune on each) | 19.0 | 7.7 | 7.7 | 9.8 | 7.7 | 8.1 | 19.0 | 11.3 | 11.3 |
| M3P (w/ fine-tune on all) | 19.6 | 7.8 | 7.8 | 9.1 | 7.6 | 8.2 | 19.8 | 11.2 | 11.4 |
| *Results based on <Q,I,C> triples* | | | | | | | | | |
| M3P (w/ fine-tune on en) | 81.6 | 51.0 | 52.8 | 47.7 | 47.4 | 47.8 | 73.0 | 50.4 | 56.5 |
| M3P (w/ fine-tune on each) | 81.6 | 54.5 | 56.6 | 52.7 | 52.3 | 51.4 | 75.4 | 58.2 | 60.3 |
| M3P (w/ fine-tune on all) | 81.7 | 54.6 | 56.7 | 52.8 | 53.2 | 52.0 | 77.3 | 58.6 | 60.9 |
The task of multilingual image-text retrieval is to find the most relevant images given input texts in different languages, or vice versa. We evaluate M3P on Multi30K [28, 29], MSCOCO [16, 30, 31] and MILD. Multi30K extends Flickr30K to German (de), French (fr) and Czech (cs). It contains 31,783 images and provides 5 captions per image in English and German and 1 caption per image in French and Czech. We use the standard train, dev and test splits. MSCOCO contains 123,287 images and provides 5 captions per image in English, but fewer in Chinese and Japanese. STAIR Captions extends MSCOCO with 820K Japanese captions for COCO images, and COCO-CN extends MSCOCO with Chinese captions for 20K images. We use the same train, dev and test splits for English and Japanese as in prior work; for Chinese, we use the COCO-CN split. We use mean Recall (mR) as the metric, which is the average of Recall@1, Recall@5 and Recall@10 over the image-to-text and text-to-image retrieval tasks.
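Computing mR from the six recall scores is straightforward (a minimal helper for illustration):

```python
def mean_recall(i2t_recalls, t2i_recalls):
    """mean Recall (mR): average of R@1/R@5/R@10 over both directions.

    Each argument is a dict like {1: ..., 5: ..., 10: ...} holding
    recall percentages for image-to-text and text-to-image retrieval.
    """
    vals = [i2t_recalls[k] for k in (1, 5, 10)]
    vals += [t2i_recalls[k] for k in (1, 5, 10)]
    return sum(vals) / len(vals)
```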
Table 2 shows the evaluation results on Multi30K and MSCOCO, where M3P achieves state-of-the-art results compared to several related works [26, 23, 24, 25, 27]. We study the impacts of different fine-tuning strategies: w/o fine-tune applies M3P to all test sets directly without fine-tuning; w/ fine-tune on en fine-tunes M3P on English and then applies the fine-tuned model to all test sets; w/ fine-tune on each fine-tunes M3P on each language separately and applies each fine-tuned model to the test set of that language; w/ fine-tune on all fine-tunes M3P on the merged labeled data of all languages and applies the fine-tuned model to all test sets. Similar to the observations reported in Unicoder [4, 12], the last two fine-tuning strategies lead to the best results; the same sentence in different languages may capture complementary information that helps improve performance. We also compare with Unicoder-VL, which is pre-trained using the same image-caption corpus (i.e. Conceptual Captions), but for English only. Although M3P performs a bit worse than Unicoder-VL on English, it obtains comparable results on all the other languages, which verifies its strong transfer capability. A possible reason is the use of the xMLM task and a larger vocabulary covering 100 languages. In particular, SMALR takes advantage of machine translation to augment Multi30K and MSCOCO. Considering that translating English into all other supported languages is not general and requires a large number of translators, we leave this as an option for future work.
Table 3 shows the evaluation results on MILD. The first batch of results is based on Q-I pairs without using image contexts. Compared to the results on Multi30K and MSCOCO, the numbers on MILD are much lower, which shows it is a harder dataset. The second batch of results is based on Q-I-C triples, where each image and its context always appear together as input. The results show that such context information helps substantially on the image-text retrieval tasks in MILD.
Table 4: Image captioning results (B@4/CIDEr) on Multi30K (en, de, fr, cs) and MSCOCO (en, ja, zh).

| Model | en (Multi30K) | de | fr | cs | en (MSCOCO) | ja | zh |
|---|---|---|---|---|---|---|---|
| VLP (w/ fine-tune on en) | 30.1/67.4 | -/- | -/- | -/- | 36.5/116.9 | -/- | -/- |
| XGPT (w/ fine-tune on en) | 31.8/70.9 | -/- | -/- | -/- | 37.2/120.1 | -/- | -/- |
| M3P (w/ fine-tune on each) | 26.1/57.2 | 16.1/43.8 | 7.5/36.1 | 4.0/28.5 | 33.7/111.5 | 40.2/105.1 | 39.7/109.2 |
| M3P (w/ fine-tune on all) | 26.5/59.4 | 16.6/44.3 | 8.7/40.1 | 5.4/31.1 | 33.9/112.3 | 40.9/109.7 | 40.2/111.3 |
The task of multilingual image captioning is to generate captions in specific languages given input images. We evaluate M3P on Multi30K and MSCOCO, using BLEU-4 (B@4) and CIDEr (C) as the metrics. Table 4 shows the evaluation results. Similar to Table 2, M3P still performs worse than state-of-the-art pre-trained models (VLP and XGPT) on the English image captioning datasets, even though they employ the same image-caption corpus for pre-training. However, it shows a strong cross-lingual transfer capability on non-English datasets in the few-shot settings (i.e., w/ fine-tune on each and w/ fine-tune on all).
| Text-Only NMT | 53.5 | - | 31.6 | - |
The task of multimodal machine translation is to generate sentences in target languages given source sentences together with related images as complementary information. We evaluate M3P on Multi30K and use BLEU-4 (B@4) as the metric. We experiment with our model in four translation directions covering 3 languages: English (en), German (de) and French (fr); all language pairs include en on one of the sides. In Table 5, we compare the performance of M3P against state-of-the-art multimodal machine translation approaches and the text-only baseline. We observe that pre-training provides a significant boost in the BLEU score for each translation direction.
Table 6: XNLI accuracy in 15 languages.

| Model | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur | avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XLM-R (w/ fine-tune on en) | 84.6 | 78.2 | 79.2 | 77.0 | 75.9 | 77.5 | 75.5 | 72.9 | 72.1 | 74.8 | 71.6 | 73.7 | 69.8 | 64.7 | 65.1 | 74.2 |
| M3P (w/ fine-tune on en) | 82.3 | 76.3 | 77.0 | 74.1 | 73.2 | 76.2 | 74.1 | 70.3 | 69.2 | 73.9 | 69.6 | 72.9 | 68.6 | 59.4 | 64.7 | 72.1 |
Table 7: BLEU-4 on the XGLUE News Title Generation task.

| Model | en | de | es | fr | ru | avg |
|---|---|---|---|---|---|---|
| Unicoder (w/ fine-tune on en) | 15.6 | 9.0 | 8.7 | 6.8 | 7.7 | 9.6 |
| Unicoder (w/ fine-tune on en) | 15.8 | 11.9 | 9.9 | 7.5 | 8.4 | 10.7 |
| M3P (w/ fine-tune on en) | 14.1 | 8.0 | 7.3 | 5.2 | 6.1 | 8.1 |
The task of multilingual natural language inference is to predict the entailment relation (Entailment, Contradiction or Neutral) between two sentences in a specific language. We evaluate M3P on XNLI using its original train, dev and test splits, and compare it with the base version (12 layers) of XLM-R. We fine-tune both models on the English labeled data and then apply the fine-tuned models to all test sets in 15 languages. Evaluation results are listed in Table 6. From Table 6 we can see that, although M3P is pre-trained for different types of tasks (understanding and generation) from different perspectives (multilingual and multimodal), it still obtains surprisingly good performance on XNLI, which shows the possibility of learning universal representations.
We also evaluate M3P on the News Title Generation (NTG) task in XGLUE, and compare it with the extended version of Unicoder described in XGLUE. We fine-tune both models on the English labeled data and then apply the fine-tuned models to all test sets in 5 languages. Evaluation results are listed in Table 7. Similar to the trend on XNLI, Table 7 shows that M3P maintains good performance on this multilingual text generation task as well.
We have presented in this paper a new pre-trained model, M3P, for multilingual-multimodal representation learning. The learned representations show a strong cross-lingual transfer capability and are proven effective on five downstream tasks. To facilitate research on multilingual-multimodal modeling, we also build a large-scale dataset called MILD and will make it publicly available to the research community.