Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training

06/01/2022
by   Yan Zeng, et al.

In this paper, we introduce Cross-View Language Modeling, a simple and effective pre-training framework that unifies cross-lingual and cross-modal pre-training with shared architectures and objectives. Our approach is motivated by a key observation: cross-lingual and cross-modal pre-training share the same goal of aligning two different views of the same object into a common semantic space. Accordingly, the cross-view language modeling framework treats both multi-modal data (i.e., image-caption pairs) and multi-lingual data (i.e., parallel sentence pairs) as two different views of the same object, and trains the model to align the two views by maximizing the mutual information between them via conditional masked language modeling and contrastive learning. With this framework we pre-train CCLM, a Cross-lingual Cross-modal Language Model. Empirical results on IGLUE, a multi-lingual multi-modal benchmark, and on two multi-lingual image-text retrieval datasets show that, while conceptually simpler, CCLM significantly outperforms the prior state-of-the-art with an average absolute improvement of over 10%. Notably, CCLM is the first multi-lingual multi-modal model to surpass the translate-test performance of representative English vision-language models via zero-shot cross-lingual transfer.
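The abstract mentions aligning two views by maximizing mutual information with contrastive learning. A minimal sketch of such a contrastive objective (a symmetric InfoNCE-style loss, a standard lower bound on mutual information) is shown below; this is an illustrative assumption, not the paper's actual implementation, and the function name and temperature value are made up for the example:

```python
import numpy as np

def info_nce_loss(view_a, view_b, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss between two batches of
    embeddings, where row i of view_a is paired with row i of view_b
    (e.g., an image and its caption, or a sentence and its translation)."""
    # L2-normalize each embedding so the dot product is cosine similarity
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature      # (N, N) similarity matrix
    n = len(a)                          # positives sit on the diagonal

    def cross_entropy(l):
        # numerically stable log-softmax over each row
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # average the a->b and b->a retrieval directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing this loss pulls the two views of the same example together while pushing apart views of different examples, which is the alignment behavior the abstract describes.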


Related research

11/09/2022 — ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation
Recent cross-lingual cross-modal works attempt to extend Vision-Language...

11/24/2020 — Towards Zero-shot Cross-lingual Image Retrieval
There has been a recent spike in interest in multi-modal Language and Vi...

09/11/2023 — Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval
Current research on cross-modal retrieval is mostly English-oriented, as...

05/13/2023 — RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
Multilingual vision-language (V&L) pre-training has achieved remarkabl...

07/15/2020 — InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training
In this work, we formulate cross-lingual language model pre-training as ...

08/25/2021 — Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
Translating e-commercial product descriptions, a.k.a product-oriented ma...

08/26/2022 — Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning
Despite the recent developments in the field of cross-modal retrieval, t...
