Learning to Scale Multilingual Representations for Vision-Language Tasks

04/09/2020
by Andrea Burns, et al.

Current multilingual vision-language models either require a large number of additional parameters for each supported language or suffer performance degradation as languages are added. In this paper, we propose a Scalable Multilingual Aligned Language Representation (SMALR) that represents many languages with few model parameters without sacrificing downstream task performance. SMALR learns a fixed-size language-agnostic representation for most words in a multilingual vocabulary, keeping language-specific features for just a few. We use a novel masked cross-language modeling loss to align features with context from other languages. Additionally, we propose a cross-lingual consistency module that ensures predictions made for a query and its machine translation are comparable. The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date. We evaluate on multilingual image-sentence retrieval and outperform prior work by 3-4% with less than 1/5th the training parameters compared to other word embedding methods.


