Instance-based Transfer Learning for Multilingual Deep Retrieval

11/08/2019
by Andrew O. Arnold, et al.

Perhaps the simplest type of multilingual transfer learning is instance-based transfer learning, in which data from the target language and the auxiliary languages are pooled, and a single model is learned from the pooled data. It is not immediately obvious when instance-based transfer learning will improve performance in this multilingual setting: for instance, a plausible conjecture is that this kind of transfer learning would help only if the auxiliary languages were very similar to the target. Here we show that at large scale, this method is surprisingly effective, leading to positive transfer on all 35 target languages we tested. We analyze this improvement and argue that the most natural explanation, namely direct vocabulary overlap between languages, only partially explains the performance gains: in fact, we demonstrate that target-language improvement can occur after adding data from an auxiliary language with no vocabulary in common with the target. This surprising result is due to the effect of transitive vocabulary overlaps between pairs of auxiliary and target languages.
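
The distinction between direct and transitive vocabulary overlap can be illustrated with a small sketch. The code below is not the authors' implementation: the vocabularies are made-up toy sets, and the names `pool`, `direct_overlap`, and `overlap_path` are hypothetical helpers introduced only for illustration. It assumes that "transitive overlap" can be read as a chain of languages in which each adjacent pair shares at least one token.

```python
from collections import deque

# Toy, made-up vocabularies (sets of tokens); not data from the paper.
VOCABS = {
    "target": {"a", "b", "c"},
    "bridge": {"c", "d", "e"},
    "aux": {"e", "f", "g"},
}


def pool(datasets):
    """Instance-based transfer in its simplest form: concatenate the
    examples of every language into one training set (hypothetical helper)."""
    pooled = []
    for lang in sorted(datasets):
        pooled.extend(datasets[lang])
    return pooled


def direct_overlap(vocabs, lang_a, lang_b):
    """Return the tokens that two languages share directly."""
    return vocabs[lang_a] & vocabs[lang_b]


def overlap_path(vocabs, source, target):
    """Breadth-first search over the graph whose edges join languages that
    share at least one token. Returns a shortest chain of languages from
    source to target, or None if no chain exists."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for lang in sorted(vocabs):
            if lang not in seen and vocabs[path[-1]] & vocabs[lang]:
                seen.add(lang)
                queue.append(path + [lang])
    return None


# "aux" shares no tokens with "target", yet is linked to it through "bridge".
no_direct = direct_overlap(VOCABS, "aux", "target")
chain = overlap_path(VOCABS, "aux", "target")
```

In this toy setup, `no_direct` is empty while `chain` passes through the intermediate language, which is the kind of indirect connection the abstract points to as a possible source of positive transfer.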


