Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning

05/12/2016
by Yulia Tsvetkov, et al.

We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted. We apply these to the problem of modeling phone sequences---a domain in which universal symbol inventories and cross-linguistically shared feature representations are a natural fit. Intrinsic evaluation on held-out perplexity, qualitative analysis of the learned representations, and extrinsic evaluation in two downstream applications that make use of phonetic features show (i) that polyglot models better generalize to held-out data than comparable monolingual models and (ii) that polyglot phonetic feature representations are of higher quality than those learned monolingually.

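The abstract describes a recurrent language model over a shared phone vocabulary, conditioned on information about the language being predicted. As a rough illustration only, the sketch below shows one way such a polyglot phone-level LM can be wired up in PyTorch. It is not the authors' implementation: the class name, dimensions, and the learned language embedding (standing in for the paper's typological feature vectors) are all illustrative assumptions.

```python
# Minimal sketch of a polyglot phone-level language model (illustrative, not the
# authors' code): an LSTM over a shared phone inventory, with each input phone
# embedding concatenated to a per-language vector before prediction.
import torch
import torch.nn as nn

class PolyglotPhoneLM(nn.Module):
    def __init__(self, num_phones, num_langs, typology_dim=16,
                 phone_dim=64, hidden_dim=128):
        super().__init__()
        self.phone_emb = nn.Embedding(num_phones, phone_dim)   # shared across all languages
        self.lang_emb = nn.Embedding(num_langs, typology_dim)  # stand-in for typological features
        self.rnn = nn.LSTM(phone_dim + typology_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_phones)

    def forward(self, phones, lang_ids):
        # phones: (batch, seq_len) phone indices; lang_ids: (batch,) language indices
        x = self.phone_emb(phones)                              # (batch, seq, phone_dim)
        lang = self.lang_emb(lang_ids).unsqueeze(1).expand(-1, phones.size(1), -1)
        h, _ = self.rnn(torch.cat([x, lang], dim=-1))
        return self.out(h)                                      # next-phone logits

# Toy usage: score random phone sequences from two "languages".
model = PolyglotPhoneLM(num_phones=50, num_langs=2)
phones = torch.randint(0, 50, (4, 10))
lang_ids = torch.tensor([0, 0, 1, 1])
logits = model(phones, lang_ids)
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, 50), phones[:, 1:].reshape(-1))
print(logits.shape, loss.item())
```

Training one such model on phone sequences from many languages is what lets the shared phone embeddings pick up cross-linguistic structure, which is the property the paper evaluates intrinsically and in downstream tasks.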
Related research

10/30/2018  Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model
11/04/2019  Emerging Cross-lingual Structure in Pretrained Language Models
05/25/2022  Discovering Language-neutral Sub-networks in Multilingual Language Models
10/23/2020  DICT-MLM: Improved Multilingual Pre-Training using Bilingual Dictionaries
04/27/2022  LyS_ACoruña at SemEval-2022 Task 10: Repurposing Off-the-Shelf Tools for Sentiment Analysis as Semantic Dependency Parsing
12/17/2020  The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks
10/05/2020  Investigating representations of verb bias in neural language models