Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

07/02/2018
by Edoardo Maria Ponti, et al.

Addressing the cross-lingual variation of grammatical structures and meaning categorization is a key challenge for multilingual Natural Language Processing. The lack of resources for the majority of the world's languages makes supervised learning not viable. Moreover, the performance of most algorithms is hampered by language-specific biases and the neglect of informative multilingual data. The discipline of Linguistic Typology provides a principled framework to compare languages systematically and empirically and documents their variation in publicly available databases. These enshrine crucial information to design language-independent algorithms and refine techniques devised to mitigate the above-mentioned issues, including cross-lingual transfer and multilingual joint models, with typological features. In this survey, we demonstrate that typology is beneficial to several NLP applications, involving both semantic and syntactic tasks. Moreover, we outline several techniques to extract features from databases or acquire them automatically: these features can be subsequently integrated into multilingual models to tie parameters together cross-lingually or gear a model towards a specific language. Finally, we advocate for a new typology that accounts for the patterns within individual examples rather than entire languages, and for graded categories rather than discrete ones, in order to bridge the gap with the contextual and continuous nature of machine learning algorithms.


research
06/15/2017

A Survey Of Cross-lingual Word Embedding Models

Cross-lingual representations of words enable us to reason about word me...
research
11/12/2015

A Multilingual FrameNet-based Grammar and Lexicon for Controlled Natural Language

Berkeley FrameNet is a lexico-semantic resource for English based on the...
research
05/02/2020

Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

Multilingual representations embed words from many languages into a sing...
research
10/24/2020

Cross-neutralising: Probing for joint encoding of linguistic information in multilingual models

Multilingual sentence encoders are widely used to transfer NLP models ac...
research
04/03/2023

ScandEval: A Benchmark for Scandinavian Natural Language Processing

This paper introduces a Scandinavian benchmarking platform, ScandEval, w...
research
06/05/2023

Colexifications for Bootstrapping Cross-lingual Datasets: The Case of Phonology, Concreteness, and Affectiveness

Colexification refers to the linguistic phenomenon where a single lexica...
research
10/09/2018

Learning Noun Cases Using Sequential Neural Networks

Morphological declension, which aims to inflect nouns to indicate number...
