A Low Dimensionality Representation for Language Variety Identification

05/30/2017
by Francisco Rangel, et al.

Language variety identification aims at labelling texts in a native language (e.g. Spanish, Portuguese, English) with their specific variation (e.g. Argentina, Chile, Mexico, Peru, Spain; Brazil, Portugal; UK, US). In this work we propose a low dimensionality representation (LDR) to address this task with five different varieties of Spanish: Argentina, Chile, Mexico, Peru and Spain. We compare our LDR method with common state-of-the-art representations and show an increase in accuracy of about 35%. We also compare LDR with distributed representation models. Experimental results show competitive performance while dramatically reducing the dimensionality (and increasing the suitability for big data) to only 6 features per variety. Additionally, we analyse the behaviour of the employed machine learning algorithms and the most discriminating features. Finally, we employ an alternative dataset to test the robustness of our low dimensionality representation with another set of similar languages.
