Native Language Identification with Big Bird Embeddings

09/13/2023
by Sergey Kramp, et al.

Native Language Identification (NLI) aims to classify an author's native language based on their writing in another language. Historically, the task has relied heavily on time-consuming linguistic feature engineering, and transformer-based NLI models have thus far failed to offer effective, practical alternatives. The current work investigates whether input size is a limiting factor, and shows that classifiers trained on Big Bird embeddings outperform linguistic feature engineering models by a large margin on the Reddit-L2 dataset. Additionally, we provide further insight into input length dependencies, show consistent out-of-sample performance, and qualitatively analyze the embedding space. Given the effectiveness and computational efficiency of this method, we believe it offers a promising avenue for future NLI work.
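
The general idea of training a classifier on fixed document embeddings can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-dimensional toy vectors stand in for per-token outputs of a long-input encoder such as Big Bird, mean pooling is one common way to obtain a single document vector, and the nearest-centroid rule is a deliberately simple stand-in classifier, not necessarily the one used in the paper. The function names (`mean_pool`, `fit_centroids`, `predict`) are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(embeddings, labels):
    """Compute one centroid per native-language label."""
    grouped = defaultdict(list)
    for embedding, label in zip(embeddings, labels):
        grouped[label].append(embedding)
    return {label: mean_pool(vecs) for label, vecs in grouped.items()}


def predict(centroids, embedding):
    """Return the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine(centroids[label], embedding))


# Toy data (assumption): each document is a list of per-token vectors.
# In a real setup these would come from a pretrained encoder.
train_docs = {
    "German": [[[0.9, 0.1], [0.8, 0.2]], [[1.0, 0.0], [0.7, 0.1]]],
    "French": [[[0.1, 0.9], [0.2, 0.8]], [[0.0, 1.0], [0.1, 0.7]]],
}

embeddings, labels = [], []
for native_language, docs in train_docs.items():
    for token_vectors in docs:
        embeddings.append(mean_pool(token_vectors))  # one vector per document
        labels.append(native_language)

centroids = fit_centroids(embeddings, labels)
prediction = predict(centroids, mean_pool([[0.85, 0.15], [0.9, 0.05]]))
print(prediction)
```

Because the encoder is only used to produce fixed vectors, the classifier on top can be trained cheaply and swapped freely, which is one way to read the abstract's claim about computational efficiency.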


research
08/02/2022

Unravelling Interlanguage Facts via Explainable Machine Learning

Native language identification (NLI) is the task of training (via superv...
research
07/22/2017

Native Language Identification on Text and Speech

This paper presents an ensemble system combining the output of multiple ...
research
04/30/2018

A Portuguese Native Language Identification Dataset

In this paper we present NLI-PT, the first Portuguese dataset compiled f...
research
11/18/2022

Scaling Native Language Identification with Transformer Adapters

Native language identification (NLI) is the task of automatically identi...
research
11/25/2020

Neural Representations for Modeling Variation in English Speech

Variation in speech is often represented and investigated using phonetic...
research
03/24/2016

Contrastive Analysis with Predictive Power: Typology Driven Estimation of Grammatical Error Distributions in ESL

This work examines the impact of cross-linguistic transfer on grammatica...
research
05/30/2017

A Low Dimensionality Representation for Language Variety Identification

Language variety identification aims at labelling texts in a native lang...
