A Portuguese Native Language Identification Dataset

04/30/2018
by Iria del Río, et al.

In this paper we present NLI-PT, the first Portuguese dataset compiled for Native Language Identification (NLI), the task of identifying an author's first language (L1) based on their second-language writing. The dataset includes 1,868 student essays written by learners of European Portuguese who are native speakers of the following L1s: Chinese, English, Spanish, German, Russian, French, Japanese, Italian, Dutch, Tetum, Arabic, Polish, Korean, Romanian, and Swedish. NLI-PT includes the original student texts and four types of annotation: POS tags, fine-grained POS tags, constituency parses, and dependency parses. NLI-PT can be used not only for NLI but also for research on several topics in Second Language Acquisition and educational NLP. We discuss possible applications of this dataset and present the results obtained by the first lexical baseline system for Portuguese NLI.
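The abstract does not say which features or classifier the lexical baseline uses. As an illustration only, NLI can be framed as ordinary text classification: each essay is a document and the author's L1 is its label. The sketch below assumes word unigram and bigram counts as the lexical features and a multinomial Naive Bayes classifier with add-one smoothing. The names `ngrams` and `NaiveBayesNLI` are hypothetical, and this is not the authors' system.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(text: str, n_max: int = 2) -> list[str]:
    """Lowercased word n-grams (1..n_max), used here as assumed lexical features."""
    tokens = re.findall(r"\w+", text.lower())
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayesNLI:
    """Multinomial Naive Bayes over word n-grams with add-alpha smoothing.

    A hypothetical lexical baseline for illustration; not the NLI-PT baseline.
    """

    def __init__(self, n_max: int = 2, alpha: float = 1.0):
        self.n_max = n_max
        self.alpha = alpha

    def fit(self, essays: list[str], l1_labels: list[str]) -> "NaiveBayesNLI":
        self.doc_counts = Counter(l1_labels)       # essays per L1 (for priors)
        self.feat_counts = defaultdict(Counter)    # n-gram counts per L1
        for text, l1 in zip(essays, l1_labels):
            self.feat_counts[l1].update(ngrams(text, self.n_max))
        self.vocab = set()
        for counts in self.feat_counts.values():
            self.vocab.update(counts)
        self.totals = {l1: sum(c.values()) for l1, c in self.feat_counts.items()}
        return self

    def predict(self, text: str) -> str:
        """Return the L1 with the highest log-posterior for one essay."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_l1, best_score = None, -math.inf
        for l1, n in self.doc_counts.items():
            score = math.log(n / n_docs)  # class prior
            counts, total = self.feat_counts[l1], self.totals[l1]
            for f in ngrams(text, self.n_max):
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((counts[f] + self.alpha) / (total + self.alpha * v))
            if score > best_score:
                best_l1, best_score = l1, score
        return best_l1
```

Usage: `NaiveBayesNLI().fit(essays, l1_labels).predict(new_essay)`, where `essays` is a list of essay strings and `l1_labels` holds the matching L1 names (NLI-PT covers 15 L1s). Naive Bayes is chosen here only because it fits in a few lines; a real baseline might instead use other lexical features or a linear classifier such as an SVM.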
