Discriminating Similar Languages: Evaluations and Explorations

09/30/2016
by Cyril Goutte, et al.

We present an analysis of how well machine learning classifiers discriminate between similar languages and language varieties. We carried out a number of experiments using the results of the two editions of the Discriminating between Similar Languages (DSL) shared task. We investigate the progress made between the two editions, estimate an upper bound on achievable performance using ensemble and oracle combinations of the participating systems, and provide learning curves that help show which languages are the most challenging. Finally, we identify a number of particularly difficult sentences and examine them further through human annotation.
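The abstract mentions estimating an upper bound through ensemble and oracle combination. The sketch below illustrates one common reading of those two ideas; it is not the authors' code. It assumes the ensemble is a plain plurality vote over system outputs (ties broken by system order, an arbitrary choice), and that the oracle counts a sentence as correct whenever at least one system predicts its gold label. The function names and the toy labels and predictions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Sequence


def plurality_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine systems by plurality vote over each sentence.

    predictions[s][i] is the label that system s assigns to sentence i.
    Ties go to the tied label proposed by the earliest system in the list,
    an arbitrary choice made for this sketch.
    """
    combined = []
    for votes in zip(*predictions):  # votes: one label per system for one sentence
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(label for label in votes if counts[label] == top))
    return combined


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of sentences whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def oracle_accuracy(predictions: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Fraction of sentences that at least one system labels correctly.

    No combiner that picks one of the systems' labels can exceed this,
    because the oracle always knows which system to trust for each sentence.
    """
    hits = sum(g in votes for votes, g in zip(zip(*predictions), gold))
    return hits / len(gold)


# Hypothetical toy data: gold labels and the outputs of three systems.
gold = ["pt-BR", "pt-PT", "es-AR", "es-ES"]
systems = [
    ["pt-BR", "pt-BR", "es-AR", "es-AR"],
    ["pt-BR", "pt-PT", "es-ES", "es-ES"],
    ["pt-PT", "pt-PT", "es-AR", "es-AR"],
]

for k, system in enumerate(systems):
    print(f"system {k}: {accuracy(system, gold):.2f}")
print(f"plurality vote: {accuracy(plurality_vote(systems), gold):.2f}")  # 0.75
print(f"oracle: {oracle_accuracy(systems, gold):.2f}")  # 1.00
```

On this toy data the vote only matches the best single system (0.75), while the oracle reaches 1.00. The gap between the two is one way to see how much headroom better combination could, at best, recover.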
