Discriminating between Indo-Aryan Languages Using SVM Ensembles

07/09/2018
by Alina Maria Ciobanu, et al.

In this paper we present a system based on SVM ensembles trained on characters and words to discriminate between five similar languages of the Indo-Aryan family: Hindi, Braj Bhasha, Awadhi, Bhojpuri, and Magahi. We investigate the performance of individual features and combine the output of single classifiers to maximize performance. The system competed in the Indo-Aryan Language Identification (ILI) shared task organized within the VarDial Evaluation Campaign 2018. Our best entry in the competition, named ILIdentification, scored 88.95
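The two ingredients named in the abstract, character- and word-level features and the combination of single-classifier outputs, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the n-gram orders or the fusion rule, so the function names, the label codes, and the example scores below are all assumptions made for illustration. SVM training itself is omitted; the score dictionaries stand in for the outputs of separately trained classifiers.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in a sentence."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams, splitting on whitespace."""
    tokens = text.split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def majority_vote(predictions):
    """Return the label predicted by most classifiers (on ties, first seen wins)."""
    return Counter(predictions).most_common(1)[0][0]


def mean_score_fusion(score_dicts):
    """Average per-label scores across classifiers and return the top label."""
    totals = Counter()
    for scores in score_dicts:
        for label, value in scores.items():
            totals[label] += value
    return max(totals, key=lambda label: totals[label] / len(score_dicts))


# Hypothetical outputs of three single classifiers for one sentence.
# The label codes (HIN, BHO) are illustrative, not taken from the abstract.
votes = ["HIN", "BHO", "BHO"]
scores = [{"HIN": 0.6, "BHO": 0.4}, {"HIN": 0.2, "BHO": 0.8}]

print(majority_vote(votes))       # BHO
print(mean_score_fusion(scores))  # BHO
```

Voting uses only each classifier's top label, while score averaging lets a confident classifier outweigh an uncertain one; which rule works better is an empirical question that the abstract does not settle.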

