Albanian Language Identification in Text Documents

01/14/2019
by Klesti Hoxha, et al.

In this work we investigate the accuracy of standard and state-of-the-art language identification methods in identifying Albanian in written text documents. A dataset of news articles written in Albanian was constructed for this purpose. We observed a considerable decrease in accuracy on test documents that lack the Albanian alphabet letters "Ë" and "Ç", and created a custom training corpus that solved this problem, achieving an accuracy of more than 99%. The best-performing language identification methods for Albanian use a naïve Bayes classifier and n-gram based classification features.
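To make the approach concrete, the following is a minimal sketch of a naïve Bayes classifier over character n-grams. It is not the authors' implementation. The class name `NgramNaiveBayes`, the helper functions, the language codes `sq` and `en`, and the toy training sentences are all hypothetical. Adding diacritic-free copies of the Albanian samples (with "ë" and "ç" replaced by "e" and "c") is one assumed way to build a training corpus that tolerates documents missing those letters; the abstract does not say how the custom corpus was constructed.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_max=3):
    """Return all character n-grams of length 1..n_max from lowercased, padded text."""
    text = f" {text.lower()} "
    return [text[i:i + n] for n in range(1, n_max + 1)
            for i in range(len(text) - n + 1)]


def strip_albanian_diacritics(text):
    """Replace ë/ç with e/c, mimicking text typed without those letters."""
    return text.translate(str.maketrans("ëËçÇ", "eEcC"))


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-gram counts (add-one smoothing)."""

    def __init__(self, n_max=3):
        self.n_max = n_max
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.docs = Counter()               # language -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, samples):
        for text, lang in samples:
            grams = char_ngrams(text, self.n_max)
            self.counts[lang].update(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n_max)
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang, counter in self.counts.items():
            total = sum(counter.values())
            # Log prior plus smoothed log likelihood of every n-gram.
            score = math.log(self.docs[lang] / total_docs)
            for gram in grams:
                score += math.log((counter[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


# Toy training data (hypothetical, far too small for real use).
albanian = [
    "Qeveria shqiptare miratoi sot buxhetin e ri për vitin e ardhshëm.",
    "Çmimet e ushqimeve janë rritur shumë gjatë muajve të fundit.",
    "Kryeministri tha se reforma në drejtësi është e domosdoshme.",
]
english = [
    "The government approved the new budget for the coming year.",
    "Food prices have risen sharply during the last few months.",
    "The prime minister said that the justice reform is necessary.",
]

samples = [(text, "sq") for text in albanian]
# Assumed augmentation: also train on copies without "ë" and "ç".
samples += [(strip_albanian_diacritics(text), "sq") for text in albanian]
samples += [(text, "en") for text in english]

model = NgramNaiveBayes(n_max=3).fit(samples)
print(model.predict("Kryeministri tha se cmimet jane rritur"))
print(model.predict("The minister said prices have risen"))
```

The first query is Albanian written without "ë" and "ç". Scores are summed in log space so that long documents do not underflow, and add-one smoothing keeps n-grams unseen in training from zeroing out a language's probability.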
