Build Fast and Accurate Lemmatization for Arabic

10/18/2017
by Hamdy Mubarak, et al.

In this paper we describe the complexity of building a lemmatizer for Arabic, which has a rich and complex derivational morphology, and we discuss the need for fast and accurate lemmatization to improve Arabic Information Retrieval (IR) results. We also introduce a new data set that can be used to test lemmatization accuracy, and an efficient lemmatization algorithm that outperforms state-of-the-art Arabic lemmatization in terms of accuracy and speed. We make the data set and the code publicly available.

