Build Fast and Accurate Lemmatization for Arabic

10/18/2017
by Hamdy Mubarak, et al.

In this paper we describe the complexity of building a lemmatizer for Arabic, which has a rich and complex derivational morphology, and we discuss the need for fast and accurate lemmatization to enhance Arabic Information Retrieval (IR) results. We also introduce a new data set that can be used to test lemmatization accuracy, and an efficient lemmatization algorithm that outperforms state-of-the-art Arabic lemmatization in terms of accuracy and speed. We share the data set and the code publicly.

research
11/15/2019

An Accuracy-Enhanced Stemming Algorithm for Arabic Information Retrieval

This paper provides a method for indexing and retrieving Arabic texts, b...
research
10/17/2014

Large Vocabulary Arabic Online Handwriting Recognition System

Arabic handwriting is a consonantal and cursive writing. The analysis of...
research
02/07/2017

Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study

The effectiveness of three stop words lists for Arabic Information Retri...
research
08/18/2017

EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

This article introduces a new language-independent approach for creating...
research
10/31/2020

Neural Coreference Resolution for Arabic

No neural coreference resolver for Arabic exists, in fact we are not awa...
research
12/24/2014

AltecOnDB: A Large-Vocabulary Arabic Online Handwriting Recognition Database

Arabic is a semitic language characterized by a complex and rich morphol...
research
01/03/2023

An ensemble-based framework for mispronunciation detection of Arabic phonemes

Determination of mispronunciations and ensuring feedback to users are ma...
