AltecOnDB: A Large-Vocabulary Arabic Online Handwriting Recognition Database

12/24/2014
by Ibrahim Abdelaziz, et al.

Arabic is a Semitic language characterized by a complex and rich morphology. The exceptional degree of ambiguity in the writing system, the rich morphology, and the highly complex root-and-pattern word formation process all make computational approaches to Arabic very challenging. As a result, a practical handwriting recognition system should support a large vocabulary to provide high coverage and should use context information for disambiguation. Several research efforts have been devoted to building online Arabic handwriting recognition systems. Most of these methods use either small private test sets or a standard database with a limited lexicon and coverage. A large-scale handwriting database is an essential resource for advancing research on online handwriting recognition. Currently, there is no online Arabic handwriting database with a large lexicon, high coverage, a large number of writers, and ample training and testing data. In this paper, we introduce AltecOnDB, a large-scale online Arabic handwriting database. AltecOnDB has 98 the Arabic language. The collected samples are complete sentences that include digits and punctuation marks. The collected data is available at the sentence, word, and character levels; hence, high-level linguistic models can be used to improve performance. Data was collected from more than 1000 writers of different backgrounds, genders, and ages. Annotation and verification tools were developed to facilitate the annotation and verification phases. We built an elementary recognition system to test our database and to show the difficulties of handling a large vocabulary and the large amount of style variation in the collected data.
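Because the samples are annotated at the sentence, word, and character levels, one way to picture such a record is as a nested structure over raw pen strokes. The sketch below is a hypothetical schema, not AltecOnDB's actual file format: the class names, the `writer_id` field, and the (x, y, t) point layout are all assumptions made for illustration. It also assumes every stroke can be assigned to a single character, which real Arabic annotation may not allow (for example, delayed strokes such as dots).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical point layout: (x, y, timestamp) for each sampled pen position.
Point = Tuple[float, float, float]


@dataclass
class Stroke:
    """One pen-down to pen-up trace (assumed representation)."""
    points: List[Point]


@dataclass
class Character:
    """Character-level annotation: a label plus the strokes assigned to it."""
    label: str  # an Arabic letter, digit, or punctuation mark
    strokes: List[Stroke]


@dataclass
class Word:
    """Word-level annotation: an ordered list of annotated characters."""
    characters: List[Character]

    @property
    def text(self) -> str:
        return "".join(c.label for c in self.characters)


@dataclass
class Sentence:
    """Sentence-level sample from one writer (hypothetical field names)."""
    writer_id: str
    words: List[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)

    def strokes(self) -> List[Stroke]:
        # Flatten back to the raw stroke sequence, e.g. for sentence-level models.
        return [s for w in self.words for c in w.characters for s in c.strokes]


if __name__ == "__main__":
    body = Stroke([(0.0, 0.0, 0.00), (1.0, 0.0, 0.01)])
    dot = Stroke([(0.5, -0.5, 0.05)])
    word = Word([Character("ب", [body, dot]), Character("ا", [body])])
    sample = Sentence("writer-001", [word, Word([Character("1", [dot])])])
    print(sample.text, len(sample.strokes()))
```

Keeping the same strokes reachable from every level means a sentence-level recognizer and a character-level classifier could be trained from one annotated record, which is the kind of multi-level use the abstract describes.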
