Rule-and Dictionary-based Solution for Variations in Written Arabic Names in Social Networks, Big Data, Accounting Systems and Large Databases

02/18/2015
by   Ahmad B. A. Hassanat, et al.

This paper investigates the problem that some Arabic names can be written in multiple ways. When someone searches for only one form of a name, neither exact nor approximate matching reliably returns the other variants of that name. Exact matching requires the user to enter every form of the name, and approximate matching returns names that are not variants of the one being sought. In this paper, we attempt to solve the problem with a dictionary that maps each Arabic name to its alternative written forms. We generated the alternatives from rules derived by reviewing the first names of 9.9 million citizens and former citizens of Jordan. The dictionary can be used both to standardize the written form when a new name is inserted into a database and to search for a name together with all its alternative written forms. Creating the dictionary automatically based on rules resulted in at least 7 [...] addressed the errors by manually editing the dictionary. The dictionary can be of help to real-world databases, with the qualification that manual editing does not guarantee 100 [...]
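To make the two uses of such a dictionary concrete (standardizing a name on insert, and searching for all of its written forms), here is a minimal sketch in Python. It is not the authors' implementation. The three entries, the choice of canonical spelling, and the helper names `standardize`, `all_forms` and `search` are assumptions made for illustration. The entries reflect two widely known sources of spelling variation in Arabic (hamza on alef, and final taa marbuta written as haa), not the rules or the dictionary from the paper.

```python
# Hypothetical, hand-written entries: canonical form -> alternative written forms.
# They illustrate common Arabic spelling variation and are NOT taken from
# the paper's dictionary or its rules.
VARIANTS = {
    "أحمد": {"احمد"},        # alef with hamza vs. bare alef
    "فاطمة": {"فاطمه"},      # final taa marbuta vs. haa
    "إبراهيم": {"ابراهيم"},  # alef with hamza below vs. bare alef
}

# Reverse index: every written form (canonical included) -> canonical form.
CANONICAL = {}
for canonical, alternatives in VARIANTS.items():
    CANONICAL[canonical] = canonical
    for alt in alternatives:
        CANONICAL[alt] = canonical


def standardize(name):
    """Return the canonical spelling, or the name unchanged if it is unknown."""
    return CANONICAL.get(name, name)


def all_forms(name):
    """Return every known written form of a name, including the query itself."""
    canonical = standardize(name)
    return {canonical, name} | VARIANTS.get(canonical, set())


def search(stored_names, query):
    """Return the stored names that are a written variant of the query."""
    forms = all_forms(query)
    return [n for n in stored_names if n in forms]
```

In this sketch, calling `standardize` before an insert keeps one spelling per name in the database, while `search` expands a single query into its variant set and then matches exactly. This avoids the unrelated hits that approximate matching would return. A name missing from the dictionary falls back to plain exact matching.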


