Feature Engineering for Entity Resolution with Arabic Names: Improving Estimates of Observed Casualties in the Syrian Civil War

03/01/2020
by Niccolò Dalmasso, et al.

Entity resolution, or record linkage, is the task of identifying records that refer to the same entity across multiple data sources. In the absence of a unique identifier, entities must be resolved on the basis of possibly noisy and incomplete quasi-identifiers, such as names, ages, and addresses or geographic locations. Our goal is to improve estimates of the total observed casualty count in the ongoing Syrian civil war. Estimating the total victim toll in a conflict is important for understanding its extent and magnitude, for driving intervention policies, and for helping to bring perpetrators and mass murderers to justice. Our data comprise multiple lists of casualties compiled by the Human Rights Data Analysis Group. To arrive at an estimate of the number of unique casualties, we first need to detect duplicate entries within and across lists. By focusing on Arabic names and their structure, we develop new features for comparing records and demonstrate meaningful improvements over existing classifiers (which have already seen significant engineering), empirically supporting the importance of language-specific analysis. We expect that these features will be useful in other contexts where it is necessary to measure the similarity between Arabic names.
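
To make the duplicate-detection step concrete, the following is a minimal sketch of pairwise record comparison on quasi-identifiers. It is not the authors' feature set or classifier: the toy records, the field names, the `compare` and `candidate_duplicates` helpers, and the 0.85 threshold are all assumptions made for illustration, and the generic character-level similarity from Python's standard `difflib` stands in for the Arabic-specific name features the abstract refers to.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; names, fields, and values are invented for illustration.
records = [
    {"id": 1, "name": "mohammed al ahmad", "age": 34, "location": "Homs"},
    {"id": 2, "name": "mohamed alahmad", "age": 34, "location": "Homs"},
    {"id": 3, "name": "fatima khalil", "age": 27, "location": "Aleppo"},
]

def name_similarity(a, b):
    """Generic character-level similarity in [0, 1], ignoring spaces."""
    return SequenceMatcher(None, a.replace(" ", ""), b.replace(" ", "")).ratio()

def compare(r1, r2):
    """Build a comparison vector for one pair of records."""
    return {
        "name_sim": name_similarity(r1["name"], r2["name"]),
        "same_age": r1["age"] == r2["age"],
        "same_location": r1["location"] == r2["location"],
    }

def candidate_duplicates(recs, threshold=0.85):
    """Return id pairs whose fields agree closely enough to flag as likely duplicates.

    The threshold and the all-fields-must-agree rule are assumptions of this
    sketch; a real system would typically feed the comparison vector to a
    trained classifier instead.
    """
    pairs = []
    for r1, r2 in combinations(recs, 2):
        features = compare(r1, r2)
        if (features["name_sim"] >= threshold
                and features["same_age"]
                and features["same_location"]):
            pairs.append((r1["id"], r2["id"]))
    return pairs

print(candidate_duplicates(records))  # [(1, 2)]
```

Comparing every pair is quadratic in the number of records, so a sketch like this only suits small lists; the point is simply that spelling variants of one name can still be linked when the remaining quasi-identifiers agree.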
