Scalable graph-based individual named entity identification

11/26/2018
by Sammy Khalife, et al.

Named entity discovery (NED) is an important information retrieval problem that can be decomposed into two sub-problems. The first sub-problem, named entity recognition (NER), aims to tag pre-defined sets of words in a vocabulary (called "named entities": names, places, locations, ...) when they appear in natural language. The second sub-problem, named entity linking/identification (NEL), treats these entity mentions as queries to be identified in a pre-existing database. In this paper, we consider the NEL problem, and assume a set of queries (or mentions) that have to be identified within a knowledge base. This knowledge base is represented by a text database paired with a semantic graph. We present state-of-the-art methods in NEL, and propose a 2-step method for individual identification of named entities. Our approach is motivated by the limitations of recent deep learning approaches, which lack interpretability and require extensive parameter tuning along with large volumes of annotated data. First, we propose a filtering algorithm designed with information retrieval and text mining techniques, aiming to maximize precision at K (typically for 5 <= K <= 20). Then, we introduce two graph-based methods for named entity identification that maximize precision at 1 by re-ranking the remaining top entity candidates. The first identification method uses parametrized graph mining, and the second uses similarity computed with graph kernels. Our approach capitalizes on a fine-grained classification of entities from annotated web data. We present our algorithms in detail, and show experimentally on standard datasets (NIST TAC-KBP, CONLL/AIDA) that their precision is better than that of any graph-based method reported, and competitive with state-of-the-art systems. Finally, we conclude with the advantages of our graph-based approach compared to recent deep learning methods.
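The abstract does not spell out how precision at K is computed. As a minimal sketch, assuming the common entity-linking reading (the fraction of mentions whose correct entity appears among the top K ranked candidates), the metric for both steps could be evaluated as follows. The function name `precision_at_k`, the mention identifiers, and the toy candidate lists are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def precision_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of mentions whose gold entity is among the top-k candidates.

    Assumed definition, not confirmed by the abstract.
    `ranked` maps a mention id to its candidates, best first.
    `gold` maps a mention id to its correct entity.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if not gold:
        return 0.0
    hits = sum(1 for mention, entity in gold.items()
               if entity in ranked.get(mention, [])[:k])
    return hits / len(gold)


# Hypothetical toy data: three mentions with ranked candidate entities.
ranked = {
    "m1": ["Paris", "Paris_Hilton", "Paris,_Texas"],
    "m2": ["Jordan_(country)", "Michael_Jordan", "Jordan_River"],
    "m3": ["Apple_Inc.", "Apple"],
}
gold = {"m1": "Paris", "m2": "Michael_Jordan", "m3": "Apple_Records"}

p_at_1 = precision_at_k(ranked, gold, 1)  # only m1 is correct at rank 1
p_at_2 = precision_at_k(ranked, gold, 2)  # m1 and m2 are covered by the top 2
print(p_at_1, p_at_2)
```

Under this reading, the filtering step is judged by whether the correct entity survives into the top K candidates, and the re-ranking step is judged by the same function with `k=1` on the re-ordered lists.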


research
06/12/2018

Named Entity Recognition with Extremely Limited Data

Traditional information retrieval treats named entity recognition as a p...
research
02/08/2017

Automatically Annotated Turkish Corpus for Named Entity Recognition and Text Categorization using Large-Scale Gazetteers

Turkish Wikipedia Named-Entity Recognition and Text Categorization (TWNE...
research
11/13/2018

An Analysis of the Semantic Annotation Task on the Linked Data Cloud

Semantic annotation, the process of identifying key-phrases in texts and...
research
04/09/2019

Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition

Traditional language models are unable to efficiently model entity names...
research
05/04/2020

Understanding Scanned Receipts

Tasking machines with understanding receipts can have important applicat...
research
02/03/2017

Extraction of Evolution Descriptions from the Web

The evolution of named entities affects exploration and retrieval tasks ...
research
09/06/2019

#MeTooMaastricht: Building a chatbot to assist survivors of sexual harassment

Inspired by the recent social movement of #MeToo, we are building a chat...
