Named Entity Recognition and Classification on Historical Documents: A Survey

09/23/2021
by Maud Ehrmann, et al.

After decades of massive digitisation, an unprecedented number of historical documents are available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it also opens up new opportunities in terms of content mining; the next fundamental challenge is to develop appropriate technologies to efficiently search, retrieve and explore information from this 'big data of the past'. Among semantic indexing opportunities, the recognition and classification of named entities are in great demand among humanities scholars. Yet, named entity recognition (NER) systems are heavily challenged by diverse, historical and noisy inputs. In this survey, we present the array of challenges posed by historical documents to NER, inventory existing resources, describe the main approaches deployed so far, and identify key priorities for future developments.
