Aspect-Driven Structuring of Historical Dutch Newspaper Archives

07/14/2023
by Hermann Kroll, et al.

Digital libraries often provide access to historical newspaper archives via keyword-based search. Historical figures and their roles are particularly interesting cognitive access points in historical research. Structuring and clustering news articles would give users more sophisticated ways to explore such information. However, real-world limitations such as the lack of training data, licensing restrictions, and non-English text with OCR errors make building such a system difficult and cost-intensive in practice. In this work, we tackle these issues using the National Library of the Netherlands as a showcase, introducing a role-based interface that structures news articles on historical persons. In-depth, component-wise evaluations and interviews with domain experts highlighted our prototype's effectiveness and appropriateness for a real-world digital library collection.

research
03/23/2020

BaitWatcher: A lightweight web interface for the detection of incongruent news headlines

In digital environments where substantial amounts of information are sha...
research
03/21/2022

Transformer-based HTR for Historical Documents

We apply the TrOCR framework to real-world, historical manuscripts and s...
research
03/02/2019

Reliable Access to Massive Restricted Texts: Experience-based Evaluation

Libraries are seeing growing numbers of digitized textual corpora that f...
research
06/21/2018

Metadata Enrichment of Multi-Disciplinary Digital Library: A Semantic-based Approach

In the scientific digital libraries, some papers from different research...
research
05/02/2022

A Library Perspective on Nearly-Unsupervised Information Extraction Workflows in Digital Libraries

Information extraction can support novel and effective access paths for ...
research
10/24/2018

History by Diversity: Helping Historians search News Archives

Longitudinal corpora like newspaper archives are of immense value to his...
research
08/24/2023

American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

Existing full text datasets of U.S. public domain newspapers do not reco...
