Temporal Concept Drift and Alignment: An empirical approach to comparing Knowledge Organization Systems over time

08/16/2022
by Sam Grabus, et al.

This research explores temporal concept drift and temporal alignment in knowledge organization systems (KOS). A comparative analysis is pursued using the 1910 Library of Congress Subject Headings (LCSH), 2020 FAST Topical, and automatic indexing. The use case involves a sample of 90 nineteenth-century Encyclopedia Britannica entries. The entries were indexed using two approaches: 1) full-text indexing; and 2) named entity recognition (NER), performed on the entries with Stanza, Stanford's NLP toolkit, after which the extracted entities were automatically indexed with the Helping Interdisciplinary Vocabulary application (HIVE), using both 1910 LCSH and FAST Topical. The analysis focused on three goals: 1) identifying results that were exclusive to the 1910 LCSH output; 2) identifying terms in this exclusive set that have been deprecated from the contemporary LCSH, demonstrating temporal concept drift; and 3) exploring the historical significance of these deprecated terms. Results confirm that historical vocabularies can be used to generate anachronistic subject headings that represent conceptual drift across time in KOS and historical resources. The study also makes a methodological contribution, demonstrating how to study changes in KOS over time and how to improve the contextualization of historical humanities resources.
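The first two analysis goals amount to set comparisons over indexing output. The sketch below is a minimal illustration, not the authors' pipeline. It assumes that each entry's HIVE output is available as a set of heading strings per vocabulary, that "exclusive" means produced by 1910 LCSH but not by FAST Topical, and that a set of currently valid LCSH headings is available for the deprecation check. The function names and the placeholder data are hypothetical, and matching headings across vocabularies by case-insensitive string comparison is a simplifying assumption.

```python
from typing import Dict, Set


def exclusive_headings(lcsh_1910: Set[str], fast: Set[str]) -> Set[str]:
    """Goal 1 (assumed form): headings from the 1910 LCSH run that FAST did not produce."""
    # Simplifying assumption: headings match across vocabularies by case-insensitive string.
    fast_norm = {h.casefold() for h in fast}
    return {h for h in lcsh_1910 if h.casefold() not in fast_norm}


def deprecated_headings(exclusive: Set[str], current_lcsh: Set[str]) -> Set[str]:
    """Goal 2 (assumed form): exclusive headings absent from the contemporary LCSH."""
    current_norm = {h.casefold() for h in current_lcsh}
    return {h for h in exclusive if h.casefold() not in current_norm}


def drift_report(
    entries: Dict[str, Dict[str, Set[str]]], current_lcsh: Set[str]
) -> Dict[str, Set[str]]:
    """Map each entry ID to its candidate drifted (deprecated) 1910 LCSH headings."""
    report: Dict[str, Set[str]] = {}
    for entry_id, outputs in entries.items():
        excl = exclusive_headings(outputs["lcsh_1910"], outputs["fast"])
        report[entry_id] = deprecated_headings(excl, current_lcsh)
    return report


if __name__ == "__main__":
    # Hypothetical placeholder data; these are not headings from the study.
    entries = {
        "entry_001": {
            "lcsh_1910": {"Heading A", "Heading B", "Heading C"},
            "fast": {"heading a"},
        },
    }
    current_lcsh = {"Heading B"}
    print(drift_report(entries, current_lcsh))  # {'entry_001': {'Heading C'}}
```

The third goal, assessing the historical significance of the flagged headings, is interpretive work that this kind of set arithmetic only prepares for.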

research
09/23/2021

Named Entity Recognition and Classification on Historical Documents: A Survey

After decades of massive digitisation, an unprecedented amount of histor...
research
04/11/2022

Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0

In this work, we explore whether the recently demonstrated zero-shot abi...
research
01/31/2018

A Semantic Model for Historical Manuscripts

The study and publication of historical scientific manuscripts are com- ...
research
03/30/2023

Yes but.. Can ChatGPT Identify Entities in Historical Documents?

Large language models (LLMs) have been leveraged for several years now, ...
research
05/31/2022

hmBERT: Historical Multilingual Language Models for Named Entity Recognition

Compared to standard Named Entity Recognition (NER), identifying persons...
research
03/16/2022

Context-Aware Drift Detection

When monitoring machine learning systems, two-sample tests of homogeneit...
research
09/13/2021

Project Pipeline: Preservation, Persistence, and Performance

Preservation pipelines demonstrate extended value when digitized content...