History by Diversity: Helping Historians search News Archives

10/24/2018
by Jaspreet Singh, et al.

Longitudinal corpora like newspaper archives are of immense value to historical research, and time, as an important factor for historians, strongly influences their search behaviour in these archives. When searching for articles published over time, historians prefer to retrieve documents that cover the important aspects from important points in time, which differs from standard search behaviour. To support this search strategy, we introduce the notion of a Historical Query Intent to explicitly model a historian's search task and define an aspect-time diversification problem over news archives. We present a novel algorithm, HistDiv, that explicitly models the aspects and important time windows based on a historian's information-seeking behaviour. By incorporating temporal priors based on publication times and temporal expressions, we diversify on both the aspect and temporal dimensions. We test our methods by constructing a test collection based on The New York Times Collection with a workload of 30 manually assessed queries of historical intent. We find that HistDiv outperforms all competitors in subtopic recall with a slight loss in precision. We also present results of a qualitative user study to determine whether this drop in precision is detrimental to user experience. Our results show that users still preferred HistDiv's ranking.
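To make the evaluation measure concrete, the following is a minimal sketch of subtopic recall at rank k in its standard form: the fraction of a query's known subtopics covered by at least one of the top k documents. It is not taken from the paper. Modelling a subtopic as an (aspect, time window) pair is an assumption made here for illustration, and the function name, data layout, and sample values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a subtopic is an (aspect, time window) pair; labels are illustrative.
Subtopic = Tuple[str, str]


def subtopic_recall(
    ranking: List[str],
    doc_subtopics: Dict[str, Set[Subtopic]],
    all_subtopics: Set[Subtopic],
    k: int,
) -> float:
    """Fraction of known subtopics covered by the top-k ranked documents."""
    if not all_subtopics:
        raise ValueError("all_subtopics must not be empty")
    covered: Set[Subtopic] = set()
    for doc_id in ranking[:k]:
        # Documents without judgments contribute no subtopics.
        covered |= doc_subtopics.get(doc_id, set())
    # Only count subtopics that belong to this query's judged set.
    return len(covered & all_subtopics) / len(all_subtopics)


# Hypothetical judgments for one query.
ranking = ["d1", "d2", "d3"]
doc_subtopics = {
    "d1": {("election", "1992")},
    "d2": {("election", "1992")},  # redundant with d1
    "d3": {("scandal", "1998")},
}
all_subtopics = {
    ("election", "1992"),
    ("election", "1996"),
    ("scandal", "1998"),
    ("economy", "1992"),
}

recall_at_2 = subtopic_recall(ranking, doc_subtopics, all_subtopics, k=2)
recall_at_3 = subtopic_recall(ranking, doc_subtopics, all_subtopics, k=3)
```

In this toy ranking the second document repeats a subtopic already covered by the first, so it does not raise recall. A diversified ranking that promoted the third document would cover more subtopics at the same cutoff, possibly at some cost in precision.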


research
10/24/2018

Designing Search Tasks for Archive Search

Longitudinal corpora like legal, corporate and newspaper archives are of...
research
07/25/2017

Learning Word Relatedness over Time

Search systems are often focused on providing relevant results for the "...
research
04/27/2022

TimeBERT: Enhancing Pre-Trained Language Representations with Temporal Information

Time is an important aspect of text documents, which has been widely exp...
research
03/24/2021

CSFCube – A Test Collection of Computer Science Research Articles for Faceted Query by Example

Query by Example is a well-known information retrieval task in which a d...
research
03/21/2018

Multiple Models for Recommending Temporal Aspects of Entities

Entity aspect recommendation is an emerging task in semantic search that...
research
07/14/2023

Aspect-Driven Structuring of Historical Dutch Newspaper Archives

Digital libraries oftentimes provide access to historical newspaper arch...
research
01/26/2022

Searching, Learning, and Subtopic Ordering: A Simulation-based Analysis

Complex search tasks - such as those from the Search as Learning (SAL) d...
