On the Reproducibility of Experiments of Indexing Repetitive Document Collections

12/26/2019
by Antonio Fariña, et al.

This companion reproducibility paper aims to allow exact replication of the methods, experiments, and results discussed in a previous work [5]. In that parent paper, we proposed a variety of index compression techniques that exploit the fact that highly repetitive collections consist mostly of documents that are near-copies of others. Here, we describe a replication framework, called uiHRDC (universal indexes for Highly Repetitive Document Collections), that allows our original experimental setup to be easily replicated using various document collections. We explain the corresponding experiments in detail and specify the parameters that can be tuned for each indexing solution. Finally, we also provide uiHRDC as a reproducibility package.

research
08/08/2023

HotOS XIX Panel Report: Panel on Future of Reproduction and Replication of Systems Research

At HotOS XIX (2023), we organized a panel to discuss the future of repro...
research
02/20/2019

Fast, Small, and Simple Document Listing on Repetitive Text Collections

Document listing on string collections is the task of finding all docume...
research
01/31/2023

Archive TimeLine Summarization (ATLS): Conceptual Framework for Timeline Generation over Historical Document Collections

Archive collections are nowadays mostly available through search engines...
research
05/25/2021

Reproducibility Companion Paper: Knowledge Enhanced Neural Fashion Trend Forecasting

This companion paper supports the replication of the fashion trend forec...
research
07/16/2018

Repeatability Corner Cases in Document Ranking: The Impact of Score Ties

Document ranking experiments should be repeatable: running the same rank...
research
04/20/2022

MEDFORD: A human and machine readable metadata markup language

Reproducibility of research is essential for science. However, in the wa...
research
01/06/2010

Random Indexing K-tree

Random Indexing (RI) K-tree is the combination of two algorithms for clu...
