Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format

03/18/2020
by Jimmy Lin, et al.

There exists a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems. To balance these two goals, we examine two approaches to providing interoperability between the inverted indexes of several systems. The first takes advantage of internal abstractions around index structures and builds wrappers that allow one system to directly read the indexes of another. The second involves sharing indexes across systems via a data exchange specification that we have developed, called the Common Index File Format (CIFF). We demonstrate the first approach with the Java systems Anserini and Terrier, and the second approach with Anserini, JASSv2, OldDog, PISA, and Terrier. Together, these systems provide a wide range of implementations and features, with different research goals. Overall, we recommend CIFF as a low-effort approach to support independent innovation while enabling the types of fair evaluations that are critical for driving the field forward.
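
To make the second approach more concrete, the following is a minimal sketch of exchanging an inverted index through a neutral, system-independent serialization. It is not the CIFF specification: the JSON-lines layout, the field names (`term`, `df`, `postings`), the gap-encoded document ids, and the helper functions `build_index`, `export_index`, `import_index`, and `round_trip` are all assumptions made for illustration only.

```python
import io
import json
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> sorted list of (docid, tf)."""
    index = defaultdict(dict)
    for docid, text in enumerate(docs):
        for term in text.lower().split():
            index[term][docid] = index[term].get(docid, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}


def export_index(index, out):
    """Write one JSON record per term (hypothetical layout, not CIFF).

    Document ids are stored as gaps from the previous docid in the list.
    """
    for term in sorted(index):
        prev = 0
        postings = []
        for docid, tf in index[term]:
            postings.append([docid - prev, tf])
            prev = docid
        record = {"term": term, "df": len(postings), "postings": postings}
        out.write(json.dumps(record) + "\n")


def import_index(inp):
    """Read the records back, turning docid gaps into absolute docids."""
    index = {}
    for line in inp:
        record = json.loads(line)
        docid = 0
        postings = []
        for gap, tf in record["postings"]:
            docid += gap
            postings.append((docid, tf))
        index[record["term"]] = postings
    return index


def round_trip(docs):
    """Index the documents, export, re-import, and return both indexes."""
    original = build_index(docs)
    buffer = io.StringIO()
    export_index(original, buffer)
    buffer.seek(0)
    return original, import_index(buffer)
```

In this sketch the exporting and importing sides share only the file layout, not any code or internal index structures, which is the property that lets independently developed systems be compared on the same index.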


