CODEC: Complex Document and Entity Collection

05/09/2022
by Iain Mackie, et al.

CODEC is a document and entity ranking benchmark that focuses on complex research topics. We target the essay-style information needs of social science researchers, e.g. "How has the UK's Open Banking Regulation benefited Challenger Banks?". CODEC includes 42 topics developed by researchers and a new focused web corpus with semantic annotations, including entity links. The resource provides expert judgments on 17,509 documents and entities (416.9 per topic) drawn from diverse automatic and interactive manual runs. The manual runs include 387 query reformulations, providing data for query performance prediction and for evaluating automatic query rewriting. CODEC also includes an analysis of state-of-the-art systems, including dense retrieval and neural re-ranking. The results show that the topics are challenging, with headroom for improvement in both document and entity ranking. Query expansion with entity information yields significant gains in document ranking, demonstrating the resource's value for evaluating and improving entity-oriented search. We also show that the manual query reformulations significantly improve both document and entity ranking performance. Overall, CODEC provides challenging research topics to support the development and evaluation of entity-centric search methods.
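To make the idea of query expansion with entity information concrete, the following is a minimal sketch and not the method evaluated in CODEC. It assumes a toy term-frequency scorer, and the function names (`expand_query`, `score`, `rank`), the `entity_weight` parameter, the entity names, and the sample documents are all hypothetical placeholders chosen for illustration.

```python
from collections import Counter


def expand_query(query, entities, entity_weight=0.5):
    """Build weighted query terms (hypothetical helper, not from CODEC).

    Original query terms get weight 1.0 per occurrence; each term of a
    linked entity name adds `entity_weight` (an assumed tuning value).
    """
    weights = Counter()
    for term in query.lower().split():
        weights[term] += 1.0
    for name in entities:
        for term in name.lower().split():
            weights[term] += entity_weight
    return dict(weights)


def score(doc_text, weights):
    """Toy scorer: sum of (term weight * term frequency in the document)."""
    tf = Counter(doc_text.lower().split())
    return sum(w * tf[t] for t, w in weights.items())


def rank(docs, weights):
    """Return document ids sorted by descending toy score."""
    return sorted(docs, key=lambda d: score(docs[d], weights), reverse=True)


# Illustrative data only: the entity names and documents are made up.
query = "open banking challenger banks"
entities = ["Monzo", "Revolut"]
docs = {
    "d1": "monzo and revolut grew after open banking rules",
    "d2": "open banking rules explained",
    "d3": "weather report",
}

baseline = expand_query(query, [])
expanded = expand_query(query, entities)
ranking = rank(docs, expanded)
```

In this toy setup, `d1` and `d2` tie under the unexpanded query, while the entity terms break the tie in favour of the document that mentions the linked entities. A real system would use a proper retrieval model and entity linker in place of these stand-ins.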


