MMEAD: MS MARCO Entity Annotations and Disambiguations

09/14/2023
by Chris Kamphuis, et al.

MMEAD, or MS MARCO Entity Annotations and Disambiguations, is a resource of entity links for the MS MARCO datasets. We specify a format for storing and sharing links for both the document and passage collections of MS MARCO. Following this specification, we release entity links to Wikipedia for the documents and passages in both MS MARCO collections (v1 and v2). The entity links were produced by the REL and BLINK systems. MMEAD is distributed as an easy-to-install Python package that lets users load the link data and entity embeddings in only a few lines of code. Finally, we show how MMEAD can be used for IR research that relies on entity information: using this resource, we improve recall@1000 and MRR@10 on more complex queries on the MS MARCO v1 passage dataset. We also demonstrate how entity expansions can be used for interactive search applications.
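
To make the idea of entity expansion concrete, the following is a minimal sketch that parses entity links from JSON Lines and appends the linked entity names to a passage. The record layout (`id`, `text`, `links`, `entity`, `start`, `end`) and the helper names `load_links` and `expand_with_entities` are assumptions made for this illustration; they are not MMEAD's actual storage format or API, and the sample data is invented.

```python
import json

# Hypothetical JSON Lines data: one record per passage.
# The field names are illustrative assumptions, not MMEAD's schema.
SAMPLE_JSONL = """\
{"id": "p1", "text": "Paris is the capital of France.", "links": [{"entity": "Paris", "start": 0, "end": 5}, {"entity": "France", "start": 24, "end": 30}]}
{"id": "p2", "text": "No entities here.", "links": []}
"""


def load_links(jsonl_text):
    """Parse JSON Lines text into a dict keyed by passage id."""
    records = {}
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            record = json.loads(line)
            records[record["id"]] = record
    return records


def expand_with_entities(record):
    """Append each distinct linked entity name to the passage text."""
    names = []
    for link in record["links"]:
        if link["entity"] not in names:  # keep first occurrence only
            names.append(link["entity"])
    return " ".join([record["text"]] + names)


records = load_links(SAMPLE_JSONL)
expanded = expand_with_entities(records["p1"])
print(expanded)
```

An expanded text of this kind could then be indexed or used as a query in a standard retrieval pipeline; how much weight the appended entity names should receive is a design choice this sketch does not address.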


