QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations

by Chaitanya Malaviya, et al.

Formulating selective information needs results in queries that implicitly specify set operations, such as intersection, union, and difference. For instance, one might search for "shorebirds that are not sandpipers" or "science-fiction films shot in England". To study the ability of retrieval systems to meet such information needs, we construct QUEST, a dataset of 3357 natural language queries with implicit set operations, each of which maps to a set of entities corresponding to Wikipedia documents. The dataset challenges models to match multiple constraints mentioned in queries with corresponding evidence in documents and to correctly perform various set operations. The dataset is constructed semi-automatically using Wikipedia category names. Queries are automatically composed from individual categories, then paraphrased and further validated for naturalness and fluency by crowdworkers. Crowdworkers also assess the relevance of entities based on their documents and highlight attribution of query constraints to spans of document text. We analyze several modern retrieval systems, finding that they often struggle on such queries. Queries involving negation and conjunction are particularly challenging, and combinations of these operations challenge systems further.
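
To make the idea of implicit set operations concrete, the following is a minimal sketch of how an answer set could be derived once a query has been broken down into categories. It is not the QUEST construction pipeline: the category-to-entity table is toy data, and the names `CATEGORY_TO_ENTITIES`, `answer_set`, and `recall` are hypothetical helpers introduced only for illustration.

```python
# Toy illustration only: this table is a made-up stand-in for
# Wikipedia categories, not data taken from QUEST.
CATEGORY_TO_ENTITIES = {
    "shorebirds": {"Dunlin", "Sanderling", "Pied avocet", "Eurasian oystercatcher"},
    "sandpipers": {"Dunlin", "Sanderling"},
    "science-fiction films": {"Film A", "Film B", "Film C"},
    "films shot in England": {"Film B", "Film C", "Film D"},
}


def answer_set(operation, left, right):
    """Combine the entity sets of two categories with one set operation."""
    a = CATEGORY_TO_ENTITIES[left]
    b = CATEGORY_TO_ENTITIES[right]
    if operation == "intersection":  # e.g. "science-fiction films shot in England"
        return a & b
    if operation == "union":         # e.g. "shorebirds or sandpipers"
        return a | b
    if operation == "difference":    # e.g. "shorebirds that are not sandpipers"
        return a - b
    raise ValueError(f"unknown operation: {operation}")


def recall(retrieved, gold):
    """Fraction of gold entities that a (hypothetical) retriever returned."""
    if not gold:
        return 0.0
    return len(set(retrieved) & gold) / len(gold)


gold = answer_set("difference", "shorebirds", "sandpipers")
# A retriever that ignores the negation would also return the sandpipers;
# recall alone does not penalize that, so precision matters as well.
print(sorted(gold))
print(recall(["Dunlin", "Pied avocet"], gold))
```

In this sketch the set operation is given explicitly. In the setting the abstract describes, the operation is only implied by the wording of the query, and a system has to recover it from the text.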


