Revealing the Importance of Semantic Retrieval for Machine Reading at Scale

09/17/2019
by Yixin Nie, et al.

Machine Reading at Scale (MRS) is a challenging task in which a system is given an input query and is asked to produce a precise output by "reading" information from a large knowledge base. The task has gained popularity because it naturally combines information retrieval (IR) and machine comprehension (MC). Advances in representation learning have led to separate progress in both IR and MC; however, very few studies have examined the relationship between retrieval and comprehension, and their combined design, at different levels of granularity when developing MRS systems. In this work, we give general guidelines on system design for MRS by proposing a simple yet effective pipeline system with special consideration of hierarchical semantic retrieval at both the paragraph and sentence level, and of its potential effects on the downstream task. The system is evaluated on both fact verification and open-domain multihop QA, achieving state-of-the-art results on the leaderboard test sets of both FEVER and HOTPOTQA. To further demonstrate the importance of semantic retrieval, we present ablation and analysis studies that quantify the contribution of neural retrieval modules at both the paragraph level and the sentence level, and illustrate that intermediate semantic retrieval modules are vital not only for effectively filtering upstream information, and thus saving downstream computation, but also for shaping the upstream data distribution and providing better data for downstream modeling. Code and data are publicly available at: https://github.com/easonnie/semanticRetrievalMRS
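To make the hierarchical idea concrete, the following is a minimal sketch of paragraph-level filtering followed by sentence-level filtering, with the surviving sentences passed on to a downstream reader or verifier. It is not the authors' implementation: the function names, thresholds, and top-k values are hypothetical, and the token-overlap scorer is only a stand-in for the neural retrieval modules the abstract describes.

```python
import re
from typing import Callable, Dict, List

# A scorer maps (query, text) to a relevance score in [0, 1].
Scorer = Callable[[str, str], float]


def overlap_score(query: str, text: str) -> float:
    """Toy scorer: fraction of query tokens that also occur in the text.

    Assumption: this replaces a learned neural scorer purely for illustration.
    """
    q_tokens = set(re.findall(r"\w+", query.lower()))
    t_tokens = set(re.findall(r"\w+", text.lower()))
    if not q_tokens:
        return 0.0
    return len(q_tokens & t_tokens) / len(q_tokens)


def hierarchical_retrieve(
    query: str,
    paragraphs: Dict[str, List[str]],  # paragraph title -> list of sentences
    scorer: Scorer = overlap_score,
    para_threshold: float = 0.3,  # hypothetical value
    sent_threshold: float = 0.5,  # hypothetical value
    top_k_para: int = 2,          # hypothetical value
    top_k_sent: int = 3,          # hypothetical value
) -> List[str]:
    """Filter paragraphs first, then sentences inside the kept paragraphs."""
    # Stage 1: score whole paragraphs and keep the best ones above threshold.
    scored_paras = sorted(
        ((scorer(query, " ".join(sents)), title)
         for title, sents in paragraphs.items()),
        reverse=True,
    )
    kept_titles = [t for s, t in scored_paras if s >= para_threshold][:top_k_para]

    # Stage 2: score only sentences from the surviving paragraphs.
    candidates = [sent for title in kept_titles for sent in paragraphs[title]]
    scored_sents = sorted(((scorer(query, s), s) for s in candidates), reverse=True)
    return [s for score, s in scored_sents if score >= sent_threshold][:top_k_sent]


if __name__ == "__main__":
    kb = {
        "Paris": ["Paris is the capital of France.", "It lies on the Seine."],
        "Berlin": ["Berlin is the capital of Germany."],
        "Bananas": ["Bananas are yellow fruit."],
    }
    # The returned sentences would be the input to a downstream QA or
    # verification model.
    print(hierarchical_retrieve("capital of France", kb))
```

The sketch shows both roles the abstract attributes to intermediate retrieval: the sentence stage only scores text from paragraphs that passed the first stage, which reduces downstream computation, and the thresholds determine which examples the downstream model ever sees.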


