Weakly Supervised Pre-Training for Multi-Hop Retriever

06/18/2021
by Yeon Seonwoo, et al.

In multi-hop QA, answering complex questions entails iterative document retrieval to find the missing entity of the question. The main steps of this process are sub-question detection, document retrieval for the sub-question, and generation of a new query for the final document retrieval. However, building a dataset that contains complex questions together with their sub-questions and corresponding documents requires costly human annotation. To address this issue, we propose a new method for weakly supervised multi-hop retriever pre-training that requires no human effort. Our method includes 1) a pre-training task for generating vector representations of complex questions, 2) a scalable data generation method that produces the nested structure of questions and sub-questions as weak supervision for pre-training, and 3) a pre-training model structure based on dense encoders. We compare our pre-trained retriever with several state-of-the-art models on end-to-end multi-hop QA as well as on document retrieval. The experimental results show that our pre-trained retriever is effective and remains robust with limited data and computational resources.
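To make the retrieval loop above concrete, below is a minimal sketch of iterative dense retrieval. It is not the authors' model: the placeholder encoder, the fixed hop count, the concatenation-based query re-formulation, and the toy corpus are all illustrative assumptions; a real system would plug in trained question and document encoders.

```python
import hashlib
import numpy as np

def encode(texts, dim=768):
    """Placeholder dense encoder mapping each text to a unit vector.
    A real retriever would use trained question/document encoders;
    hashing just makes the toy vectors deterministic."""
    vecs = []
    for t in texts:
        seed = int.from_bytes(hashlib.sha256(t.encode()).digest()[:4], "little")
        vecs.append(np.random.default_rng(seed).normal(size=dim))
    vecs = np.stack(vecs)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def multi_hop_retrieve(question, docs, hops=2):
    """Retrieve one document per hop, re-forming the query from the
    question plus the evidence gathered so far."""
    doc_vecs = encode(docs)
    query, chosen = question, []
    for _ in range(hops):
        q_vec = encode([query])[0]
        scores = doc_vecs @ q_vec            # inner-product similarity
        scores[chosen] = -np.inf             # do not retrieve a document twice
        best = int(np.argmax(scores))
        chosen.append(best)
        query = question + " " + docs[best]  # query re-formulation step
    return [docs[i] for i in chosen]

# Toy usage: two hops over a tiny corpus.
corpus = [
    "Paris is the capital of France.",
    "The Louvre is a museum in Paris.",
    "Mount Fuji is in Japan.",
]
print(multi_hop_retrieve("Which museum is in the capital of France?", corpus))
```

The string concatenation here merely stands in for the query-generation step named in the abstract; in the paper's method, the new query is represented as a vector produced by the pre-trained dense encoder rather than by concatenated text.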


