Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA

10/11/2022
by Junjie Huang, et al.

Retrieving evidence from both tabular and textual resources is essential for open-domain question answering (OpenQA), because the two sources together provide more comprehensive information. However, training an effective dense table-text retriever is difficult because of the discrepancy between tables and text and the sparsity of training data. To address these challenges, we introduce an optimized OpenQA Table-Text Retriever (OTTeR) that jointly retrieves tabular and textual evidence. First, we enhance mixed-modality representation learning via two mechanisms: modality-enhanced representation and a mixed-modality negative sampling strategy. Second, to alleviate the data sparsity problem and improve general retrieval ability, we conduct retrieval-centric mixed-modality synthetic pre-training. Experimental results demonstrate that OTTeR substantially improves table-and-text retrieval performance on the OTT-QA dataset. Comprehensive analyses examine the effectiveness of each proposed mechanism. Moreover, equipped with OTTeR, our OpenQA system achieves the state-of-the-art result on the downstream QA task, with an improvement of 10.1 in exact match over the previous best system. All the code and data are available at https://github.com/Jun-jie-Huang/OTTeR.
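
The exact-match figure quoted in the abstract is an answer-level accuracy: a prediction counts only if it equals a reference answer after normalization. The abstract does not describe the evaluation procedure, so the following is only a minimal sketch that assumes the SQuAD-style normalization common in OpenQA evaluation (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names are hypothetical and are not taken from the OTTeR repository.

```python
import re
import string
from typing import List, Sequence


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the paper's script may differ.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: Sequence[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def exact_match_score(predictions: List[str],
                      references: List[Sequence[str]]) -> float:
    """Percentage of questions whose prediction exactly matches a reference."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under this reading, a reported gain of 10.1 means that roughly ten more questions out of every hundred receive an exactly matching answer.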


