ArchivalQA: A Large-scale Benchmark Dataset for Open Domain Question Answering over Archival News Collections

09/08/2021
by Jiexin Wang, et al.

In the last few years, open-domain question answering (ODQA) has advanced rapidly due to the development of deep learning techniques and the availability of large-scale QA datasets. However, current datasets are essentially designed for synchronic document collections (e.g., Wikipedia). Temporal news collections, such as long-term news archives spanning several decades, are rarely used to train models despite their value to society. To foster research on ODQA over such historical collections, we present ArchivalQA, a large question answering dataset of 1,067,056 question-answer pairs designed for temporal news QA. In addition, we create four subparts of our dataset based on question difficulty levels and the presence of temporal expressions, which we believe could be useful for training or testing ODQA systems characterized by different strengths and abilities. The novel QA dataset-construction framework that we introduce can also be applied to create datasets over other types of collections.
