ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters

09/25/2018
by Abdalghani Abujabal, et al.

To bridge the gap between the capabilities of state-of-the-art factoid question answering (QA) systems and what real users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit challenging aspects such as temporal reasoning and compositionality. ComQA questions come from the WikiAnswers community QA platform. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.
