How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

11/21/2019
by Zewei Chu, et al.

We present a large-scale dataset for the task of rewriting an ill-formed natural language question into a well-formed one. Our multi-domain question rewriting (MQR) dataset is constructed from human-contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs drawn from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate. When moving from ill-formed to well-formed questions, question quality improves by an average of 45 points across three aspects. We train sequence-to-sequence neural models on the constructed dataset and obtain an improvement of 13.2% in BLEU-4 over baseline methods built from other data resources. We release the MQR dataset to encourage research on the problem of question rewriting.
