Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning

by Rachel Gardner, et al.

Datasets extracted from social networks and online forums are often prone to the pitfalls of natural language, namely the presence of unstructured and noisy data. In this work, we seek to enable the collection of high-quality question-answer datasets from social media by proposing a novel task for automated quality analysis and data cleaning: question-answer (QA) plausibility. Given a machine- or user-generated question and a crowdsourced response from a social media user, we determine whether the question and response are valid; if so, we identify the answer within the free-form response. We design BERT-based models to perform the QA plausibility task, and we evaluate the ability of our models to generate a clean, usable question-answer dataset. Our highest-performing approach consists of a single-task model that determines the plausibility of the question, followed by a multi-task model that evaluates the plausibility of the response and extracts the answer (Question Plausibility AUROC=0.75, Response Plausibility AUROC=0.78, Answer Extraction F1=0.665).
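
The two-stage design described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `QAPair`, `clean_dataset`, `toy_question_scorer`, and `toy_response_scorer`, as well as the 0.5 threshold, are assumptions made for illustration, and the toy scorers are simple heuristics standing in for the BERT-based models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class QAPair:
    """One raw example: a question and a free-form social media response."""
    question: str
    response: str


# Stage 1 (single-task): question -> plausibility score in [0, 1].
QuestionScorer = Callable[[str], float]
# Stage 2 (multi-task): (question, response) -> (plausibility score, extracted answer).
ResponseScorer = Callable[[str, str], Tuple[float, Optional[str]]]


def clean_dataset(
    pairs: List[QAPair],
    score_question: QuestionScorer,
    score_response: ResponseScorer,
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[Tuple[str, str]]:
    """Keep only (question, answer) pairs that pass both plausibility checks."""
    cleaned = []
    for pair in pairs:
        # Stage 1: drop implausible questions before looking at the response.
        if score_question(pair.question) < threshold:
            continue
        # Stage 2: judge the response and extract the answer in one call.
        response_score, answer = score_response(pair.question, pair.response)
        if response_score < threshold or not answer:
            continue
        cleaned.append((pair.question, answer))
    return cleaned


# Hypothetical stand-ins for the trained models, for demonstration only.
def toy_question_scorer(question: str) -> float:
    return 1.0 if question.strip().endswith("?") else 0.0


def toy_response_scorer(question: str, response: str) -> Tuple[float, Optional[str]]:
    text = response.strip()
    if not text or text.lower() in {"lol", "idk"}:
        return 0.0, None
    return 1.0, text
```

In practice the two scorer arguments would wrap trained classifiers; keeping them as plain callables lets the filtering logic be checked independently of any particular model.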

