MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization

04/26/2020
by Canwen Xu, et al.

Large-scale datasets have recently accelerated progress in nearly all areas of Natural Language Processing. However, there is currently no cross-task dataset in NLP, which hinders the development of multi-task learning. We propose MATINF, the first jointly labeled large-scale dataset for classification, question answering and summarization. MATINF contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions. With this rich information, MATINF is applicable to three major NLP tasks: classification, question answering, and summarization. We benchmark existing methods and a novel multi-task baseline on MATINF to inspire further research. Our comprehensive comparison and experiments on MATINF and other datasets demonstrate the merits of MATINF.
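To make "jointly labeled" concrete, the following is a minimal sketch of how a single record with the four fields named in the abstract (question, description, answer, category) could yield one training example per task. The class name, field names, function name, and the specific input/target pairing chosen for each task are assumptions made for illustration; the abstract does not specify them, and the sample values are placeholders rather than real MATINF data.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class JointRecord:
    """One jointly labeled entry. Field names are hypothetical."""
    question: str     # the user's question
    description: str  # user-generated question description
    answer: str       # the answer paired with the question
    category: str     # human-labeled category


def to_task_examples(record: JointRecord) -> Dict[str, Tuple[str, str]]:
    """Derive one (input, target) pair per task from a single record.

    The pairing below is an assumed reading of the abstract,
    not a confirmed description of the authors' setup.
    """
    return {
        # Classification: predict the category from the question text.
        "classification": (
            f"{record.question} {record.description}",
            record.category,
        ),
        # Question answering: produce the answer for the question.
        "question_answering": (record.question, record.answer),
        # Summarization: treat the question as a summary of its description.
        "summarization": (record.description, record.question),
    }


# Placeholder values only; not taken from the dataset.
sample = JointRecord(
    question="<question text>",
    description="<longer description of the question>",
    answer="<answer text>",
    category="<category label>",
)
examples = to_task_examples(sample)
for task, (model_input, target) in examples.items():
    print(f"{task}: {model_input!r} -> {target!r}")
```

Because all three examples come from the same record, a multi-task model could in principle share one encoder across the tasks while reading identical underlying text, which is the kind of setup a cross-task dataset is meant to enable.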

09/03/2018

emrQA: A Large Corpus for Question Answering on Electronic Medical Records

We propose a novel methodology to generate domain-specific large-scale q...
12/06/2014

Practice in Synonym Extraction at Large Scale

Synonym extraction is an important task in natural language processing a...
09/23/2021

ParaShoot: A Hebrew Question Answering Dataset

NLP research in Hebrew has largely focused on morphology and syntax, whe...
04/26/2022

Science Checker: Extractive-Boolean Question Answering For Scientific Fact Checking

With the explosive growth of scientific publications, making the synthes...
05/23/2022

On Measuring Social Biases in Prompt-Based Multi-Task Learning

Large language models trained on a mixture of NLP tasks that are convert...
04/13/2021

What's in your Head? Emergent Behaviour in Multi-Task Transformer Models

The primary paradigm for multi-task training in natural language process...
07/08/2022

The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications

Innovation is a major driver of economic and social development, and inf...