GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

04/20/2018
by Alex Wang, et al.

For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.
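The abstract compares models by their "aggregate performance" without spelling out how the per-task numbers are combined. The sketch below is a minimal illustration that assumes the aggregate is an unweighted macro-average: when a task reports two metrics (for example F1 and accuracy), those are averaged into one task score first, and then all task scores are averaged. The task subset, the `task_scores` values and the `aggregate_score` helper are hypothetical placeholders for illustration only. They are not results or code from the paper.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical scores for a subset of tasks (placeholders, not reported results).
# Tasks evaluated with two metrics list both values.
task_scores: Dict[str, List[float]] = {
    "CoLA": [50.0],          # single metric
    "SST-2": [90.0],         # single metric
    "MRPC": [80.0, 70.0],    # two metrics, averaged into one task score
    "STS-B": [80.0, 80.0],   # two metrics, averaged into one task score
}


def aggregate_score(scores: Dict[str, List[float]]) -> float:
    """Unweighted macro-average: average metrics within each task, then across tasks."""
    per_task = [mean(metrics) for metrics in scores.values()]
    return mean(per_task)


if __name__ == "__main__":
    print(f"{aggregate_score(task_scores):.2f}")
```

Averaging within a task before averaging across tasks keeps a task that happens to report two metrics from counting twice. Because every task gets equal weight, a low-resource task moves the aggregate as much as a large one. This is consistent with the abstract's point that limited training data on some tasks rewards sharing knowledge across tasks.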


