GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Alex Wang, et al.
New York University
University of Washington

For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.
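As a rough illustration of the "aggregate performance" the abstract refers to, a GLUE-style leaderboard score can be sketched as a macro-average over tasks, where tasks reported with two metrics (e.g. accuracy and F1) are averaged internally first. The task names below are real GLUE tasks, but the scores and the aggregation helper are hypothetical, not the benchmark's official scoring code.

```python
# Hypothetical sketch of GLUE-style score aggregation: average each
# task's metrics, then macro-average across tasks. Scores are made up.
from statistics import mean

def glue_macro_average(task_scores):
    """Average per-task scores; tasks with multiple metrics
    (e.g. accuracy and F1) are averaged internally first."""
    per_task = [mean(metrics) for metrics in task_scores.values()]
    return mean(per_task)

scores = {
    "CoLA":  [0.30],        # Matthews correlation
    "SST-2": [0.90],        # accuracy
    "MRPC":  [0.85, 0.80],  # accuracy, F1
    "STS-B": [0.82, 0.81],  # Pearson, Spearman correlation
    "QQP":   [0.88, 0.84],  # accuracy, F1
    "MNLI":  [0.79],        # accuracy
    "QNLI":  [0.81],        # accuracy
    "RTE":   [0.62],        # accuracy
    "WNLI":  [0.65],        # accuracy
}

print(round(glue_macro_average(scores), 3))
```

Macro-averaging gives every task equal weight regardless of dataset size, which is what lets the low-resource tasks in the benchmark (e.g. RTE, WNLI) meaningfully incentivize knowledge sharing across tasks.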



Learning and Evaluating General Linguistic Intelligence

We define general linguistic intelligence as the ability to reuse previo...

GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark

With a fast developing pace of geographic applications, automatable and ...

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

In the last year, new models and methods for pretraining and transfer le...

NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks

Given the ubiquitous nature of numbers in text, reasoning with numbers t...

PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English

Privacy policies provide individuals with information about their rights...

Domain Adaptation of Recurrent Neural Networks for Natural Language Understanding

The goal of this paper is to use multi-task learning to efficiently scal...
