
Robustly Optimized and Distilled Training for Natural Language Understanding

03/16/2021
by Haytham ElFadeel, et al.

In this paper, we explore multi-task learning (MTL) as a second pretraining step to learn an enhanced universal language representation for transformer language models. We use the MTL-enhanced representation across several natural language understanding tasks to improve performance and generalization. Moreover, we incorporate knowledge distillation (KD) into MTL to further boost performance and devise a KD variant that learns effectively from multiple teachers. By combining MTL and KD, we propose the Robustly Optimized and Distilled (ROaD) modeling framework. We use ROaD together with the ELECTRA model to obtain state-of-the-art results for machine reading comprehension and natural language inference.
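The abstract does not spell out the multi-teacher distillation objective. As an illustration only, the sketch below shows one common way to distill from several teachers: average their softened output distributions and mix the resulting soft-label term with the standard cross-entropy loss. The function name, temperature, and mixing weight `alpha` are hypothetical choices for this sketch, not the ROaD recipe.

```python
# Minimal sketch of a multi-teacher knowledge distillation loss (PyTorch).
# Averaging teacher distributions, the temperature, and `alpha` are
# illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          temperature=2.0, alpha=0.5):
    """Combine hard-label cross-entropy with a soft-label term distilled
    from several teachers by averaging their softened distributions."""
    # Hard-label supervision on the gold labels.
    ce_loss = F.cross_entropy(student_logits, labels)

    # Average the teachers' softened probability distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # KL divergence between the student's softened distribution and the
    # averaged teacher distribution (scaled by T^2, as in standard KD).
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd_loss = F.kl_div(student_log_probs, teacher_probs,
                       reduction="batchmean") * temperature ** 2

    return alpha * ce_loss + (1.0 - alpha) * kd_loss
```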

