
Robustly Optimized and Distilled Training for Natural Language Understanding

by Haytham ElFadeel, et al.

In this paper, we explore multi-task learning (MTL) as a second pretraining step to learn an enhanced universal language representation for transformer language models. We use the MTL-enhanced representation across several natural language understanding tasks to improve performance and generalization. Moreover, we incorporate knowledge distillation (KD) into MTL to further boost performance, and we devise a KD variant that learns effectively from multiple teachers. By combining MTL and KD, we propose the Robustly Optimized and Distilled (ROaD) modeling framework. We use ROaD together with the ELECTRA model to obtain state-of-the-art results for machine reading comprehension and natural language inference.
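As an illustration of distilling from multiple teachers, the sketch below shows one common multi-teacher KD objective: the student is trained toward the average of the teachers' temperature-softened output distributions, with the usual T² scaling of the loss. This is a minimal, framework-free sketch of a standard variant; the abstract does not specify the exact form of the paper's KD variant, so the averaging scheme here is an assumption.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of raw logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def multi_teacher_kd_loss(student_logits, teacher_logits_list, temperature=2.0):
    """KL divergence from the averaged teacher distribution to the student.

    NOTE: averaging the teachers' softened probabilities is one common
    multi-teacher scheme, used here for illustration only; it is not
    necessarily the variant proposed in the paper.
    """
    student_p = softmax(student_logits, temperature)
    teacher_ps = [softmax(t, temperature) for t in teacher_logits_list]
    n = len(teacher_ps)
    avg_teacher = [sum(p[i] for p in teacher_ps) / n
                   for i in range(len(student_p))]
    # KL(avg_teacher || student), scaled by T^2 as in standard KD.
    return temperature ** 2 * sum(
        q * (math.log(q) - math.log(p))
        for q, p in zip(avg_teacher, student_p) if q > 0
    )
```

When the student's logits coincide with a single teacher's, the softened distributions match and the loss is zero; disagreement among teachers or with the student yields a positive penalty that the student minimizes during training.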

