Distilling Task-Specific Knowledge from BERT into Simple Neural Networks

03/28/2019
by Raphael Tang, et al.

In the natural language processing literature, neural networks are becoming increasingly deep and complex. Recent poster children of this trend are deep language representation models such as BERT, ELMo, and GPT. These developments have led to the conviction that previous-generation, shallower neural networks for language understanding are obsolete. In this paper, however, we demonstrate that rudimentary, lightweight neural networks can still be made competitive without architecture changes, external training data, or additional input features. We propose to distill knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM, as well as its Siamese counterpart for sentence-pair tasks. Across multiple datasets in paraphrasing, natural language inference, and sentiment classification, we achieve results comparable to ELMo, while using roughly 100 times fewer parameters and 15 times less inference time.
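To make the setup concrete, here is a minimal PyTorch sketch of the approach the abstract describes: a single-layer BiLSTM student trained against a fine-tuned BERT teacher's output logits (the paper adopts a mean-squared-error term on logits as the distillation objective). The class names, dimensions, and loss weighting below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (assumed PyTorch) of distilling BERT into a 1-layer BiLSTM.
# Names and hyperparameters are illustrative, not the paper's exact setup.
import torch
import torch.nn as nn

class BiLSTMStudent(nn.Module):
    """The 'rudimentary' student: embeddings -> single-layer BiLSTM -> MLP."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=300, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=1,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)          # (batch, seq, embed_dim)
        _, (h_n, _) = self.bilstm(embedded)           # h_n: (2, batch, hidden_dim)
        # Concatenate the final forward and backward hidden states.
        sentence_repr = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.classifier(sentence_repr)         # logits: (batch, num_classes)

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Blend hard-label cross-entropy with an MSE term pulling the student's
    logits toward the BERT teacher's logits; alpha is an assumed weighting."""
    ce = nn.functional.cross_entropy(student_logits, labels)
    mse = nn.functional.mse_loss(student_logits, teacher_logits)
    return alpha * ce + (1 - alpha) * mse
```

In training, the teacher's logits are precomputed for each example (including any augmented, unlabeled ones) and the student is optimized on this combined loss; at inference time only the small BiLSTM runs, which is where the parameter and latency savings come from.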


