Multi-Level Knowledge Distillation for Out-of-Distribution Detection in Text

11/21/2022
by Qianhui Wu et al.

Self-supervised representation learning has proven to be a valuable component for out-of-distribution (OoD) detection when only the texts of in-distribution (ID) examples are available. These approaches either train a language model from scratch or fine-tune a pre-trained language model using ID examples, and then use the perplexity output by the language model as the OoD score. In this paper, we analyze the complementary characteristics of both OoD detection methods and propose a multi-level knowledge distillation approach that integrates their strengths while mitigating their limitations. Specifically, we use a fine-tuned model as the teacher to teach a randomly initialized student model on the ID examples. Beyond prediction-layer distillation, we present a similarity-based intermediate-layer distillation method to make the student aware of the information flow inside the teacher's layers. In this way, the derived student model gains the teacher's rich knowledge about the ID data manifold acquired during pre-training, while benefiting from seeing only ID examples during parameter learning, which promotes more distinguishable features for OoD detection. We conduct extensive experiments over multiple benchmark datasets, namely CLINC150, SST, 20 NewsGroups, and AG News, showing that the proposed method yields new state-of-the-art performance.
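To make the two components of the abstract concrete, here is a minimal PyTorch sketch of (a) a prediction-layer distillation term, (b) one plausible reading of the similarity-based intermediate-layer distillation (matching layer-to-layer representation similarities between teacher and student), and (c) per-example perplexity used as the OoD score at test time. The function names, pooling choice, and exact similarity formulation are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: assumes a causal LM teacher and student whose
# per-layer hidden states and vocabulary logits are accessible.
import torch
import torch.nn.functional as F

def prediction_layer_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between the teacher's and student's token distributions."""
    t = temperature
    s_logp = F.log_softmax(student_logits / t, dim=-1)
    t_prob = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(s_logp, t_prob, reduction="batchmean") * (t * t)

def layer_similarity_loss(student_hiddens, teacher_hiddens):
    """One plausible similarity-based intermediate-layer term: match per-example
    layer-to-layer cosine-similarity matrices so the student mimics how
    information flows across the teacher's layers (assumed formulation)."""
    def sim_matrix(hiddens):
        # hiddens: list of [batch, seq, dim] tensors -> stack to [layers, batch, dim]
        pooled = torch.stack([h.mean(dim=1) for h in hiddens])  # mean-pool over tokens
        pooled = F.normalize(pooled, dim=-1)
        # per-example [layers x layers] cosine-similarity matrix
        return torch.einsum("lbd,mbd->blm", pooled, pooled)
    return F.mse_loss(sim_matrix(student_hiddens), sim_matrix(teacher_hiddens))

def perplexity_ood_score(logits, labels):
    """Per-example perplexity of the (student) language model, used as OoD score."""
    nll = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predict token t+1 from t
        labels[:, 1:].reshape(-1),
        reduction="none",
    ).view(labels.size(0), -1).mean(dim=1)
    return torch.exp(nll)  # higher perplexity suggests OoD
```

In training, the student would minimize a weighted sum of the two distillation terms on ID text; at inference, only the student's perplexity score is needed to flag OoD inputs.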

