An Empirical Study of Uniform-Architecture Knowledge Distillation in Document Ranking

02/08/2023 · by Xubo Qin, et al.

Although BERT-based ranking models are widely used in commercial search engines, they are usually too time-consuming for online ranking. Knowledge distillation, which learns a smaller model with performance comparable to a larger one, is a common strategy for reducing online inference latency. In this paper, we investigate the effect of different loss functions for uniform-architecture distillation of BERT-based ranking models. Here, "uniform-architecture" means that both the teacher and student models use the cross-encoder architecture, with the student models built on small-scale pre-trained language models. Our experimental results reveal that the optimal distillation configuration for ranking tasks differs substantially from that of general natural language processing tasks. Specifically, when the student models are cross-encoders, a pairwise loss on hard labels is critical for training them, whereas distillation objectives on intermediate Transformer layers may hurt performance. These findings highlight the need to carefully design a distillation strategy for cross-encoder student models tailored to document ranking with pairwise training samples.
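For concreteness, the sketch below shows one way a pairwise loss on hard labels can be combined with a soft-label distillation term for a cross-encoder student (here a MarginMSE-style objective on the teacher's score differences). The function names, margin, and weighting factor `alpha` are illustrative assumptions; this is not the paper's exact formulation.

```python
# Minimal sketch (not the paper's exact method) of a ranking distillation objective
# that combines a pairwise hard-label loss with a soft-label term from a
# cross-encoder teacher. All names and hyperparameters here are illustrative.
import torch
import torch.nn.functional as F


def pairwise_hard_label_loss(student_pos: torch.Tensor,
                             student_neg: torch.Tensor,
                             margin: float = 1.0) -> torch.Tensor:
    """Pairwise hinge loss on hard labels: the relevant (positive) document
    should outscore the non-relevant (negative) one by at least `margin`."""
    return F.relu(margin - (student_pos - student_neg)).mean()


def soft_label_loss(student_pos: torch.Tensor,
                    student_neg: torch.Tensor,
                    teacher_pos: torch.Tensor,
                    teacher_neg: torch.Tensor) -> torch.Tensor:
    """MarginMSE-style soft-label term: match the teacher's pairwise score
    difference, one common choice for ranking distillation."""
    return F.mse_loss(student_pos - student_neg, teacher_pos - teacher_neg)


def distillation_loss(student_pos, student_neg, teacher_pos, teacher_neg,
                      alpha: float = 0.5) -> torch.Tensor:
    """Weighted combination of the two terms; `alpha` is an assumed weight."""
    return (alpha * pairwise_hard_label_loss(student_pos, student_neg)
            + (1.0 - alpha) * soft_label_loss(student_pos, student_neg,
                                              teacher_pos, teacher_neg))


if __name__ == "__main__":
    # Toy relevance scores for a batch of (query, positive doc, negative doc) triples.
    s_pos, s_neg = torch.tensor([2.1, 0.3]), torch.tensor([1.0, 0.8])
    t_pos, t_neg = torch.tensor([3.0, 1.2]), torch.tensor([0.5, 0.1])
    print(distillation_loss(s_pos, s_neg, t_pos, t_neg))
```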


