Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation

10/06/2020
by Sebastian Hofstätter, et al.

The query-time latency of neural ranking models depends largely on their architecture and on deliberate design choices that trade off effectiveness for higher efficiency. The focus on low query latency in a growing number of efficient ranking architectures makes them feasible for production deployment. In machine learning, an increasingly common approach to closing the effectiveness gap of more efficient models is knowledge distillation from a large teacher model to a smaller student model. We find that different ranking architectures tend to produce output scores of different magnitudes. Based on this finding, we propose a cross-architecture training procedure with a margin-focused loss (Margin-MSE) that adapts knowledge distillation to the varying score output distributions of different BERT and non-BERT ranking architectures. We apply the teacher's information as additional fine-grained labels on existing training triples of the MSMARCO-Passage collection. We evaluate our procedure by distilling knowledge from state-of-the-art concatenated BERT models into four different efficient architectures (TK, ColBERT, PreTT, and a BERT CLS dot product model). We show that, across the evaluated architectures, our Margin-MSE knowledge distillation significantly improves effectiveness without compromising efficiency. To benefit the community, we publish the costly teacher-score training files in a ready-to-use package.
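For illustration, below is a minimal PyTorch sketch of the margin-focused (Margin-MSE) objective described in the abstract: the student is supervised on the margin between its scores for a relevant and a non-relevant passage, matched against the teacher's corresponding margin, so that teacher and student can produce scores on different scales. The function and variable names, batch size, and toy score tensors are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()

def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Margin-MSE: match the student's score margin to the teacher's.

    All arguments are 1-D tensors of per-query relevance scores, shape (batch,).
    Only the *difference* between positive and negative scores is compared,
    so absolute score magnitudes of teacher and student may differ.
    """
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return mse(student_margin, teacher_margin)

# Toy usage with random scores standing in for model outputs (hypothetical values):
batch = 8
s_pos, s_neg = torch.randn(batch), torch.randn(batch)
t_pos = torch.randn(batch) * 3 + 5   # teacher scores may live on a different scale
t_neg = torch.randn(batch) * 3
loss = margin_mse_loss(s_pos, s_neg, t_pos, t_neg)
print(loss.item())
```

In a training loop, the positive and negative scores would come from the student model on MSMARCO-style triples (query, relevant passage, non-relevant passage), while the teacher margins can be precomputed once and stored alongside the triples.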


Related research

02/08/2023 · An Empirical Study of Uniform-Architecture Knowledge Distillation in Document Ranking
Although BERT-based ranking models have been commonly used in commercial...

09/30/2021 · Born Again Neural Rankers
We introduce Born Again neural Rankers (BAR) in the Learning to Rank (LT...

10/22/2020 · Distilling Dense Representations for Ranking using Tightly-Coupled Teachers
We present an approach to ranking with dense representations that applie...

05/20/2021 · Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking
An emerging recipe for achieving state-of-the-art effectiveness in neura...

07/13/2020 · Towards practical lipreading with distilled and efficient models
Lipreading has witnessed a lot of progress due to the resurgence of neur...

08/29/2023 · SpikeBERT: A Language Spikformer Trained with Two-Stage Knowledge Distillation from BERT
Spiking neural networks (SNNs) offer a promising avenue to implement dee...

04/08/2020 · LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
BERT is a cutting-edge language representation model pre-trained by a la...
