Improving Bi-encoder Document Ranking Models with Two Rankers and Multi-teacher Distillation

03/11/2021
by Jaekeol Choi, et al.

BERT-based Neural Ranking Models (NRMs) can be classified according to how the query and document are encoded through BERT's self-attention layers: bi-encoder versus cross-encoder. Bi-encoder models are highly efficient because all the documents can be pre-processed before query time, but their performance is inferior to that of cross-encoder models. Both models utilize a ranker that receives BERT representations as the input and generates a relevance score as the output. In this work, we propose a method where multi-teacher distillation is applied to a cross-encoder NRM and a bi-encoder NRM to produce a bi-encoder NRM with two rankers. The resulting student bi-encoder achieves improved performance by simultaneously learning from a cross-encoder teacher and a bi-encoder teacher, and also by combining the relevance scores from the two rankers. We call this method TRMD (Two Rankers and Multi-teacher Distillation). In the experiments, TwinBERT and ColBERT are considered as baseline bi-encoders. When monoBERT is used as the cross-encoder teacher, together with either TwinBERT or ColBERT as the bi-encoder teacher, TRMD produces a student bi-encoder that performs better than the corresponding baseline bi-encoder. For P@20, the maximum improvement was 11.4% and the average improvement was 6.8%. As an additional experiment, we tried producing cross-encoder students with TRMD, and found that it could also improve the cross-encoders.
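The abstract describes a student bi-encoder that keeps two rankers, one distilled from a bi-encoder teacher and one from a cross-encoder teacher, with the final relevance score combining the two. Below is a minimal PyTorch sketch of that wiring; the class and function names, the [CLS] pooling, the linear rankers over concatenated representations, the score-level MSE distillation targets, and the equal loss weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of the TRMD idea (assumptions flagged above), using PyTorch
# and Hugging Face Transformers.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel

class TwoRankerBiEncoder(nn.Module):
    """Bi-encoder student with two rankers: one imitates the bi-encoder
    teacher's ranker, the other the cross-encoder teacher's ranker."""
    def __init__(self, name="bert-base-uncased", hidden=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)  # queries and documents encoded separately
        self.bi_ranker = nn.Linear(2 * hidden, 1)       # distilled from the bi-encoder teacher
        self.cross_ranker = nn.Linear(2 * hidden, 1)    # distilled from the cross-encoder teacher

    def forward(self, q_inputs, d_inputs):
        q = self.encoder(**q_inputs).last_hidden_state[:, 0]  # query [CLS] vector
        d = self.encoder(**d_inputs).last_hidden_state[:, 0]  # document [CLS]; pre-computable offline
        pair = torch.cat([q, d], dim=-1)
        s_bi = self.bi_ranker(pair).squeeze(-1)
        s_cross = self.cross_ranker(pair).squeeze(-1)
        return s_bi, s_cross, s_bi + s_cross               # combined relevance score

def trmd_loss(s_bi, s_cross, t_bi, t_cross, labels, alpha=0.5):
    """Multi-teacher distillation: each student ranker regresses toward its
    own teacher's relevance score, while the combined score fits the
    (float-valued) relevance labels. The MSE targets and 0.5 weighting are
    assumptions for illustration."""
    distill = F.mse_loss(s_bi, t_bi) + F.mse_loss(s_cross, t_cross)
    rank = F.binary_cross_entropy_with_logits(s_bi + s_cross, labels)
    return alpha * distill + (1.0 - alpha) * rank
```

Because the two inputs are encoded independently, document representations can still be pre-computed offline and only the query is encoded at query time, which is what preserves the bi-encoder's efficiency advantage.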


research · 10/28/2021
Semi-Siamese Bi-encoder Neural Ranking Model Using Lightweight Fine-Tuning
A BERT-based Neural Ranking Model (NRM) can be either a cross-encoder or...

research · 09/27/2021
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations
In NLP, a large volume of tasks involve pairwise comparison between two ...

research · 09/04/2023
Geo-Encoder: A Chunk-Argument Bi-Encoder Framework for Chinese Geographic Re-Ranking
Chinese geographic re-ranking task aims to find the most relevant addres...

research · 10/22/2020
Distilling Dense Representations for Ranking using Tightly-Coupled Teachers
We present an approach to ranking with dense representations that applie...

research · 01/23/2023
Injecting the BM25 Score as Text Improves BERT-Based Re-rankers
In this paper we propose a novel approach for combining first-stage lexi...

research · 06/05/2023
Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency
The information retrieval community has made significant progress in imp...

research · 05/21/2023
Bi-ViT: Pushing the Limit of Vision Transformer Quantization
Vision transformers (ViTs) quantization offers a promising prospect to f...
