Fine-Grained Distillation for Long Document Retrieval

12/20/2022
by Yucheng Zhou, et al.

Long-document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become the de facto approach to improving a retriever by having it mimic a heterogeneous yet powerful cross-encoder. However, in contrast to passages or sentences, retrieval over long documents suffers from the scope hypothesis: a long document may cover multiple topics. This amplifies structural heterogeneity across documents and poses a granularity-mismatch issue, leading to inferior distillation efficacy. In this work, we propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers. While preserving the conventional dense-retrieval paradigm, it first produces globally consistent representations across different levels of granularity, and then applies multi-granular aligned distillation, used only during training. In experiments, we evaluate our framework on two long-document retrieval benchmarks, on which it achieves state-of-the-art performance.
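To make the distillation setup concrete, the following is a minimal sketch (not the paper's actual implementation) of multi-granular aligned distillation: a teacher cross-encoder produces relevance scores at both the document and passage level, and the student dense retriever is trained to match the teacher's score distribution at each granularity via a KL-divergence loss. All scores below are hypothetical placeholders standing in for model outputs.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def kl_div(p, q):
    """KL(p || q) between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical relevance scores for one query.
# Teacher = cross-encoder scores; student = dense-retriever dot products.
teacher_doc = np.array([3.0, 1.0, 0.5])   # document-level, 3 candidate docs
student_doc = np.array([2.5, 1.2, 0.8])
teacher_psg = np.array([2.0, 0.4])        # passage-level, within one doc
student_psg = np.array([1.8, 0.6])

# Multi-granular aligned distillation: sum the per-granularity KL losses
# between the teacher and student score distributions.
loss = kl_div(softmax(teacher_doc), softmax(student_doc)) \
     + kl_div(softmax(teacher_psg), softmax(student_psg))
```

In practice each granularity's loss would be backpropagated into the shared student encoder, so that document-level and passage-level representations stay consistent with the same teacher; the equal weighting of the two terms here is an illustrative choice.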

