Query-driven Segment Selection for Ranking Long Documents

09/10/2021
by Youngwoo Kim et al.

Transformer-based rankers have shown state-of-the-art performance. However, the cost of their self-attention operation makes them largely unable to process long sequences. A common approach to training these rankers is to heuristically select some segments of each document, such as the first segment, as training data. However, these segments may not contain the query-related parts of the document. To address this problem, we propose query-driven segment selection from long documents to build training data. The segment selector provides relevant samples with more accurate labels and non-relevant samples that are harder to predict. The experimental results show that a basic BERT-based ranker trained with the proposed segment selector significantly outperforms the same ranker trained on heuristically selected segments, and performs on par with a state-of-the-art model with localized self-attention that can process longer input sequences. Our findings open up a new direction for designing efficient transformer-based rankers.
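The core idea can be illustrated with a minimal sketch: split a long document into overlapping fixed-size segments and pick the one that best matches the query, rather than always taking the first segment. The windowing parameters and the simple lexical-overlap scorer below are illustrative assumptions, not the paper's exact method (which uses a learned selector).

```python
import math
from collections import Counter


def split_into_segments(tokens, seg_len=128, stride=64):
    """Slide a fixed-size window over the document tokens."""
    segments = []
    for start in range(0, max(len(tokens) - seg_len, 0) + 1, stride):
        segments.append(tokens[start:start + seg_len])
        if start + seg_len >= len(tokens):
            break
    return segments or [tokens]


def segment_score(query_tokens, segment_tokens):
    """Toy lexical relevance score: log term-frequency overlap
    (a stand-in for the paper's learned segment selector)."""
    seg_counts = Counter(segment_tokens)
    return sum(math.log(1 + seg_counts[t]) for t in set(query_tokens))


def select_segment(query, document, seg_len=128, stride=64):
    """Return the document segment most relevant to the query."""
    q = query.lower().split()
    d = document.lower().split()
    segments = split_into_segments(d, seg_len, stride)
    best = max(segments, key=lambda s: segment_score(q, s))
    return " ".join(best)
```

With this kind of selector, the segment fed to the ranker during training actually contains the query-related part of the document, whereas the first-segment heuristic would often return an irrelevant introduction.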


Related research:

- Long Document Ranking with Query-Directed Sparse Transformer (10/23/2020): The computing cost of transformer self-attention often necessitates brea...
- Long-Range Transformer Architectures for Document Understanding (09/11/2023): Since their release, Transformers have revolutionized many fields from N...
- An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification (10/11/2022): Non-hierarchical sparse attention Transformer-based models, such as Long...
- Preformer: Predictive Transformer with Multi-Scale Segment-wise Correlations for Long-Term Time Series Forecasting (02/23/2022): Transformer-based methods have shown great potential in long-term time s...
- Query-Based Keyphrase Extraction from Long Documents (05/11/2022): Transformer-based architectures in natural language processing force inp...
- Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension (05/16/2020): In this paper, we study machine reading comprehension (MRC) on long text...
- BISON: BM25-weighted Self-Attention Framework for Multi-Fields Document Search (07/10/2020): Recent breakthrough in natural language processing has advanced the info...
