Local Self-Attention over Long Text for Efficient Document Retrieval

05/11/2020
by Sebastian Hofstätter, et al.

Neural networks, particularly Transformer-based architectures, have achieved significant performance improvements on several retrieval benchmarks. When the items being retrieved are documents, the time and memory cost of employing Transformers over a full sequence of document terms can be prohibitive. A popular strategy involves considering only the first n terms of the document. This can, however, result in a biased system that under-retrieves longer documents. In this work, we propose a local self-attention mechanism that considers a moving window over the document terms, where each term attends only to other terms in the same window. This local attention incurs a fraction of the compute and memory cost of attention over the whole document. The windowed approach also leads to more compact packing of padded documents in minibatches, resulting in additional savings. We also employ a learned saturation function and a two-staged pooling strategy to identify relevant regions of the document. The Transformer-Kernel pooling model with these changes can efficiently elicit relevance information from documents with thousands of tokens. We benchmark our proposed modifications on the document ranking task from the TREC 2019 Deep Learning track and observe significant improvements in retrieval quality as well as increased retrieval of longer documents at a moderate increase in compute and memory costs.
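The windowed local attention described above can be illustrated with a short sketch. The following is a minimal, illustrative PyTorch example, not the authors' implementation: the function name, window radius, and tensor shapes are assumptions, and the attention scores are materialized densely for clarity, whereas the compute and memory savings require a blocked or banded implementation that only evaluates scores inside the window.

```python
# Minimal sketch of local (windowed) self-attention: each term attends
# only to terms within a fixed window around its position. Illustrative
# only; shapes, names, and window size are assumptions.
import torch
import torch.nn.functional as F

def local_self_attention(q, k, v, window: int):
    """q, k, v: [batch, seq_len, dim]; window: one-sided window radius."""
    seq_len, dim = q.size(1), q.size(2)
    scores = torch.matmul(q, k.transpose(-2, -1)) / dim ** 0.5   # [b, n, n]

    # Band mask: position i may only attend to positions j with |i - j| <= window.
    # A banded/blocked kernel would compute only these entries; here the full
    # matrix is built and masked purely for readability.
    idx = torch.arange(seq_len)
    band = (idx[None, :] - idx[:, None]).abs() <= window          # [n, n] bool
    scores = scores.masked_fill(~band, float("-inf"))

    return torch.matmul(F.softmax(scores, dim=-1), v)

# Example: 2 documents of 2000 tokens, 64-dim term vectors, +/- 50 token window.
x = torch.randn(2, 2000, 64)
out = local_self_attention(x, x, x, window=50)
print(out.shape)  # torch.Size([2, 2000, 64])
```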

Related research

04/19/2021  Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence
The Transformer-Kernel (TK) model has demonstrated strong reranking perf...

09/11/2023  Long-Range Transformer Architectures for Document Understanding
Since their release, Transformers have revolutionized many fields from N...

04/14/2022  Revisiting Transformer-based Models for Long Document Classification
The recent literature in text classification is biased towards short tex...

10/23/2020  Long Document Ranking with Query-Directed Sparse Transformer
The computing cost of transformer self-attention often necessitates brea...

07/20/2020  Conformer-Kernel with Query Term Independence for Document Retrieval
The Transformer-Kernel (TK) model has demonstrated strong reranking perf...

05/29/2023  Adapting Learned Sparse Retrieval for Long Documents
Learned sparse retrieval (LSR) is a family of neural retrieval methods t...

08/08/2021  PoolRank: Max/Min Pooling-based Ranking Loss for Listwise Learning Ranking Balance
Numerous neural retrieval models have been proposed in recent years. The...
