Pretrained Transformers for Text Ranking: BERT and Beyond

10/13/2020
by Jimmy Lin, et al.

The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that attempt to perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond the typical sentence-by-sentence processing approaches used in NLP, and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
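The two high-level categories mentioned in the abstract can be made concrete with a short sketch. The snippet below, assuming the Hugging Face transformers library and PyTorch, contrasts a cross-encoder reranker (a BERT classifier that scores each query-text pair jointly, the pattern used for reranking in multi-stage architectures) with a bi-encoder that produces dense representations scored by a dot product (ranking directly). The model name, the two-class head, and the mean-pooling choice are illustrative assumptions rather than the survey's prescribed configuration, and both models would need fine-tuning on relevance data before their scores are meaningful.

```python
# Illustrative sketch only: contrasts the two families of approaches described above.
# Assumes the Hugging Face `transformers` library and PyTorch; model names are placeholders,
# and both encoders would need fine-tuning on relevance judgments to produce useful scores.
import torch
from transformers import AutoModel, AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# (1) Cross-encoder reranking: the query and candidate text are encoded jointly, and a
#     classification head over the [CLS] representation yields a relevance score. This is
#     the pattern used in multi-stage architectures to rerank a list of candidates.
reranker = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def rerank(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    scores = []
    for text in candidates:
        inputs = tokenizer(query, text, truncation=True, max_length=512, return_tensors="pt")
        with torch.no_grad():
            logits = reranker(**inputs).logits
        scores.append(logits.softmax(dim=-1)[0, 1].item())  # probability of "relevant"
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)

# (2) Learned dense representations: queries and texts are encoded independently into
#     vectors, and ranking reduces to a similarity (here, a dot product) that can be
#     served from a nearest-neighbor index instead of a per-pair reranking pass.
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1)       # mean-pool over real tokens only
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

def dense_rank(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    q = embed(query)
    scores = [torch.matmul(q, embed(text).T).item() for text in candidates]
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
```

In practice a first-stage retriever (for example BM25, or the dense index itself) supplies the candidate list, and the more expensive cross-encoder is applied only to the top few hundred results, which is one face of the effectiveness/efficiency tradeoff the abstract highlights.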

Related research

09/14/2020 · Efficient Transformers: A Survey
Transformer model architectures have garnered immense interest lately du...

02/28/2023 · A Survey on Long Text Modeling with Transformers
Modeling long texts has been an essential technique in the field of natu...

05/10/2020 · Transformer-Based Language Models for Similar Text Retrieval and Ranking
Most approaches for similar text retrieval and ranking with long natural...

12/14/2022 · Explainability of Text Processing and Retrieval Methods: A Critical Survey
Deep Learning and Machine Learning based models have become extremely po...

08/10/2020 · GANBERT: Generative Adversarial Networks with Bidirectional Encoder Representations from Transformers for MRI to PET synthesis
Synthesizing medical images, such as PET, is a challenging task due to t...

10/04/2021 · A Proposed Conceptual Framework for a Representational Approach to Information Retrieval
This paper outlines a conceptual framework for understanding recent deve...

01/25/2023 · An Experimental Study on Pretraining Transformers from Scratch for IR
Finetuning Pretrained Language Models (PLM) for IR has been de facto the...
