Quality and Cost Trade-offs in Passage Re-ranking Task

11/18/2021
by Pavel Podberezko, et al.

Deep learning models known as transformers have achieved state-of-the-art results in the vast majority of NLP tasks, at the cost of increased computational complexity and high memory consumption. Using a transformer model for real-time inference becomes a major challenge in production, because it requires expensive computational resources: the more transformer executions are needed, the lower the overall throughput, while switching to smaller encoders reduces accuracy. Our paper addresses the problem of choosing the right architecture for the ranking step of an information retrieval pipeline, so that the number of required transformer encoder calls is minimal while the ranking quality stays as high as possible. We investigated several late-interaction models, such as the ColBERT and Poly-encoder architectures, along with their modifications. We also took care of the memory footprint of the search index and tried to apply a learning-to-hash method to binarize the output vectors of the transformer encoders. Evaluation results are provided on the TREC 2019-2021 and MS MARCO dev datasets.
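To make the trade-off concrete, below is a minimal sketch of late-interaction scoring and index binarization. It is illustrative only: the MaxSim scoring follows the general ColBERT idea of summing, over query tokens, the maximum similarity to any document token, and the sign-based binarize function is a simple stand-in for the learned hashing studied in the paper. All function names, dimensions, and the NumPy implementation are our own assumptions, not the authors' code.

import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late-interaction score: for each query token embedding,
    take its maximum cosine similarity over all document token embeddings,
    then sum these maxima."""
    # Normalize rows so the dot product equals cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T                      # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

def binarize(vecs: np.ndarray) -> np.ndarray:
    """Sign-based binarization of encoder outputs: one bit per dimension.
    A placeholder for a learned hashing layer that shrinks the search index."""
    return (vecs > 0).astype(np.uint8)

def hamming_maxsim_score(query_bits: np.ndarray, doc_bits: np.ndarray) -> float:
    """Approximate MaxSim on binarized vectors: token similarity is the
    fraction of matching bits (1 minus the normalized Hamming distance)."""
    dim = query_bits.shape[1]
    matches = dim - (query_bits[:, None, :] ^ doc_bits[None, :, :]).sum(axis=2)
    return float((matches / dim).max(axis=1).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 128))      # 8 query token embeddings, dim 128
    d = rng.normal(size=(120, 128))    # 120 passage token embeddings
    print("float MaxSim :", maxsim_score(q, d))
    print("binary MaxSim:", hamming_maxsim_score(binarize(q), binarize(d)))

Keeping one bit per dimension instead of a 32-bit float reduces the token-level index size by roughly a factor of 32, which is the memory-footprint lever the abstract refers to; the open question the paper evaluates is how much ranking quality such compression and cheaper scoring give up.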
