Topic-Grained Text Representation-based Model for Document Retrieval

07/11/2022
by Mengxue Du, et al.

Document retrieval enables users to find the documents they need accurately and quickly. To meet retrieval efficiency requirements, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by pre-storing document representations offline. However, this paradigm consumes vast local storage space, especially when each document is stored as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based Model for document retrieval. Following the representation-based matching paradigm, TGTR stores document representations offline to ensure retrieval efficiency, but it significantly reduces storage requirements by using novel topic-grained representations rather than traditional word-grained ones. Experimental results on TREC CAR and MS MARCO demonstrate that TGTR is consistently competitive with word-grained baselines in retrieval accuracy while requiring less than 1/10 of their storage space. Moreover, TGTR substantially outperforms global-grained baselines in retrieval accuracy.
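
The storage trade-off described above can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the corpus size, vector dimension, tokens per document, and topics per document are hypothetical values chosen only for illustration, and `index_bytes` is a helper name introduced here. It assumes each granularity stores fixed-size dense vectors offline: one per token (word-grained), one per topic (topic-grained), or one per document (global-grained).

```python
def index_bytes(num_docs: int, vectors_per_doc: int, dim: int,
                bytes_per_value: int = 2) -> int:
    """Bytes needed to pre-store fixed-size dense vectors for a corpus."""
    return num_docs * vectors_per_doc * dim * bytes_per_value


# Hypothetical corpus statistics (assumptions, not figures from the paper).
NUM_DOCS = 1_000_000
DIM = 128
TOKENS_PER_DOC = 80  # word-grained: one vector per token
TOPICS_PER_DOC = 6   # topic-grained: one vector per topic (assumed count)

word_grained = index_bytes(NUM_DOCS, TOKENS_PER_DOC, DIM)
topic_grained = index_bytes(NUM_DOCS, TOPICS_PER_DOC, DIM)
global_grained = index_bytes(NUM_DOCS, 1, DIM)  # one vector per document

# With equal dimension and precision, the ratio reduces to topics / tokens.
ratio = topic_grained / word_grained

print(f"word-grained:   {word_grained / 1e9:.2f} GB")
print(f"topic-grained:  {topic_grained / 1e9:.2f} GB")
print(f"global-grained: {global_grained / 1e9:.2f} GB")
print(f"topic / word storage ratio: {ratio:.3f}")
```

Under these assumed numbers the storage ratio depends only on how many vectors are kept per document, so a ratio below 1/10 requires fewer than one topic vector for every ten token vectors.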


