BISON:BM25-weighted Self-Attention Framework for Multi-Fields Document Search

by   Xuan Shan, et al.

Recent breakthrough in natural language processing has advanced the information retrieval from keyword match to semantic vector search. To map query and documents into semantic vectors, self-attention models are being widely used. However, typical self-attention models, like Transformer, lack prior knowledge to distinguish the importance of different tokens, which has been proved to play a critical role in information retrieval tasks. In addition to this, when applying WordPiece tokenization, a rare word may be split into several different tokens. How to translate word-level prior knowledge into WordPiece tokens becomes a new challenge for the semantic representation generation. Moreover, web documents usually have multiple fields. Due to the heterogeneity of different fields, simple combination is not a good choice. In this paper, We propose a novel BM25-weighted Self-Attention framework (BISON) for web document search. By leveraging BM25 as prior weights, BISON learns weighted attention scores jointly with query matrix Q and key matrix K. We also present an efficient whole word weight sharing solution to mitigate prior knowledge discrepancy between words and WordPiece tokens. Furthermore, BISON effectively combines multiple fields by placing different fields into different segments. We demonstrate BISON is more efficient to capture the topical and semantic representation both in query and document. Intrinsic evaluation and experiments conducted on public data sets reveal BISON to be a general framework for document ranking task. It outperforms BERT and other modern models while retaining the same model complexity with BERT.


page 1

page 2

page 3

page 4


The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval

On a wide range of natural language processing and information retrieval...

Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Document Matching

Many information retrieval and natural language processing problems can ...

Long Document Ranking with Query-Directed Sparse Transformer

The computing cost of transformer self-attention often necessitates brea...

DA-Transformer: Distance-aware Transformer

Transformer has achieved great success in the NLP field by composing var...

Attention Module is Not Only a Weight: Analyzing Transformers with Vector Norms

Because attention modules are core components of Transformer-based model...

Query-driven Segment Selection for Ranking Long Documents

Transformer-based rankers have shown state-of-the-art performance. Howev...

LabelPrompt: Effective Prompt-based Learning for Relation Classification

Recently, prompt-based learning has become a very popular solution in ma...

Please sign up or login with your details

Forgot password? Click here to reset