Beyond Lexical: A Semantic Retrieval Framework for Textual Search Engine

by Kuan Fang, et al.

Search engines have become a fundamental component of many web and mobile applications. Retrieving relevant documents from massive datasets is challenging for a search engine system, especially when it faces verbose or tail queries. In this paper, we explore a vector space search framework for document retrieval. Specifically, we trained a deep semantic matching model so that each query and document can be encoded as a low-dimensional embedding. Our model is based on the BERT architecture. We deployed a fast k-nearest-neighbor index service for online serving. Both offline and online metrics demonstrate that our method improved retrieval performance and search quality considerably, particularly for tail
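As a rough illustration of the vector space retrieval idea described in the abstract, the following minimal sketch ranks documents by similarity between a query embedding and document embeddings. It rests on several assumptions that do not come from the paper: the embeddings are tiny hand-written vectors rather than outputs of a BERT-based encoder, cosine similarity is assumed as the matching score, and an exhaustive scan stands in for the fast k-nearest-neighbor index service. The names `cosine`, `top_k`, and `doc_embeddings` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Exhaustive k-nearest-neighbor search over all document embeddings.

    A production system would replace this linear scan with an
    approximate nearest-neighbor index; this is only a stand-in.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical low-dimensional embeddings; a real system would obtain
# these from the trained query and document encoders.
doc_embeddings = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
    "doc_d": [0.0, 0.0, 1.0],
}
query_embedding = [0.9, 0.1, 0.0]

results = top_k(query_embedding, doc_embeddings, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Because documents are encoded independently of the query, their embeddings can be computed offline and only the query needs to be encoded at serving time, which is what makes a nearest-neighbor index a natural fit for this kind of framework.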




MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks

We study the problem of deep recall model in industrial web search, whic...

Pre-trained Language Model for Web-scale Retrieval in Baidu Search

Retrieval is a crucial stage in web search that identifies a small set o...

Learning From Weights: A Cost-Sensitive Approach For Ad Retrieval

Retrieval models such as CLSM is trained on click-through data which tre...

SearchGCN: Powering Embedding Retrieval by Graph Convolution Networks for E-Commerce Search

Graph convolution networks (GCN), which recently becomes new state-of-th...

Cost-sensitive Learning of Deep Semantic Models for Sponsored Ad Retrieval

This paper formulates the problem of learning a neural semantic model fo...

Semantic Search in Millions of Equations

Given the increase of publications, search for relevant papers becomes t...

A Capsule Network-based Embedding Model for Search Personalization

Search personalization aims to tailor search results to each specific us...