Pre-trained Language Model for Web-scale Retrieval in Baidu Search

06/07/2021
by   Yiding Liu, et al.

Retrieval is a crucial stage in web search that identifies a small set of query-relevant candidates from a billion-scale corpus. Discovering more semantically related candidates in the retrieval stage promises to expose more high-quality results to end users. However, building and deploying effective retrieval models for semantic matching in a real search engine remains a non-trivial challenge. In this paper, we describe the retrieval system that we developed and deployed in Baidu Search. The system exploits a recent state-of-the-art Chinese pretrained language model, namely Enhanced Representation through kNowledge IntEgration (ERNIE), which gives the system expressive semantic matching. In particular, we developed an ERNIE-based retrieval model equipped with 1) expressive Transformer-based semantic encoders, and 2) a comprehensive multi-stage training paradigm. More importantly, we present a practical system workflow for deploying the model in web-scale retrieval. The system is now fully deployed in production, where we conducted rigorous offline and online experiments. The results show that the system performs high-quality candidate retrieval, especially for tail queries with uncommon demands. Overall, the new retrieval system, powered by a pretrained language model (i.e., ERNIE), substantially improves the usability and applicability of our search engine.
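
To make the retrieval stage described above concrete, the following is a minimal sketch of embedding-based candidate retrieval: score every document vector against a query vector and keep the top k. It is an illustration under stated assumptions, not the system from the paper. The abstract does not specify the scoring function, the index, or the encoder interface, so cosine similarity, brute-force scoring, the names `normalize`, `dot`, and `retrieve_top_k`, and the toy vectors in `DOC_VECS` are all hypothetical. In a real deployment the vectors would come from the trained semantic encoders, and a billion-scale corpus would likely need an approximate nearest-neighbor index rather than a full scan.

```python
import heapq
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(
    query_vec: Vector, doc_vecs: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first.

    Assumption: brute-force cosine scoring over every document. This only
    illustrates the idea of semantic candidate retrieval at toy scale.
    """
    q = normalize(query_vec)
    scored = ((doc_id, dot(q, normalize(vec))) for doc_id, vec in doc_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy embeddings standing in for encoder outputs.
DOC_VECS: Dict[str, Vector] = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.1],
}

top = retrieve_top_k([1.0, 0.2, 0.0], DOC_VECS, k=2)
for doc_id, score in top:
    print(f"{doc_id}\t{score:.4f}")
```

With these toy vectors, `doc_a` and `doc_c` are returned ahead of `doc_b`, because their directions are closest to the query vector. Since the query and document vectors are computed independently, document vectors can be precomputed offline and only the query needs encoding at request time.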


research
08/10/2020

Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine

Search engine has become a fundamental component in various web and mobi...
research
05/24/2021

Pre-trained Language Model based Ranking in Baidu Search

As the heart of a search engine, the ranking system plays a crucial role...
research
06/02/2023

Pretrained Language Model based Web Search Ranking: From Relevance to Satisfaction

Search engine plays a crucial role in satisfying users' diverse informat...
research
02/18/2018

Recurrent Binary Embedding for GPU-Enabled Exhaustive Retrieval from Billion-Scale Semantic Vectors

Rapid advances in GPU hardware and multiple areas of Deep Learning open ...
research
12/03/2021

Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset

Web search engines focus on serving highly relevant results within hundr...
research
10/20/2020

CoRT: Complementary Rankings from Transformers

Recent approaches towards passage retrieval have successfully employed r...
research
10/15/2021

Cascaded Fast and Slow Models for Efficient Semantic Code Search

The goal of natural language semantic code search is to retrieve a seman...
