QUERT: Continual Pre-training of Language Model for Query Understanding in Travel Domain Search

06/11/2023
by   Jian Xie, et al.

In light of the success of pre-trained language models (PLMs), continual pre-training of generic PLMs has become the paradigm for domain adaptation. In this paper, we propose QUERT, a Continual Pre-trained Language Model for QUERy Understanding in Travel Domain Search. QUERT is jointly trained on four pre-training tasks tailored to the characteristics of queries in travel domain search: Geography-aware Mask Prediction, Geohash Code Prediction, User Click Behavior Learning, and Phrase and Token Order Prediction. Performance improvements on downstream tasks and ablation experiments demonstrate the effectiveness of the proposed pre-training tasks. Specifically, the average performance of downstream tasks increases by 2.02 [...] unsupervised settings, respectively. To assess the benefit of QUERT to online business, we deploy QUERT and perform A/B testing on the Fliggy APP. The feedback results show that QUERT increases the Unique Click-Through Rate and Page Click-Through Rate by 0.89 [...] Our code and downstream task data will be released for future research.
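
For readers unfamiliar with the term, the geohash codes named in the Geohash Code Prediction task can be illustrated with the standard public geohash encoding, which interleaves longitude and latitude bisection bits and writes them in a base-32 alphabet. The sketch below is only that generic encoding; the function name `geohash_encode` and the default precision are assumptions made for illustration, and the abstract does not say how QUERT derives or predicts its codes.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash string.

    Hypothetical helper for illustration; not taken from the QUERT code.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulates the current 5-bit group
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Every 5 bits become one base-32 character.
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


if __name__ == "__main__":
    # Nearby points share a common prefix, which is what makes the code
    # usable as a coarse-to-fine location label.
    print(geohash_encode(57.64911, 10.40744, precision=3))
```

Because each additional character refines the same cell, a shorter code is always a prefix of a longer code for the same point, so a prediction target of this kind can be made coarser or finer by truncation.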

research
08/08/2023

Continual Pre-Training of Large Language Models: How to (re)warm your model?

Large language models (LLMs) are routinely pre-trained on billions of to...
research
10/04/2020

PTUM: Pre-training User Model from Unlabeled User Behaviors via Self-supervision

User modeling is critical for many personalized web services. Many exist...
research
10/23/2022

Language Model Pre-Training with Sparse Latent Typing

Modern large-scale Pre-trained Language Models (PLMs) have achieved trem...
research
11/01/2022

VarMAE: Pre-training of Variational Masked Autoencoder for Domain-adaptive Language Understanding

Pre-trained language models have achieved promising performance on gener...
research
08/23/2022

Learning Better Masking for Better Language Model Pre-training

Masked Language Modeling (MLM) has been widely used as the denoising obj...
research
12/22/2020

ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces

As mobile devices are becoming ubiquitous, regularly interacting with a ...
research
08/05/2023

PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification

Large language models (LLMs) have witnessed a meteoric rise in popularit...
