RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training

by Ziyue Qiao, et al.

With the growth of academic engines, the mining and analysis of massive researcher data, for tasks such as collaborator recommendation and researcher retrieval, has become indispensable. It can improve the service quality and intelligence of academic engines. Most existing studies on researcher data mining focus on a single task for a particular application scenario and learn a task-specific model, which usually cannot transfer to out-of-scope tasks. Pre-training provides a generalized, shared model that captures valuable information from enormous amounts of unlabeled data and can then accomplish multiple downstream tasks after a few fine-tuning steps. In this paper, we propose RPT, a multi-task, self-supervised pre-training model for researcher data. Specifically, we divide researcher data into semantic document sets and a community graph. We design a hierarchical Transformer and a local community encoder to capture information from these two categories of data, respectively. We then propose three self-supervised learning objectives to train the whole model. Finally, we propose two transfer modes of RPT for fine-tuning in different scenarios. We conduct extensive experiments to evaluate RPT; results on three downstream tasks verify the effectiveness of pre-training for researcher data mining.
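
To make the two-branch design concrete, here is a minimal sketch of how a researcher representation might be assembled from a document set and a community graph. It is not the authors' implementation: plain mean pooling stands in for both the hierarchical Transformer and the local community encoder, and every name in it (encode_documents, encode_community, researcher_embedding) is hypothetical. The abstract does not specify the three training objectives or the two transfer modes, so they are not modeled here.

```python
# Illustrative sketch only. Mean pooling is an assumed stand-in for the
# hierarchical Transformer and the local community encoder, whose details
# the abstract does not give. All function names are hypothetical.
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def encode_documents(docs: List[List[Vector]]) -> Vector:
    """Two-level pooling: token vectors -> document vector -> researcher vector."""
    doc_vectors = [mean(tokens) for tokens in docs]
    return mean(doc_vectors)


def encode_community(node: str,
                     graph: Dict[str, List[str]],
                     semantic: Dict[str, Vector]) -> Vector:
    """Average the semantic vectors of a researcher's direct neighbours."""
    neighbours = graph.get(node, [])
    if not neighbours:
        return semantic[node]  # isolated node: fall back to its own vector
    return mean([semantic[n] for n in neighbours])


def researcher_embedding(node: str,
                         docs_by_node: Dict[str, List[List[Vector]]],
                         graph: Dict[str, List[str]]) -> Vector:
    """Concatenate the semantic view and the community view of one researcher."""
    semantic = {r: encode_documents(d) for r, d in docs_by_node.items()}
    return semantic[node] + encode_community(node, graph, semantic)
```

In this sketch each researcher maps to a list of documents, each document is a list of token vectors, and the graph is an adjacency list keyed by researcher. Concatenating the two views is one simple way to combine them; the paper may combine them differently.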


SGL-PT: A Strong Graph Learner with Graph Prompt Tuning

Recently, much exertion has been paid to design graph self-supervised me...

GraphPrompt: Unifying Pre-Training and Downstream Tasks for Graph Neural Networks

Graphs can model complex relationships between objects, enabling a myria...

PASTA: Pretrained Action-State Transformer Agents

Self-supervised learning has brought about a revolutionary paradigm shif...

Process-BERT: A Framework for Representation Learning on Educational Process Data

Educational process data, i.e., logs of detailed student activities in c...

Token Boosting for Robust Self-Supervised Visual Transformer Pre-training

Learning with large-scale unlabeled data has become a powerful tool for ...

GraphCode2Vec: Generic Code Embedding via Lexical and Program Dependence Analyses

Code embedding is a keystone in the application of machine learning on s...

COAD: Contrastive Pre-training with Adversarial Fine-tuning for Zero-shot Expert Linking

Expert finding, a popular service provided by many online websites such ...
