TravelBERT: Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation

09/02/2021
by Hongyin Zhu, et al.

Existing approaches extend BERT from different perspectives, e.g., by designing different pre-training tasks, semantic granularities, and model architectures. Few models consider extending BERT to different text formats. In this paper, we propose the heterogeneous knowledge language model (HKLM), a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text. To capture the correspondences among these multi-format sources of knowledge, our approach uses a masked language model objective to learn word knowledge, and uses triple classification and title matching objectives to learn entity knowledge and topic knowledge, respectively. To obtain the aforementioned multi-format text, we construct a corpus in the tourism domain and conduct experiments on 5 tourism NLP datasets. The results show that our approach outperforms pre-training on plain text while using only 1/4 of the data. The code, datasets, corpus and knowledge graph will be released.
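As a rough illustration of how the three objectives could be trained jointly over one shared encoder, the sketch below combines a masked language model loss, a triple classification loss, and a title matching loss. The module names, dimensions, and the lightweight Transformer stand-in for BERT are assumptions made for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class HKLMSketch(nn.Module):
    """Toy layout of multi-task pre-training: one shared encoder, three objectives."""
    def __init__(self, vocab_size=30522, hidden=256, layers=4, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)  # stand-in for BERT
        self.mlm_head = nn.Linear(hidden, vocab_size)  # word knowledge: masked language modeling
        self.triple_head = nn.Linear(hidden, 2)        # entity knowledge: is the aligned triple correct?
        self.title_head = nn.Linear(hidden, 2)         # topic knowledge: does the title match the passage?
        self.ce = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, input_ids, mlm_labels, triple_labels, title_labels):
        h = self.encoder(self.embed(input_ids))        # (batch, seq, hidden)
        pooled = h[:, 0]                               # first-token ([CLS]-style) representation
        loss_mlm = self.ce(self.mlm_head(h).transpose(1, 2), mlm_labels)
        loss_triple = self.ce(self.triple_head(pooled), triple_labels)
        loss_title = self.ce(self.title_head(pooled), title_labels)
        return loss_mlm + loss_triple + loss_title     # joint pre-training objective

# Dummy batch: two sequences of 16 token ids, with a single masked position each.
model = HKLMSketch()
ids = torch.randint(0, 30522, (2, 16))
mlm_labels = torch.full((2, 16), -100)
mlm_labels[:, 3] = ids[:, 3]                           # only position 3 contributes to the MLM loss
loss = model(ids, mlm_labels,
             triple_labels=torch.tensor([1, 0]),       # 1 = correct triple, 0 = corrupted
             title_labels=torch.tensor([1, 1]))        # 1 = title matches the paragraph
loss.backward()
```

In practice, the supervision for the last two objectives would come from the aligned domain knowledge graph and document titles; negative examples are typically built by corrupting an aligned triple or swapping in a non-matching title. The labels above are placeholders only.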

Related research

08/20/2023
FoodGPT: A Large Language Model in Food Testing Domain with Incremental Pre-training and Knowledge Graph Prompt
Currently, the construction of large language models in specific domains...

01/21/2023
Unifying Structure Reasoning and Language Model Pre-training for Complex Reasoning
Recent knowledge enhanced pre-trained language models have shown remarka...

10/23/2020
Large Scale Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
Generating natural sentences from Knowledge Graph (KG) triples, known as...

10/22/2020
Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets
Objective: We aim to learn potential novel cures for diseases from unstr...

12/08/2020
Incorporating Domain Knowledge To Improve Topic Segmentation Of Long MOOC Lecture Videos
Topical Segmentation poses a great role in reducing search space of the ...

01/13/2022
LP-BERT: Multi-task Pre-training Knowledge Graph BERT for Link Prediction
Link prediction plays a significant role in knowledge graph, which is a...

03/03/2021
OAG-BERT: Pre-train Heterogeneous Entity-augmented Academic Language Model
To enrich language models with domain knowledge is crucial but difficult...
