A Unified Knowledge Graph Service for Developing Domain Language Models in AI Software

by Ruiqing Ding, et al.

Natural Language Processing (NLP) is one of the core techniques in AI software. As AI is applied to more and more domains, how to efficiently develop high-quality domain-specific language models has become a critical question in AI software engineering. Existing development processes for domain-specific language models mostly focus on learning a domain-specific pre-trained language model (PLM); when the domain task-specific language model is then trained on top of the PLM, only a direct (and often unsatisfactory) fine-tuning strategy is commonly adopted. By enhancing the task-specific training procedure with domain knowledge graphs, we propose KnowledgeDA, a unified and low-code domain language model development service. Given domain-specific task texts input by a user, KnowledgeDA automatically generates a domain-specific language model in three steps: (i) localize domain knowledge entities in the texts via an embedding-similarity approach; (ii) generate augmented samples by retrieving replaceable domain entity pairs from two views, the knowledge graph and the training data; (iii) select high-quality augmented samples for fine-tuning via confidence-based assessment. We implement a prototype of KnowledgeDA to learn language models for two domains, healthcare and software development. Experiments on five domain-specific NLP tasks verify the effectiveness and generalizability of KnowledgeDA. (Code is publicly available at
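The three steps named in the abstract can be illustrated with a minimal Python sketch. This is not the KnowledgeDA implementation: the toy knowledge graph `TOY_KG`, the character-trigram "embedding", the rule that two entities are replaceable when they share a (relation, tail) pair, and the confidence band are all assumptions made for illustration. Only the knowledge-graph view of step (ii) is sketched; the training-data view is omitted.

```python
import math
from collections import Counter

# Hypothetical toy knowledge graph of (head, relation, tail) triples.
TOY_KG = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
    ("insulin", "treats", "diabetes"),
]


def char_ngrams(text, n=3):
    """Stand-in embedding (assumption): a bag of character n-grams."""
    padded = f"#{text.lower()}#"
    return Counter(padded[i:i + n] for i in range(max(1, len(padded) - n + 1)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def localize_entities(tokens, kg, threshold=0.6):
    """Step (i): map token positions to the most similar KG entity."""
    names = {h for h, _, _ in kg} | {t for _, _, t in kg}
    entity_vecs = {name: char_ngrams(name) for name in names}
    matches = {}
    for idx, tok in enumerate(tokens):
        tok_vec = char_ngrams(tok)
        score, best = max((cosine(tok_vec, vec), name) for name, vec in entity_vecs.items())
        if score >= threshold:
            matches[idx] = best
    return matches


def replaceable(entity, kg):
    """Assumed rule: heads sharing a (relation, tail) pair are interchangeable."""
    signature = {(r, t) for h, r, t in kg if h == entity}
    return sorted({h for h, r, t in kg if h != entity and (r, t) in signature})


def augment(tokens, matches, kg):
    """Step (ii), KG view only: swap each located entity for a replaceable one."""
    samples = []
    for idx, entity in sorted(matches.items()):
        for alternative in replaceable(entity, kg):
            new_tokens = list(tokens)
            new_tokens[idx] = alternative
            samples.append(new_tokens)
    return samples


def select_by_confidence(samples, confidence_fn, low=0.5, high=0.9):
    """Step (iii): keep samples whose model confidence lies in [low, high].

    `confidence_fn` stands in for a fine-tuned model's confidence on the
    original label; the band itself is an illustrative assumption.
    """
    return [s for s in samples if low <= confidence_fn(s) <= high]
```

A possible call sequence, using a constant stub in place of a real model's confidence:

```python
tokens = "Aspirin relieves headache".split()
matches = localize_entities(tokens, TOY_KG)
augmented = augment(tokens, matches, TOY_KG)
kept = select_by_confidence(augmented, lambda sample: 0.7)
```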



