
A Unified Knowledge Graph Service for Developing Domain Language Models in AI Software

12/10/2022
by Ruiqing Ding, et al.

Natural Language Processing (NLP) is one of the core techniques in AI software. As AI is applied to more and more domains, how to efficiently develop high-quality domain-specific language models becomes a critical question in AI software engineering. Existing development processes for domain-specific language models mostly focus on learning a domain-specific pre-trained language model (PLM); when training the domain task-specific language model on top of the PLM, they commonly adopt only a direct (and often unsatisfactory) fine-tuning strategy. By enhancing the task-specific training procedure with domain knowledge graphs, we propose KnowledgeDA, a unified and low-code domain language model development service. Given domain-specific task texts input by a user, KnowledgeDA automatically generates a domain-specific language model in three steps: (i) localize domain knowledge entities in the texts via an embedding-similarity approach; (ii) generate augmented samples by retrieving replaceable domain entity pairs from two views, the knowledge graph and the training data; (iii) select high-quality augmented samples for fine-tuning via confidence-based assessment. We implement a prototype of KnowledgeDA to learn language models for two domains, healthcare and software development. Experiments on five domain-specific NLP tasks verify the effectiveness and generalizability of KnowledgeDA. (Code is publicly available at https://github.com/RuiqingDing/KnowledgeDA.)
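The three steps can be illustrated with a small toy sketch. This is not the KnowledgeDA implementation: the triples in `KG`, the character-trigram `embed` function (a stand-in for learned embeddings), the function names, and both thresholds are hypothetical choices made for illustration. The sketch covers only the knowledge-graph view of step (ii), and the rule "keep samples at or above a confidence threshold" in step (iii) is an assumption, since the abstract does not state the exact selection criterion.

```python
import math
from collections import Counter

# Toy knowledge graph of (head, relation, tail) triples; contents are illustrative only.
KG = [
    ("headache", "symptom_of", "migraine"),
    ("nausea", "symptom_of", "migraine"),
    ("fever", "symptom_of", "influenza"),
]
ENTITIES = sorted({h for h, _, _ in KG} | {t for _, _, t in KG})


def embed(word):
    """Stand-in embedding: character-trigram counts (a real system would use learned vectors)."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def localize(tokens, threshold=0.6):
    """Step (i): map each token position to its most similar KG entity, if similar enough."""
    found = {}
    for i, tok in enumerate(tokens):
        best = max(ENTITIES, key=lambda e: cosine(embed(tok), embed(e)))
        if cosine(embed(tok), embed(best)) >= threshold:
            found[i] = best
    return found


def replaceable(entity):
    """Step (ii), KG view only: entities that share a (relation, tail) pair with `entity`."""
    keys = {(r, t) for h, r, t in KG if h == entity}
    return sorted({h for h, r, t in KG if (r, t) in keys and h != entity})


def augment(tokens):
    """Create one augmented sample per (matched entity, replacement) pair."""
    samples = []
    for i, entity in localize(tokens).items():
        for substitute in replaceable(entity):
            samples.append(tokens[:i] + [substitute] + tokens[i + 1:])
    return samples


def select(samples, label, confidence, threshold=0.5):
    """Step (iii): keep samples whose confidence for `label` meets the threshold (assumed rule)."""
    return [s for s in samples if confidence(s, label) >= threshold]
```

For example, `augment(["patient", "reports", "headaches", "daily"])` matches "headaches" to the entity "headache" and swaps in "nausea", which shares the same relation and tail in the toy graph. In practice, the `confidence` callable passed to `select` would wrap a fine-tuned task model rather than a hand-written rule.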


04/22/2022

KALA: Knowledge-Augmented Language Model Adaptation

Pre-trained language models (PLMs) have achieved remarkable success on v...
12/06/2022

CySecBERT: A Domain-Adapted Language Model for the Cybersecurity Domain

The field of cybersecurity is evolving fast. Experts need to be informed...
03/27/2023

Unified Text Structuralization with Instruction-tuned Language Models

Text structuralization is one of the important fields of natural languag...
08/22/2022

Incorporating Domain Knowledge through Task Augmentation for Front-End JavaScript Code Generation

Code generation aims to generate a code snippet automatically from natur...
06/25/2017

Automated text summarisation and evidence-based medicine: A survey of two domains

The practice of evidence-based medicine (EBM) urges medical practitioner...
02/09/2023

Enhancing E-Commerce Recommendation using Pre-Trained Language Model and Fine-Tuning

Pretrained Language Models (PLM) have been greatly successful on a board...
12/08/2020

Incorporating Domain Knowledge To Improve Topic Segmentation Of Long MOOC Lecture Videos

Topical Segmentation poses a great role in reducing search space of the ...