A Unified Knowledge Graph Service for Developing Domain Language Models in AI Software

12/10/2022
by Ruiqing Ding, et al.

Natural Language Processing (NLP) is one of the core techniques in AI software. As AI is applied to more and more domains, how to efficiently develop high-quality domain-specific language models has become a critical question in AI software engineering. Existing domain-specific language model development processes mostly focus on learning a domain-specific pre-trained language model (PLM). When they train a domain task-specific language model on top of the PLM, they commonly adopt only a direct (and often unsatisfactory) fine-tuning strategy. We propose KnowledgeDA, a unified and low-code domain language model development service that enhances the task-specific training procedure with domain knowledge graphs. Given domain-specific task texts input by a user, KnowledgeDA can automatically generate a domain-specific language model in three steps: (i) localize domain knowledge entities in the texts via an embedding-similarity approach; (ii) generate augmented samples by retrieving replaceable domain entity pairs from two views, the knowledge graph and the training data; (iii) select high-quality augmented samples for fine-tuning via confidence-based assessment. We implement a prototype of KnowledgeDA to learn language models for two domains, healthcare and software development. Experiments on five domain-specific NLP tasks verify the effectiveness and generalizability of KnowledgeDA. (Code is publicly available at https://github.com/RuiqingDing/KnowledgeDA.)
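The three steps of the pipeline can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not the KnowledgeDA implementation from the repository: the toy knowledge graph (KG_TYPES), the character-trigram "embedding" standing in for a PLM encoder, the choice to intersect the knowledge-graph view with the training-data view, the thresholds, and the mock confidence function are all assumptions made for this example. Entities are also assumed to be single tokens.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Sample = Tuple[List[str], str]  # (tokens, label)

# Hypothetical toy knowledge graph: entity name -> entity type.
KG_TYPES: Dict[str, str] = {
    "aspirin": "drug", "ibuprofen": "drug", "paracetamol": "drug",
    "headache": "symptom", "fever": "symptom",
}


def embed(text: str, n: int = 3) -> Counter:
    """Toy stand-in for a PLM embedding: a bag of character trigrams."""
    padded = f"#{text.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def localize_entities(tokens: List[str], kg: Dict[str, str],
                      threshold: float = 0.5) -> List[Tuple[int, str]]:
    """Step (i): link each token to its most similar KG entity if similarity >= threshold."""
    linked = []
    for i, tok in enumerate(tokens):
        score, best = max((cosine(embed(tok), embed(e)), e) for e in kg)
        if score >= threshold:
            linked.append((i, best))
    return linked


def replacement_candidates(entity: str, label: str, kg: Dict[str, str],
                           data_view: Dict[str, Set[str]]) -> List[str]:
    """Step (ii): candidates share the entity's KG type (KG view) AND occur with
    the same label in the training data (data view). Intersecting is an assumption."""
    kg_view = {e for e, t in kg.items() if t == kg[entity] and e != entity}
    return sorted(kg_view & data_view.get(label, set()))


def augment(sample: Sample, kg: Dict[str, str],
            data_view: Dict[str, Set[str]]) -> List[Sample]:
    """Create one new sample per (linked entity, replacement candidate) pair."""
    tokens, label = sample
    out = []
    for i, ent in localize_entities(tokens, kg):
        for cand in replacement_candidates(ent, label, kg, data_view):
            new_tokens = tokens.copy()
            new_tokens[i] = cand
            out.append((new_tokens, label))
    return out


def select_confident(samples: List[Sample],
                     confidence: Callable[[List[str], str], float],
                     threshold: float = 0.8) -> List[Sample]:
    """Step (iii): keep augmented samples the model assigns to their label confidently."""
    return [(t, y) for t, y in samples if confidence(t, y) >= threshold]


def mock_confidence(tokens: List[str], label: str) -> float:
    """Hypothetical mock standing in for a fine-tuned classifier's probability of `label`."""
    return 0.9 if "ibuprofen" in tokens else 0.3


if __name__ == "__main__":
    data_view = {"pain_relief": {"aspirin", "ibuprofen", "paracetamol"}}
    augmented = augment(("take asprin for headache".split(), "pain_relief"), KG_TYPES, data_view)
    for toks, y in select_confident(augmented, mock_confidence):
        print(" ".join(toks), "->", y)
```

Running the script prints `take ibuprofen for headache -> pain_relief`. The misspelled token "asprin" is still linked to the entity "aspirin" because matching is by similarity rather than exact string lookup. Two same-type replacements are generated, and only the one the mock model scores above the threshold survives the confidence filter.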
