Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases

08/18/2023
by Ze Tang, et al.

Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, because they lack domain-specific knowledge, they may be suboptimal at completions that require intensive domain knowledge, for example, completing library names. Although several works have confirmed the effectiveness of fine-tuning techniques for adapting language models to domain-specific code completion, they are limited by the need to repeatedly fine-tune the model as the project continues to iterate. To address this limitation, in this paper we propose kNM-LM, a retrieval-augmented language model (R-LM) that integrates domain knowledge into language models without fine-tuning. Unlike previous techniques, our approach automatically adapts to different language models and domains. Specifically, it uses the in-domain code to build a retrieval-based database decoupled from the LM, and then combines the two through Bayesian inference to complete the code. Extensive experiments on intra-project and intra-scenario completion confirm that kNM-LM brings appreciable improvements over CodeGPT and UniXcoder. An in-depth analysis of our tool, covering response speed, storage usage, completion of specific code types, and API invocation completion, confirms that kNM-LM delivers satisfactory performance, which makes it well suited for domain adaptive code completion. Furthermore, our approach operates without requiring direct access to the language model's parameters. As a result, it can seamlessly work with black-box code completion models and be integrated as a plugin to further enhance their performance.
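The mechanism described above follows the kNN-LM family: a datastore of (context representation, next token) pairs is built from in-domain code, and at each decoding step a retrieval distribution over next tokens is combined with the base LM's distribution. The sketch below illustrates that pipeline with a toy numpy stand-in for the LM. The datastore construction, distance-weighted retrieval, and interpolation are generic kNN-LM mechanics; kNM-LM specifically derives the mixing weight via Bayesian inference rather than the fixed lambda used here, and all names, shapes, and the stand-in LM are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy vocabulary and a stand-in "LM" mapping a context vector to
# next-token probabilities. In kNM-LM the LM is a black box such as
# CodeGPT or UniXcoder; here it is a random projection (assumption).
VOCAB = ["import", "numpy", "np", "torch", "(", ")", "."]
rng = np.random.default_rng(0)
W = rng.normal(size=(4, len(VOCAB)))  # hypothetical LM output head

def lm_probs(ctx_vec):
    # Softmax over the stand-in LM's logits.
    logits = ctx_vec @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

# 1) Build the decoupled datastore from in-domain code: each entry maps
#    a context representation (key) to the observed next token (value).
#    Here the entries are random; in practice they come from running the
#    model (or an encoder) over the project's own code.
datastore_keys = rng.normal(size=(100, 4))          # context vectors
datastore_vals = rng.integers(0, len(VOCAB), 100)   # next-token ids

def knn_probs(ctx_vec, k=8, temperature=1.0):
    # 2) Retrieve the k nearest in-domain contexts and turn their
    #    distances into a distribution over next tokens.
    d = np.linalg.norm(datastore_keys - ctx_vec, axis=1)
    idx = np.argsort(d)[:k]
    w = np.exp(-d[idx] / temperature)
    p = np.zeros(len(VOCAB))
    for i, weight in zip(idx, w):
        p[datastore_vals[i]] += weight
    return p / p.sum()

def complete(ctx_vec, lam=0.5):
    # 3) Combine retrieval and LM distributions. A fixed lambda is the
    #    classic kNN-LM choice; kNM-LM instead infers the per-step
    #    mixing weight via Bayesian inference.
    p = lam * knn_probs(ctx_vec) + (1 - lam) * lm_probs(ctx_vec)
    return VOCAB[int(np.argmax(p))]

print(complete(rng.normal(size=4)))
```

Because the datastore is decoupled from the LM, adapting to a new project iteration only requires re-indexing the project's code; no gradient update to the model, and hence no access to its parameters, is needed.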


Related research

- kNN-Adapter: Efficient Domain Adaptation for Black-Box Language Models (02/21/2023)
  Fine-tuning a language model on a new domain is standard practice for do...

- A Unified Knowledge Graph Service for Developing Domain Language Models in AI Software (12/10/2022)
  Natural Language Processing (NLP) is one of the core techniques in AI so...

- Exploring Distributional Shifts in Large Language Models for Code Analysis (03/16/2023)
  We systematically study the capacity of two large language models for co...

- Towards using Few-Shot Prompt Learning for Automating Model Completion (12/07/2022)
  We propose a simple yet a novel approach to improve completion in domain...

- CodeMark: Imperceptible Watermarking for Code Datasets against Neural Code Completion Models (08/28/2023)
  Code datasets are of immense value for training neural-network-based cod...

- Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study (06/17/2021)
  Biomedical knowledge graphs (KGs) hold rich information on entities such...

- CySecBERT: A Domain-Adapted Language Model for the Cybersecurity Domain (12/06/2022)
  The field of cybersecurity is evolving fast. Experts need to be informed...
