Cross-Domain Deep Code Search with Few-Shot Meta Learning

01/01/2022
by   Yitian Chai, et al.

Recently, pre-trained programming language models such as CodeBERT have demonstrated substantial gains in code search. Despite their success, they rely on large amounts of parallel data to fine-tune the semantic mappings between queries and code. This restricts their practicality in domain-specific languages, where such data is relatively scarce and expensive. In this paper, we propose CDCS, a novel approach for domain-specific code search. CDCS employs a transfer learning framework in which an initial program representation model is pre-trained on a large corpus of common programming languages (such as Java and Python) and is then adapted to domain-specific languages such as SQL and Solidity. Unlike cross-language CodeBERT, which is directly fine-tuned in the target language, CDCS adapts a few-shot meta-learning algorithm called MAML to learn a good initialization of model parameters that can be best reused in a domain-specific language. We evaluate the proposed approach on two domain-specific languages, namely SQL and Solidity, with models transferred from two widely used languages (Python and Java). Experimental results show that CDCS significantly outperforms conventional pre-trained code models that are directly fine-tuned in domain-specific languages, and it is particularly effective when data is scarce.
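
The idea of learning an initialization that adapts quickly to a new domain can be illustrated with a toy sketch. The code below is not the CDCS implementation. It assumes a scalar model y = w * x in place of a pre-trained code model, treats each "task" as a line with a different slope, and uses the first-order approximation of MAML (the outer update ignores second derivatives). All function names and hyperparameters are hypothetical and chosen only for illustration.

```python
import random

def mse(w, batch):
    """Mean squared error of the model y_hat = w * x on a batch of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)

def mse_grad(w, batch):
    """Gradient of the mean squared error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def sample_task(rng, slope, n=10):
    """Toy stand-in for a task: n noiseless samples from y = slope * x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x) for x in xs]

def adapt(w, support, inner_lr=0.1, steps=1):
    """Inner loop: a few gradient steps on a task's support set, starting from w."""
    for _ in range(steps):
        w = w - inner_lr * mse_grad(w, support)
    return w

def fomaml(task_slopes, meta_steps=500, inner_lr=0.1, outer_lr=0.05, seed=0):
    """Outer loop: learn an initialization w that adapts well to every task.

    First-order approximation: the query-set gradient is evaluated at the
    adapted parameter and applied directly to the shared initialization.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_steps):
        meta_grad = 0.0
        for slope in task_slopes:
            support = sample_task(rng, slope)
            query = sample_task(rng, slope)
            w_task = adapt(w, support, inner_lr)
            meta_grad += mse_grad(w_task, query)
        w -= outer_lr * meta_grad / len(task_slopes)
    return w

if __name__ == "__main__":
    init = fomaml([2.0, 4.0])
    print("meta-learned initialization:", init)
```

In this toy setting the learned initialization settles between the task slopes, so a few inner steps on a small support set from an unseen task are enough to reduce its loss. That mirrors, in a very simplified form, the motivation for adapting from a meta-learned starting point rather than fine-tuning directly.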


