Evaluating few shot and Contrastive learning Methods for Code Clone Detection

04/15/2022
by Mohamad Khajezade, et al.

Context: Code Clone Detection (CCD) is a software engineering task used for plagiarism detection, code search, and code comprehension. Recently, deep learning-based models have achieved an F1 score (a metric used to assess classifiers) of ∼95% on the CodeXGLUE benchmark. These models require large amounts of training data and are mainly fine-tuned on Java or C++ datasets. However, no previous study has evaluated how well these models generalize when only a limited amount of annotated data is available.

Objective: The main objective of this research is to assess how well CCD models, as well as few-shot learning algorithms, handle unseen programming problems and new languages (i.e., problems and languages the model was not trained on).

Method: We assess the generalizability of state-of-the-art CCD models in few-shot settings (i.e., only a few samples are available for fine-tuning) under three scenarios: i) unseen problems, ii) unseen languages, and iii) a combination of new languages and new problems. We use three datasets (BigCloneBench, POJ-104, and CodeNet) and three languages (Java, C++, and Ruby). Then, we employ Model-Agnostic Meta-Learning (MAML), in which the model learns a meta-learner capable of extracting transferable knowledge from the training set, so that the model can be fine-tuned using only a few samples. Finally, we combine contrastive learning with MAML to study whether it can further improve the results of MAML.
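To make the MAML idea concrete, the following is a minimal sketch of its inner adaptation step and outer meta-update. It is not the paper's setup: it assumes a toy linear regression model in place of a CCD model, synthetic tasks in place of programming problems or languages, and a single inner gradient step. All function names and hyperparameters (`sample_task`, `inner_lr`, `meta_lr`, and so on) are hypothetical and chosen for illustration only.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of mean((X @ w - y) ** 2) with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def mse_hessian(X):
    """Hessian of the same loss; constant in w for a linear model."""
    return 2.0 * X.T @ X / X.shape[0]

def inner_adapt(w, X_s, y_s, inner_lr):
    """Inner loop: one gradient step on the task's few support samples."""
    return w - inner_lr * mse_grad(w, X_s, y_s)

def adapted_query_loss(w, task, inner_lr):
    """Query-set loss measured after adapting w on the support set."""
    X_s, y_s, X_q, y_q = task
    w_adapted = inner_adapt(w, X_s, y_s, inner_lr)
    return np.mean((X_q @ w_adapted - y_q) ** 2)

def maml_meta_gradient(w, task, inner_lr):
    """Exact gradient of the adapted query loss with respect to the initial w.

    The chain rule passes through the inner step, which is what
    distinguishes MAML from plain joint training.
    """
    X_s, y_s, X_q, y_q = task
    w_adapted = inner_adapt(w, X_s, y_s, inner_lr)
    # Jacobian of the adapted weights with respect to the initial weights.
    jac = np.eye(len(w)) - inner_lr * mse_hessian(X_s)
    return jac.T @ mse_grad(w_adapted, X_q, y_q)

def sample_task(rng, n_support=5, n_query=20, dim=3):
    """Toy task (an assumption): regression with task-specific true weights."""
    true_w = rng.normal(loc=1.0, scale=0.3, size=dim)
    X = rng.normal(size=(n_support + n_query, dim))
    y = X @ true_w
    return X[:n_support], y[:n_support], X[n_support:], y[n_support:]

def meta_train(steps=200, tasks_per_step=8, inner_lr=0.05, meta_lr=0.05,
               dim=3, seed=0):
    """Outer loop: average meta-gradients over a batch of tasks, then step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for _ in range(steps):
        grads = [maml_meta_gradient(w, sample_task(rng, dim=dim), inner_lr)
                 for _ in range(tasks_per_step)]
        w = w - meta_lr * np.mean(grads, axis=0)
    return w
```

Calling `meta_train()` returns an initialization that can then be passed to `inner_adapt` together with the few support samples of a new task. In the setting the abstract describes, the support samples would be the handful of annotated clone pairs available for an unseen problem or language, and the model would be a neural network trained with automatic differentiation rather than the closed-form gradients used here.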


research
05/21/2020

Cross-Domain Few-Shot Learning with Meta Fine-Tuning

In this paper, we tackle the new Cross-Domain Few-Shot Learning benchmar...
research
01/01/2022

Cross-Domain Deep Code Search with Few-Shot Meta Learning

Recently, pre-trained programming language models such as CodeBERT have ...
research
07/26/2018

Meta-learning autoencoders for few-shot prediction

Compared to humans, machine learning models generally require significan...
research
08/26/2022

Generalizability of Code Clone Detection on CodeBERT

Transformer networks such as CodeBERT already achieve outstanding result...
research
07/31/2019

Few-Shot Meta-Denoising

We study the problem of learning-based denoising where the training set ...
research
11/03/2022

Robust Few-shot Learning Without Using any Adversarial Samples

The high cost of acquiring and annotating samples has made the `few-shot...
research
06/02/2022

Learning code summarization from a small and local dataset

Foundation models (e.g., CodeBERT, GraphCodeBERT, CodeT5) work well for ...
