CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back

02/08/2023
by Zhongxin Liu, et al.

Representing code changes as numeric feature vectors, i.e., code change representations, is usually an essential step in automating many software engineering tasks related to code changes, e.g., commit message generation and just-in-time defect prediction. Intuitively, the quality of code change representations is crucial to the effectiveness of these automated approaches. Prior work on code changes usually designs and evaluates code change representation approaches for a specific task, and little work has investigated code change encoders that can be used and jointly trained on various tasks. To fill this gap, this work proposes a novel Code Change Representation learning approach named CCRep, which learns to encode code changes as feature vectors for diverse downstream tasks. Specifically, CCRep regards a code change as the combination of its before-change and after-change code, leverages a pre-trained code model to obtain high-quality contextual embeddings of code, and uses a novel mechanism named query back to extract and encode the changed code fragments and make them explicitly interact with the whole code change. To evaluate CCRep and demonstrate its applicability to diverse code-change-related tasks, we apply it to three tasks: commit message generation, patch correctness assessment, and just-in-time defect prediction. Experimental results show that CCRep outperforms the state-of-the-art techniques on each task.
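
The pipeline described in the abstract (embed the before-change and after-change code, pick out the changed fragments, and let them interact with the whole change) can be illustrated with a toy sketch. This is not the authors' implementation: the hash-based `embed` function is only a stand-in for a pre-trained code model and is not contextual, the token-level diff via `difflib` is an assumed way to find changed fragments, and the attention step in `attend` is just one plausible reading of "query back". All function names and the `DIM` constant are hypothetical.

```python
import difflib
import hashlib
import math

DIM = 8  # toy embedding size (assumption, chosen only for illustration)


def embed(token):
    """Stand-in for a pre-trained code model: deterministic pseudo-embedding.

    Unlike a real model, this ignores context and depends on the token only.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def changed_tokens(before, after):
    """Return (removed, added) token lists from a token-level diff."""
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(before[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(after[j1:j2])
    return removed, added


def attend(query, keys):
    """Scaled dot-product attention of one query over keys (values = keys)."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM)
              for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * key[d] for w, key in zip(weights, keys)) / total
            for d in range(DIM)]


def mean(vectors):
    """Element-wise average of equally sized vectors."""
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(DIM)]


def represent_change(before, after):
    """Encode a code change as one vector of length DIM."""
    removed, added = changed_tokens(before, after)
    # Embeddings of the whole change: before-change plus after-change code.
    context = [embed(t) for t in before + after]
    # Changed fragments act as queries that look back at the whole change.
    queries = [embed(t) for t in removed + added]
    if not queries:
        return [0.0] * DIM  # no change detected
    return mean([attend(q, context) for q in queries])


before = "return a + b".split()
after = "return a - b".split()
vector = represent_change(before, after)
```

In this sketch the resulting `vector` would be passed to a task-specific head, for example a classifier for defect prediction or a decoder for commit message generation; a trained system would learn the encoder and the attention parameters rather than use fixed hashes.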

research
06/03/2021

Unsupervised Learning of General-Purpose Embeddings for Code Changes

Applying machine learning to tasks that operate with code changes requir...
research
03/12/2020

CC2Vec: Distributed Representations of Code Changes

Existing work on software patches often use features specific to a singl...
research
08/31/2023

Learning to Represent Patches

Patch representation is crucial in automating various software engineeri...
research
06/26/2023

Context-Encoded Code Change Representation for Automated Commit Message Generation

Changes in source code are an inevitable part of software development. T...
research
07/14/2020

Contextualized Code Representation Learning for Commit Message Generation

Automatic generation of high-quality commit messages for code commits ca...
research
07/19/2022

Enhancing Security Patch Identification by Capturing Structures in Commits

With the rapid increasing number of open source software (OSS), the majo...
research
03/13/2023

Generation-based Code Review Automation: How Far Are We?

Code review is an effective software quality assurance activity; however...
