Boosting Automated Patch Correctness Prediction via Pre-trained Language Model

01/29/2023
by Quanjun Zhang, et al.

Automated program repair (APR) aims to fix software bugs automatically, without human debugging effort, and plays a crucial role in software development and maintenance. Despite recent significant progress, APR is still challenged by a long-standing overfitting problem (i.e., a generated patch is plausible, passing the available tests, but overfitting). Various techniques have therefore been proposed to address the overfitting problem. Among them, deep learning approaches that predict patch correctness have recently emerged, enabled by newly available large-scale patch benchmarks. However, existing learning-based techniques mainly rely on manually designed code features, which can be extremely costly and challenging to construct in practice. In this paper, we propose APPT, a pre-trained model-based automated patch correctness assessment technique, which treats source code as token sequences and so avoids the extra overhead of designing hand-crafted features. In particular, APPT adopts a pre-trained model as the encoder stack, followed by an LSTM stack and a deep learning classifier. Although our idea is general and can be built on various pre-trained models, we implemented APPT based on the BERT model. We conducted an extensive experiment on 1,183 Defects4J patches, and the results show that APPT achieves a prediction accuracy of 79.0% […], outperforming the state-of-the-art technique CACHE by 3.6% […]. An additional investigation on 49,694 real-world patches shows that APPT achieves the optimum performance (exceeding 99% […] techniques) compared with existing representation learning techniques. We also prove that adopting code pre-trained models can provide further substantial advancement (e.g., GraphCodeBERT-based APPT improves on BERT-based APPT by 3.0% and 2.6% […]), highlighting the generalizability of APPT.
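The pipeline described above (a pre-trained encoder, then an LSTM stack, then a classifier over token sequences) can be illustrated with a minimal, untrained pure-Python sketch. Everything here is an assumption for illustration only: the hashed `embed` function is a hypothetical stand-in for BERT, the sizes are tiny, all weights are random, and combining the buggy and patched representations by concatenation, difference and element-wise product is a hypothetical choice, not a detail stated in the abstract. It shows the data flow only, not APPT's actual implementation or its fine-tuning.

```python
import math
import random
import re

# ASSUMPTIONS: a hashed embedding stands in for the pre-trained BERT encoder,
# and every weight is random (untrained). Only the data flow is illustrated.
DIM = 8                  # hypothetical embedding / hidden size
rng = random.Random(0)   # fixed seed so the sketch is deterministic


def tokenize(code: str) -> list[str]:
    """Treat source code as a flat token sequence (no hand-crafted features)."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def embed(token: str) -> list[float]:
    """Hypothetical encoder stand-in: a fixed pseudo-random vector per token."""
    r = random.Random(token)
    return [r.uniform(-1, 1) for _ in range(DIM)]


def rand_matrix(rows: int, cols: int) -> list[list[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTM:
    """Single-layer LSTM that returns the final hidden state of a sequence."""

    def __init__(self, in_dim: int, hid_dim: int):
        self.hid = hid_dim
        # One weight matrix per gate (input, forget, cell, output) acting on [x; h].
        self.W = {g: rand_matrix(hid_dim, in_dim + hid_dim) for g in "ifco"}
        self.b = {g: [0.0] * hid_dim for g in "ifco"}

    def __call__(self, xs: list[list[float]]) -> list[float]:
        h = [0.0] * self.hid
        c = [0.0] * self.hid
        for x in xs:
            z = x + h  # list concatenation: [x; h]
            pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
                   for g in "ifco"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            cand = [math.tanh(v) for v in pre["c"]]
            o = [sigmoid(v) for v in pre["o"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h


lstm = LSTM(DIM, DIM)
# Hypothetical classifier over [h_b; h_p; h_b - h_p; h_b * h_p].
clf_w = [rng.uniform(-0.5, 0.5) for _ in range(4 * DIM)]
clf_b = 0.0


def predict(buggy: str, patched: str) -> float:
    """Return an (untrained) probability that the patch is correct."""
    hb = lstm([embed(t) for t in tokenize(buggy)])
    hp = lstm([embed(t) for t in tokenize(patched)])
    feats = hb + hp + [a - b for a, b in zip(hb, hp)] + [a * b for a, b in zip(hb, hp)]
    return sigmoid(sum(w * v for w, v in zip(clf_w, feats)) + clf_b)
```

For example, `predict("return a + b;", "return a - b;")` yields a probability between 0 and 1. Because the weights are untrained the number itself is meaningless; in a real system the encoder, LSTM and classifier would be fine-tuned on labelled correct and overfitting patches.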
