Automated Program Repair Based on Code Review: How do Pre-trained Transformer Models Perform?

04/16/2023
by Rishov Paul, et al.

Sequence-to-sequence models have been used to transform erroneous programs into correct ones when trained on a large enough dataset. Some recent studies also demonstrated strong empirical evidence that code review (natural language instructions about suggested changes to the code) can further improve program repair. Large language models, trained on Natural Language (NL) and computer program corpora, can contain inherent knowledge of both. In this study, we investigate whether this inherent knowledge of code and NL can be utilized to improve automated program repair. We applied PLBART and CodeT5, two state-of-the-art language models pre-trained on both Programming Language (PL) and Natural Language (NL), to two such natural language-based program repair datasets and found that the pre-trained language models, fine-tuned on datasets containing both code reviews and the subsequent code changes, notably outperform each of the previous models. We observed that the pre-trained models improve the previously best-reported results by 9.91% on the Review4Repair dataset and by 24.72% on the second dataset, which suggests that a pre-trained sequential model has a better understanding of natural language and can utilize it much better. We performed an ablation study to assess the contributions of the pre-training mechanism and the model architecture, and found that pre-training contributed significantly more to the performance gain than the architecture. The practical application of pre-trained transformer models in automated program repair is still a long way off, but our study demonstrates the substantial value of employing pre-trained models and paves the way for future studies to make greater use of them.
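The core recipe described here, fine-tuning a pre-trained PL/NL model so that buggy code paired with its review comment maps to the revised code, can be sketched with the Hugging Face Transformers API. This is a minimal illustration, not the authors' exact pipeline: the checkpoint name, separator, input format, and example snippet below are assumptions, and a real setup would use a checkpoint fine-tuned on the review-and-repair data rather than the base model.

```python
# Minimal sketch: CodeT5-style sequence-to-sequence repair where the encoder
# input pairs the buggy code with the reviewer's natural-language comment.
# Checkpoint, separator, and input format are illustrative assumptions.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

buggy_code = "if (user == null) { return user.getName(); }"
review = "This will throw a NullPointerException; return a default name instead."

# Concatenate code and review into a single source sequence; a fine-tuned
# checkpoint would be trained to emit the revised code as the target sequence.
inputs = tokenizer(buggy_code + " </s> " + review,
                   return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_length=128, num_beams=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```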

