RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair

by Weishi Wang, et al.

Automatic program repair (APR) is crucial for reducing developers' manual debugging effort and improving software reliability. While conventional search-based techniques typically rely on heuristic rules or a redundancy assumption to mine fix patterns, recent years have witnessed a surge of deep learning (DL) based approaches that automate the program repair process in a data-driven manner. However, their performance is often limited by having only a fixed set of parameters with which to model the highly complex search space of APR. To ease this burden on the parametric models, in this work we propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) that explicitly leverages relevant fix patterns retrieved from a codebase of previous bug-fix pairs. Specifically, we build a hybrid patch retriever that accounts for both lexical and semantic matching based on the raw source code in a language-agnostic manner, without relying on any code-specific features. In addition, we adapt the code-aware language model CodeT5 as our foundation model to facilitate both the patch retrieval and patch generation tasks in a unified manner. We adopt a stage-wise approach in which the patch retriever first retrieves a relevant external bug-fix pair to augment the buggy input for the CodeT5 patch generator, which then synthesizes a ranked list of repair patch candidates. Notably, RAP-Gen is a generic APR framework that can flexibly integrate different patch retrievers and generators to repair various types of bugs. We thoroughly evaluate RAP-Gen on three benchmarks in two programming languages: the TFix benchmark in JavaScript, and the Code Refinement and Defects4J benchmarks in Java, where bug localization information may or may not be provided. Experimental results show that RAP-Gen significantly outperforms previous state-of-the-art approaches on all benchmarks, e.g., repairing 15 more of the 818 Defects4J bugs.
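The stage-wise idea in the abstract (retrieve a similar bug-fix pair using both lexical and semantic matching, then append it to the buggy input) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: token-set Jaccard overlap stands in for the lexical matcher, cosine similarity over character-trigram counts stands in for a learned semantic encoder, and the function names, the `weight` parameter, and the `<SEP>` separator are hypothetical. The generation step (CodeT5 producing ranked patch candidates) is not shown.

```python
import math
import re
from collections import Counter


def tokenize(code):
    # Language-agnostic split of raw source text: identifiers, numbers, single symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def lexical_score(a, b):
    # Stand-in lexical matcher (assumption): Jaccard overlap of token sets.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def embed(code):
    # Stand-in "semantic" vector (assumption): character-trigram counts.
    # A real system would use a learned encoder here.
    text = " ".join(tokenize(code))
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(buggy, codebase, weight=0.5):
    # Hybrid retrieval: weighted sum of lexical and semantic similarity
    # between the query and the buggy side of each stored (bug, fix) pair.
    def score(pair):
        bug = pair[0]
        return (weight * lexical_score(buggy, bug)
                + (1 - weight) * cosine(embed(buggy), embed(bug)))
    return max(codebase, key=score)


def augment(buggy, pair, sep=" <SEP> "):
    # Append the retrieved bug-fix pair to the buggy input for the generator.
    bug, fix = pair
    return buggy + sep + bug + sep + fix


# Toy codebase of previous bug-fix pairs (made-up examples).
codebase = [
    ("if (x = 1) { run(); }", "if (x == 1) { run(); }"),
    ("for (i = 0; i <= n; i++) sum += a[i];",
     "for (i = 0; i < n; i++) sum += a[i];"),
]
query = "if (y = 2) { stop(); }"
generator_input = augment(query, retrieve(query, codebase))
print(generator_input)
```

Because the retriever only sees raw text, the same sketch applies unchanged to JavaScript or Java snippets; swapping in a different lexical or semantic scorer only requires replacing `lexical_score` or `embed`.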


Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5

Automated software debugging is a crucial task for improving the product...

Attention Please: Consider Mockito when Evaluating Newly Proposed Automated Program Repair Techniques

Automated program repair (APR) has attracted widespread attention in rec...

Towards More Reliable Automated Program Repair by Integrating Static Analysis Techniques

A long-standing open challenge for automated program repair is the overf...

Explainable Automated Debugging via Large Language Model-driven Scientific Debugging

Automated debugging techniques have the potential to reduce developer ef...

Adversarial Patch Generation for Automatic Program Repair

Automatic program repair (APR) has seen a growing interest in recent yea...

ENCORE: Ensemble Learning using Convolution Neural Machine Translation for Automatic Program Repair

Automated generate-and-validate (G&V) program repair techniques typicall...

Revisiting the Plastic Surgery Hypothesis via Large Language Models

Automated Program Repair (APR) aspires to automatically generate patches...
