Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs

11/06/2021
by Julian Aron Prenner, et al.

OpenAI's Codex, a GPT-3-like model trained on a large code corpus, has made headlines inside and outside academia. Given a short user-provided description, it can synthesize code snippets that are syntactically and semantically valid in most cases. In this work, we investigate whether Codex is able to localize and fix bugs, a task of central interest in the field of automated program repair (APR). Our initial evaluation uses the multi-language QuixBugs benchmark (40 bugs, each in both Python and Java). We find that, despite not being trained for APR, Codex is surprisingly effective and competitive with recent state-of-the-art techniques. Our results also show that Codex is slightly more successful at repairing Python than Java.
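To make the repair task concrete, the following is a minimal sketch of how a candidate patch could be checked against test cases. It is an illustration under stated assumptions, not the evaluation procedure of the paper: the functions `buggy_sum_to` and `candidate_sum_to`, the test list `TESTS`, and the helper `passes_all` are all hypothetical, and the bug is invented rather than taken from QuixBugs. It assumes a candidate counts as plausible when it passes every available test.

```python
from typing import Callable, List, Tuple

# Hypothetical program with a single-line defect (invented for illustration,
# not one of the QuixBugs programs).
def buggy_sum_to(n: int) -> int:
    """Intended to return 1 + 2 + ... + n."""
    total = 0
    for i in range(1, n):  # defect: the upper bound excludes n
        total += i
    return total

# Hypothetical candidate patch, standing in for what a repair model might propose.
def candidate_sum_to(n: int) -> int:
    """Return 1 + 2 + ... + n."""
    total = 0
    for i in range(1, n + 1):  # fix: include n in the range
        total += i
    return total

# Hypothetical test cases as (input, expected output) pairs.
TESTS: List[Tuple[int, int]] = [(0, 0), (1, 1), (4, 10), (10, 55)]

def passes_all(func: Callable[[int], int], tests: List[Tuple[int, int]]) -> bool:
    """Return True if func yields the expected output on every test case."""
    return all(func(arg) == expected for arg, expected in tests)

if __name__ == "__main__":
    print("buggy passes:", passes_all(buggy_sum_to, TESTS))
    print("candidate passes:", passes_all(candidate_sum_to, TESTS))
```

Passing the tests only shows that a patch is plausible; whether it is actually correct may still require manual inspection.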

research
04/03/2019

Neural Program Repair by Jointly Learning to Localize and Repair

Due to its potential to improve programmer productivity and software qua...
research
03/22/2021

Applying CodeBERT for Automated Program Repair of Java Simple Bugs

Software debugging, and program repair are among the most time-consuming...
research
12/13/2018

Attention Please: Consider Mockito when Evaluating Newly Proposed Automated Program Repair Techniques

Automated program repair (APR) has attracted widespread attention in rec...
research
12/21/2021

Elixir: Effective object-oriented program repair

This work is motivated by the pervasive use of method invocations in obj...
research
06/20/2019

ENCORE: Ensemble Learning using Convolution Neural Machine Translation for Automatic Program Repair

Automated generate-and-validate (G&V) program repair techniques typicall...
research
05/31/2022

A Replication Study on Predicting Metamorphic Relations at Unit Testing Level

Metamorphic Testing (MT) addresses the test oracle problem by examining ...
research
12/21/2017

ARJA: Automated Repair of Java Programs via Multi-Objective Genetic Programming

Recent empirical studies show that the performance of GenProg is not sat...
