Break-It-Fix-It: Unsupervised Learning for Program Repair

06/11/2021
by Michihiro Yasunaga, et al.

We consider repair tasks: given a critic (e.g., a compiler) that assesses the quality of an input, the goal is to train a fixer that converts a bad example (e.g., code with syntax errors) into a good one (e.g., code with no syntax errors). Existing works create training data consisting of (bad, good) pairs by corrupting good examples using heuristics (e.g., dropping tokens). However, fixers trained on this synthetically generated data do not extrapolate well to the real distribution of bad inputs. To bridge this gap, we propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas: (i) we use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data, and (ii) we train a breaker to generate realistic bad code from good code. Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data. We evaluate BIFI on two code repair datasets: GitHub-Python, a new dataset we introduce where the goal is to repair Python code with AST parse errors; and DeepFix, where the goal is to repair C code with compiler errors. BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python (+28.5%). BIFI does not require any labeled data; we hope it will be a strong starting point for unsupervised learning of various repair tasks.
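To make the two ideas concrete, the following is a minimal sketch of one data-generation round, not the authors' implementation. It assumes the critic is Python's `ast.parse` (matching the GitHub-Python setting), and it uses hand-written stand-ins, `toy_fixer` and `toy_breaker`, in place of the learned fixer and breaker. The names `bifi_round`, `toy_fixer`, and `toy_breaker` are hypothetical, and the training step that would follow each round is omitted.

```python
import ast


def critic(code: str) -> bool:
    """Return True if the code parses as Python, False otherwise."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def toy_fixer(code: str) -> str:
    """Hypothetical stand-in for a learned fixer: append missing ')'."""
    missing = code.count("(") - code.count(")")
    return code + ")" * max(missing, 0)


def toy_breaker(code: str) -> str:
    """Hypothetical stand-in for a learned breaker: drop the last ')'."""
    head, _, tail = code.rpartition(")")
    return head + tail


def bifi_round(fixer, breaker, real_bad, real_good):
    """Collect critic-verified (bad, good) pairs for one round."""
    # Idea (i): run the fixer on real bad inputs and keep only the
    # outputs that the critic accepts as good.
    fixer_pairs = []
    for bad in real_bad:
        fixed = fixer(bad)
        if critic(fixed):
            fixer_pairs.append((bad, fixed))

    # Idea (ii): run the breaker on good inputs and keep only the
    # outputs that the critic rejects, i.e. that are actually broken.
    breaker_pairs = []
    for good in real_good:
        broken = breaker(good)
        if not critic(broken):
            breaker_pairs.append((broken, good))

    return fixer_pairs, breaker_pairs


real_bad = ["print('hi'", "x = (1 + 2", "def f(:\n    pass"]
real_good = ["print('ok')", "y = 3"]

fixer_pairs, breaker_pairs = bifi_round(toy_fixer, toy_breaker, real_bad, real_good)
for bad, good in fixer_pairs + breaker_pairs:
    print(repr(bad), "->", repr(good))
```

In a full loop, the pairs returned here would be used to retrain the fixer and the breaker before the next round. The critic check on both sides is what keeps unverified outputs out of the training data.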
