Demystifying GPT Self-Repair for Code Generation

06/16/2023
by Theo X. Olausson, et al.

Large Language Models (LLMs) have shown remarkable aptitude in code generation but still struggle on challenging programming tasks. Self-repair, in which the model debugs and fixes mistakes in its own code, has recently become a popular way to boost performance in these settings. However, the literature contains only very limited studies of how and when self-repair works effectively, and one might wonder to what extent a model can really provide accurate feedback on why code is wrong when that code was generated by the same model. In this paper, we analyze the ability of GPT-3.5 and GPT-4 to perform self-repair on APPS, a challenging dataset consisting of diverse coding challenges. To do so, we first establish a new evaluation strategy dubbed pass@t that measures the pass rate of the tasks against the total number of tokens sampled from the model, enabling a fair comparison to purely sampling-based approaches. With this evaluation strategy, we find that self-repair is effective only for GPT-4. We also observe that self-repair is bottlenecked by the feedback stage: using GPT-4 to give feedback on the programs generated by GPT-3.5, and using expert human programmers to give feedback on the programs generated by GPT-4, we unlock significant performance gains.
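
As a rough illustration of the pass@t idea, the Python sketch below computes the fraction of tasks solved within a given token budget. The abstract does not state the exact definition, so this is only one plausible reading: the `Attempt` record, the cumulative token accounting, and the function names are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Attempt:
    """One sampled program (or repair) for a task. Hypothetical record."""
    tokens: int   # tokens sampled from the model to produce this attempt
    passed: bool  # whether the resulting program passed the task's tests


def solved_within_budget(attempts: Sequence[Attempt], budget: int) -> bool:
    """True if a passing attempt is reached before the budget is exceeded.

    Assumption: attempts are consumed in order and their token costs add up.
    """
    used = 0
    for attempt in attempts:
        used += attempt.tokens
        if used > budget:
            return False
        if attempt.passed:
            return True
    return False


def pass_rate_at_budget(tasks: Sequence[Sequence[Attempt]], budget: int) -> float:
    """Fraction of tasks solved when at most `budget` tokens are sampled per task."""
    if not tasks:
        return 0.0
    solved = sum(solved_within_budget(attempts, budget) for attempts in tasks)
    return solved / len(tasks)


# Made-up data for illustration only; these are not results from the paper.
example_tasks = [
    [Attempt(300, False), Attempt(250, True)],   # solved after 550 tokens
    [Attempt(400, True)],                        # solved after 400 tokens
    [Attempt(500, False), Attempt(500, False)],  # never solved
]

for budget in (400, 600):
    print(budget, pass_rate_at_budget(example_tasks, budget))
```

Plotting the pass rate against the token budget, rather than against the number of samples, is what lets a repair pipeline that spends extra tokens on feedback be compared with a baseline that spends the same tokens on drawing more independent samples.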
