Impact of Code Language Models on Automated Program Repair

02/10/2023
by   Nan Jiang, et al.
0

Automated program repair (APR) aims to help developers improve software reliability by generating patches for buggy programs. Although many code language models (CLM) are developed and effective in many software tasks such as code completion, there has been little comprehensive, in-depth work to evaluate CLMs' fixing capabilities and to fine-tune CLMs for the APR task. Firstly, this work is the first to evaluate ten CLMs on four APR benchmarks, which shows that surprisingly, the best CLM, as is, fixes 72 the state-of-the-art deep-learning (DL)-based APR techniques. Secondly, one of the four APR benchmarks was created by us in this paper to avoid data leaking for a fair evaluation. Thirdly, it is the first work to fine-tune CLMs with APR training data, which shows that fine-tuning brings 31 CLMs and enables them to fix 46 techniques. Fourthly, this work studies the impact of buggy lines, showing that CLMs, as is, cannot make good use of the buggy lines to fix bugs, yet fine-tuned CLMs could potentially over-rely on buggy lines. Lastly, this work analyzes the size, time, and memory efficiency of different CLMs. This work shows promising directions for the APR domain, such as fine-tuning CLMs with APR-specific designs, and also raises awareness of fair and comprehensive evaluations of CLMs and calls for more transparent reporting of open-source repositories used in the pre-training data to address the data leaking problem.

READ FULL TEXT

page 1

page 7

page 9

research
05/29/2023

How Effective Are Neural Networks for Fixing Security Vulnerabilities

Security vulnerability repair is a difficult task that is in dire need o...
research
08/21/2023

Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

Large Language Models (LLMs) possess impressive capabilities to generate...
research
04/16/2023

Automated Program Repair Based on Code Review: How do Pre-trained Transformer Models Perform?

Sequence-to-sequence models have been used to transform erroneous progra...
research
02/03/2023

KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program Repair

Automated Program Repair (APR) improves software reliability by generati...
research
08/17/2023

Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We?

Upon evolving their software, organizations and individual developers ha...
research
04/14/2023

MedAlpaca – An Open-Source Collection of Medical Conversational AI Models and Training Data

As large language models (LLMs) like OpenAI's GPT series continue to mak...
research
12/20/2022

Is GPT-3 a Psychopath? Evaluating Large Language Models from a Psychological Perspective

Are large language models (LLMs) like GPT-3 psychologically safe? In thi...

Please sign up or login with your details

Forgot password? Click here to reset