FixEval: Execution-based Evaluation of Program Fixes for Competitive Programming Problems

06/15/2022
by Md. Mahim Anjum Haque, et al.

Source code repositories consist of large codebases that often contain error-prone programs. The increasing complexity of software has driven a drastic rise in the time and cost of identifying and fixing these defects. Various methods exist to automatically generate fixes for buggy code, but because the combinatorial space of possible solutions for a particular bug is so large, few tools and datasets are available to evaluate generated code effectively. In this work, we introduce FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their respective fixes. We include a rich test suite to evaluate and assess the correctness of model-generated program fixes. We consider two Transformer language models pretrained on programming languages as our baselines and compare them using match-based and execution-based evaluation metrics. Our experiments show that match-based metrics do not accurately reflect the correctness of model-generated program fixes, while execution-based methods evaluate programs against the full set of test cases and scenarios designed for each problem. We therefore believe FixEval provides a step toward real-world automatic bug fixing and model-generated code evaluation.
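The abstract's contrast between match-based and execution-based evaluation can be illustrated with a minimal sketch. The function names, candidate fix, and test cases below are hypothetical and not taken from the FixEval codebase; the point is only that a semantically correct fix can fail an exact-match check while passing every test:

```python
# Illustrative only: a reference fix and a candidate fix that behave
# identically but differ textually (renamed parameters).
reference_fix = "def add(a, b):\n    return a + b\n"
candidate_fix = "def add(x, y):\n    return x + y\n"

# Match-based evaluation: exact string comparison rejects the candidate
# even though its behavior is identical to the reference.
exact_match = candidate_fix == reference_fix

def passes_tests(source, test_cases):
    """Execution-based evaluation: run the candidate program and
    compare its outputs against the expected outputs for each test."""
    namespace = {}
    exec(source, namespace)  # load the candidate fix (sandboxing omitted)
    fn = namespace["add"]
    return all(fn(*args) == expected for args, expected in test_cases)

tests = [((1, 2), 3), ((-5, 5), 0), ((0, 0), 0)]
print(exact_match)                         # False: match-based rejects the fix
print(passes_tests(candidate_fix, tests))  # True: execution-based accepts it
```

In a real benchmark the candidate would be run in an isolated process with time and memory limits rather than via `exec`, but the asymmetry shown here is exactly why the paper argues for execution-based metrics.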

research
02/16/2023

LEVER: Learning to Verify Language-to-Code Generation with Execution

The advent of large language models trained on code (code LLMs) has led ...
research
12/09/2021

Towards Neural Functional Program Evaluation

This paper explores the capabilities of current transformer-based langua...
research
01/09/2019

Automated Customized Bug-Benchmark Generation

We introduce Bug-Injector, a system that automatically creates benchmark...
research
07/17/2023

Directed Test Program Generation for JIT Compiler Bug Localization

Bug localization techniques for Just-in-Time (JIT) compilers are based o...
research
04/23/2020

BOLD: An Ontology-based Log Debugger for C Programs

The different activities related to debugging such as program instrument...
research
08/16/2021

Autoencoders as Tools for Program Synthesis

Recently there have been many advances in research on language modeling ...
research
11/29/2022

Coder Reviewer Reranking for Code Generation

Sampling diverse programs from a code language model and reranking with ...
