Predicting Code Coverage without Execution

07/25/2023
by Michele Tufano, et al.

Code coverage is a widely used metric that quantifies the extent to which program elements, such as statements or branches, are executed during testing. Computing code coverage is resource-intensive: it requires building and executing the code, with additional overhead for instrumentation, and computing the coverage of any snippet of code requires the whole program context. Using machine learning to amortize this expensive process could lower the cost of code coverage by requiring only the source code context, and the task of code coverage prediction can serve as a novel benchmark for judging how well models understand code. We propose such a benchmark task, Code Coverage Prediction, for Large Language Models (LLMs). We formalize this task to evaluate the capability of LLMs in understanding code execution: given a method, a test case, and its inputs, determine which lines of the method are executed. We curate and release a dataset we call COVERAGEEVAL, built by executing the tests and code from the HumanEval dataset and collecting the resulting code coverage information. We report the performance of four state-of-the-art LLMs used for code-related tasks, namely OpenAI's GPT-4 and GPT-3.5-Turbo, Google's BARD, and Anthropic's Claude, on the Code Coverage Prediction task. Finally, we argue that code coverage, both as a metric and as a source of pre-training data, is valuable for overall LLM performance on software engineering tasks.
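To make the task concrete, the sketch below shows how ground-truth line coverage can be collected by actually executing a method on a test input, which is exactly the expensive step the paper proposes to replace with prediction. This is a minimal illustration, not the authors' pipeline: `covered_lines` and the toy `sign` method are hypothetical stand-ins, and the `>`/`!` marks follow coverage.py's annotate convention rather than necessarily the exact COVERAGEEVAL format.

```python
import inspect
import sys

def covered_lines(func, *args, **kwargs):
    """Run `func` on the given inputs and collect the line numbers
    (in its source file) that actually execute.

    A minimal sketch of ground-truth coverage collection; a real
    pipeline would use a tool such as coverage.py instead.
    """
    target = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        # Record `line` events that occur inside the target function.
        if frame.f_code is target and event == "line":
            executed.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(None)  # always uninstall the trace hook
    return executed

def sign(x):
    # Toy method standing in for a HumanEval-style solution.
    if x > 0:
        return 1
    elif x < 0:
        return -1
    return 0

if __name__ == "__main__":
    hit = covered_lines(sign, 5)  # the test input: only the x > 0 branch runs

    # Print the method annotated with coverage.py-style marks:
    # ">" for executed lines, "!" for lines this input never reached.
    src, start = inspect.getsourcelines(sign)
    for offset, text in enumerate(src):
        mark = ">" if start + offset in hit else "!"
        print(mark, text, end="")
```

An LLM performing the Code Coverage Prediction task would have to produce the same line-by-line annotation from the source code and test inputs alone, without building, instrumenting, or executing anything.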
