CCTEST: Testing and Repairing Code Completion Systems

08/17/2022
by Zongjie Li, et al.

Code completion, a highly valuable topic in the software development domain, has been increasingly promoted by recent advances in large language models (LLMs). To date, visible LLM-based code completion frameworks like GitHub Copilot and GPT are trained using deep learning over vast quantities of unstructured text and open-source code. As a cornerstone of daily programming tasks, code completion has largely boosted professionals' efficiency in building real-world software systems. In contrast to this flourishing market, we find that code completion models often output suspicious results, and to date, no automated testing and enhancement framework for code completion models is available. This research proposes CCTEST, a framework to test and repair code completion systems in black-box settings. CCTEST features a novel mutation strategy, namely program structure-consistency (PSC) mutations, to generate mutated code completion inputs. It then detects inconsistent outputs, which represent likely erroneous cases, among all the completed code cases. Moreover, CCTEST repairs the code completion outputs by selecting the output that best reflects the "average" appearance of all output cases as the final output of the code completion system. We detected a total of 33,540 inputs that can trigger likely erroneous cases from eight popular LLM-based code completion systems. With repairing, we show that the performance of code completion models notably increased by 53.51
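
To make the detect-and-repair idea concrete, here is a minimal Python sketch. It is not CCTEST's implementation: it assumes the completions for the original prompt and its PSC-mutated variants have already been collected as strings, it uses `difflib.SequenceMatcher` as a stand-in similarity metric, and the names `mean_similarity`, `find_outliers`, `repair`, and the `0.5` threshold are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in metric, an assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def mean_similarity(outputs: List[str]) -> List[float]:
    """For each output, its average similarity to every other output."""
    n = len(outputs)
    if n < 2:
        return [1.0] * n
    return [
        sum(similarity(outputs[i], outputs[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def find_outliers(outputs: List[str], threshold: float = 0.5) -> List[int]:
    """Indices of outputs that disagree with the rest (hypothetical threshold)."""
    scores = mean_similarity(outputs)
    return [i for i, score in enumerate(scores) if score < threshold]


def repair(outputs: List[str]) -> str:
    """Pick the output closest to the 'average' of all outputs."""
    scores = mean_similarity(outputs)
    best = max(range(len(outputs)), key=scores.__getitem__)
    return outputs[best]


# Completions for one prompt and three structure-consistent mutants (made-up data).
candidates = ["return a + b", "return a + b", "return b + a", "yield 0"]
suspicious = find_outliers(candidates)
chosen = repair(candidates)
```

With these made-up candidates, the last completion is flagged as inconsistent and one of the agreeing completions is chosen as the repaired output. A different similarity metric or threshold would change which cases are flagged.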

research
11/09/2020

Learning Autocompletion from Real-World Datasets

Code completion is a popular software development tool integrated into a...
research
11/27/2017

Code Completion with Neural Attention and Pointer Networks

Intelligent code completion has become an essential tool to accelerate m...
research
02/20/2023

Learning Deep Semantics for Test Completion

Writing tests is a time-consuming yet essential task during software dev...
research
03/07/2023

From Copilot to Pilot: Towards AI Supported Software Development

AI-supported programming has arrived, as shown by the introduction and s...
research
06/29/2020

Simplifying Models with Unlabeled Output Data

We focus on prediction problems with high-dimensional outputs that are s...
research
12/20/2022

CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context

While pre-trained language models (LM) for code have achieved great succ...
research
06/26/2021

Toward Less Hidden Cost of Code Completion with Acceptance and Ranking Models

Code completion is widely used by software developers to provide coding ...
