CCTEST: Testing and Repairing Code Completion Systems

by Zongjie Li, et al.

Code completion, a highly valuable topic in software development, is increasingly adopted thanks to recent advances in large language models (LLMs). Prominent LLM-based code completion frameworks such as GitHub Copilot and GPT are trained using deep learning over vast quantities of unstructured text and open-source code. As a cornerstone of daily programming tasks, code completion has substantially boosted professionals' efficiency in building real-world software systems. Despite this flourishing market, we find that code completion models often output suspicious results, and to date no automated testing and enhancement framework for code completion models is available. This research proposes CCTEST, a framework to test and repair code completion systems in black-box settings. CCTEST features a novel mutation strategy, namely program structure-consistency (PSC) mutations, to generate mutated code completion inputs. It then detects inconsistent outputs, which represent likely erroneous cases, among all the completed code cases. Moreover, CCTEST repairs code completion outputs by selecting, as the final output of the code completion system, the output that most closely reflects the "average" appearance of all output cases. We detected a total of 33,540 inputs that can trigger likely erroneous cases from eight popular LLM-based code completion systems. With repairing, we show that the performance of code completion models notably increased by 53.51
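The detect-then-repair idea described in the abstract can be illustrated with a minimal Python sketch. It is not the CCTEST implementation: the abstract does not say which similarity measure or threshold is used, so the sketch assumes character-level similarity from the standard library's `difflib.SequenceMatcher` and a hypothetical `threshold` of 0.6. The function names are likewise invented for illustration. The input list stands for the completions returned for one prompt and its PSC-mutated variants.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in metric (assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def mean_similarity_to_others(outputs: list[str]) -> list[float]:
    """For each output, average its similarity to every other output."""
    n = len(outputs)
    scores = []
    for i in range(n):
        others = [similarity(outputs[i], outputs[j]) for j in range(n) if j != i]
        # A single output has nothing to disagree with, so score it 1.0.
        scores.append(sum(others) / len(others) if others else 1.0)
    return scores


def find_outliers(outputs: list[str], threshold: float = 0.6) -> list[int]:
    """Indices of outputs that disagree with the rest (likely erroneous cases).

    The threshold value is hypothetical, not taken from the paper.
    """
    scores = mean_similarity_to_others(outputs)
    return [i for i, score in enumerate(scores) if score < threshold]


def select_repaired_output(outputs: list[str]) -> str:
    """Return the output closest to the 'average' of all outputs."""
    if not outputs:
        raise ValueError("need at least one completion output")
    scores = mean_similarity_to_others(outputs)
    best = max(range(len(outputs)), key=scores.__getitem__)
    return outputs[best]


if __name__ == "__main__":
    # Completions for one prompt and three structure-preserving mutants.
    completions = [
        "return a + b",
        "return a + b",
        "return a + b",
        "raise NotImplementedError()",
    ]
    print("outlier indices:", find_outliers(completions))
    print("repaired output:", select_repaired_output(completions))
```

Choosing the output with the highest mean similarity to the others picks an existing completion rather than synthesizing a new one, which matches the abstract's description of repair as selection. A real system would also need to map mutated identifiers back to the original program before comparing outputs; that step is omitted here.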




