CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context

12/20/2022
by Yangruibo Ding, et al.

While pre-trained language models (LMs) for code have achieved great success in code completion, they generate code conditioned only on the contents of the current file, i.e., the in-file context, and ignore the rich semantics in other files within the same project, i.e., the cross-file context, a critical source of information that is especially useful in modern modular software development. Overlooking this context limits the capacity of code language models in code completion and leads to unexpected behaviors such as hallucinated class member functions or function calls with unexpected arguments. In this work, we develop a cross-file context finder tool, CCFINDER, that effectively locates and retrieves the most relevant cross-file context. We propose CoCoMIC, a framework that incorporates cross-file context to learn the in-file and cross-file context jointly on top of pretrained code LMs. CoCoMIC successfully improves the existing code LM with a 19.30% relative increase in exact match and a 15.41% relative increase in identifier matching for code completion when the cross-file context is provided.
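To make the idea of cross-file context concrete, the following is a minimal sketch of how such context might be located for a Python file. It is not CCFINDER and does not reflect the authors' implementation; the function names (`imported_names`, `signatures`, `cross_file_context`) and the in-memory `project` mapping are hypothetical, chosen only for illustration. It assumes Python 3.9+ (for `ast.unparse`), handles only absolute `from module import name` imports, and requires the current file to parse.

```python
import ast


def imported_names(source: str) -> dict[str, str]:
    """Map each name bound by `from module import name` to its module."""
    names = {}
    for node in ast.walk(ast.parse(source)):
        # Only absolute `from ... import ...` statements are handled here.
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            for alias in node.names:
                names[alias.asname or alias.name] = node.module
    return names


def signatures(source: str) -> dict[str, str]:
    """Summarize top-level functions and classes as short signature strings."""
    funcs = (ast.FunctionDef, ast.AsyncFunctionDef)
    out = {}
    for node in ast.parse(source).body:
        if isinstance(node, funcs):
            out[node.name] = f"def {node.name}({ast.unparse(node.args)})"
        elif isinstance(node, ast.ClassDef):
            methods = [
                f"def {m.name}({ast.unparse(m.args)})"
                for m in node.body
                if isinstance(m, funcs)
            ]
            out[node.name] = f"class {node.name}: " + "; ".join(methods)
    return out


def cross_file_context(current: str, project: dict[str, str]) -> list[str]:
    """Collect signatures of imported project entities.

    `project` maps a module name to that module's source text (an assumption
    made to keep the sketch free of filesystem access).
    """
    context = []
    for name, module in imported_names(current).items():
        if module not in project:
            continue  # Third-party or standard-library import: skipped.
        sig = signatures(project[module]).get(name)
        if sig:
            context.append(f"# from {module}\n{sig}")
    return context


if __name__ == "__main__":
    project = {
        "shapes": (
            "class Circle:\n"
            "    def __init__(self, r):\n"
            "        self.r = r\n"
            "    def area(self):\n"
            "        return 3.14159 * self.r ** 2\n"
        )
    }
    current = "from shapes import Circle\nc = Circle(2)\n"
    for entry in cross_file_context(current, project):
        print(entry)
```

A completion model that sees the retrieved `Circle` summary alongside the in-file context has the actual member functions available, which is the kind of information whose absence leads to hallucinated members or wrong call arguments.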

research
06/05/2023

RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems

Large Language Models (LLMs) have greatly advanced code auto-completion ...
research
06/01/2023

Better Context Makes Better Code Language Models: A Case Study on Function Call Argument Completion

Pretrained code language models have enabled great progress towards prog...
research
09/17/2021

Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy

Statistical language modeling and translation with transformers have fou...
research
09/14/2023

Do Not Give Away My Secrets: Uncovering the Privacy Issue of Neural Code Completion Tools

Neural Code Completion Tools (NCCTs) have reshaped the field of software...
research
05/22/2022

Protecting File Activities via Deception for ARM TrustZone

A TrustZone TEE often invokes an external filesystem. While filedata can...
research
12/24/2021

One-to-One or One-to-many? What function inlining brings to binary2source similarity analysis

Binary2source code matching is critical to many code-reuse-related tasks...
research
08/17/2022

CCTEST: Testing and Repairing Code Completion Systems

Code completion, a highly valuable topic in the software development dom...
