Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy

09/17/2021
by   Colin B. Clement, et al.
0

Statistical language modeling and translation with transformers have found many successful applications in program understanding and generation tasks, setting high benchmarks for tools in modern software development environments. The finite context window of these neural models means, however, that they will be unable to leverage the entire relevant context of large files and packages for any given task. While there are many efforts to extend the context window, we introduce an architecture-independent approach for leveraging the syntactic hierarchies of source code for incorporating entire file-level context into a fixed-length window. Using concrete syntax trees of each source file we extract syntactic hierarchies and integrate them into context window by selectively removing from view more specific, less relevant scopes for a given task. We evaluate this approach on code generation tasks and joint translation of natural language and source code in Python programming language, achieving a new state-of-the-art in code completion and summarization for Python in the CodeXGLUE benchmark. We also introduce new CodeXGLUE benchmarks for user-experience-motivated tasks: code completion with normalized literals, method body completion/code summarization conditioned on file-level context.

READ FULL TEXT
research
03/15/2022

ReACC: A Retrieval-Augmented Code Completion Framework

Code completion, which aims to predict the following code token(s) accor...
research
12/20/2022

CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context

While pre-trained language models (LM) for code have achieved great succ...
research
09/05/2023

Revisiting File Context for Source Code Summarization

Source code summarization is the task of writing natural language descri...
research
10/15/2020

Empirical Study of Transformers for Source Code

Initially developed for natural language processing (NLP), Transformers ...
research
06/01/2023

Better Context Makes Better Code Language Models: A Case Study on Function Call Argument Completion

Pretrained code language models have enabled great progress towards prog...
research
03/04/2023

Demystifying What Code Summarization Models Learned

Study patterns that models have learned has long been a focus of pattern...
research
07/15/2019

DeepRace: Finding Data Race Bugs via Deep Learning

With the proliferation of multi-core hardware, parallel programs have be...

Please sign up or login with your details

Forgot password? Click here to reset