Syntax-Aware On-the-Fly Code Completion

11/09/2022
by Wannita Takerngsaksiri, et al.

Code completion aims to improve developers' productivity by suggesting the next code tokens from a given context. Various approaches incorporate abstract syntax tree (AST) information into model training so that code completion is aware of the syntax of the programming language. However, existing syntax-aware code completion approaches are not on-the-fly: we found that for roughly two-thirds of the characters that developers type, the AST cannot be extracted because extraction requires syntactically correct source code, which limits the practicality of these approaches in real-world scenarios. On the other hand, existing on-the-fly code completion does not yet consider syntactic information. In this paper, we propose PyCoder, which leverages token types, a kind of lightweight syntactic information that is readily available and aligns with the natural order of source code. PyCoder is trained in a multi-task manner: by learning the supporting task of predicting token types during the training phase, the models achieve better performance on predicting tokens and lines of code, without needing token types in the inference phase. Comprehensive experiments show that PyCoder achieves the first rank on the CodeXGLUE leaderboard with an accuracy of 77.12 for token-level predictions, which is 0.43 [...] baselines. In addition, PyCoder achieves an exact match of 43.37 for line-level predictions, which is 3.63 [...]. These results lead us to conclude that token type information (an alternative to syntactic information), which has rarely been used in the past, can greatly improve the performance of code completion approaches without requiring syntactically correct source code as AST-based approaches do. PyCoder is publicly available on HuggingFace.
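
The contrast between AST extraction and token types can be illustrated with a minimal sketch using only Python's standard library. It is not the authors' pipeline: the snippet PARTIAL_SOURCE and the helper names can_build_ast and token_types are hypothetical, chosen for illustration. The sketch assumes a developer has stopped typing mid-expression, and shows that ast.parse rejects the incomplete code while the tokenizer still yields a type for every token seen so far.

```python
import ast
import io
import tokenize

# Hypothetical snippet captured mid-keystroke: the closing parenthesis is missing.
PARTIAL_SOURCE = "def area(radius):\n    return 3.14 * (radius"


def can_build_ast(source):
    """Return True only if the source is complete enough for ast.parse."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def token_types(source):
    """Collect (token type name, token text) pairs for as much input as tokenizes."""
    pairs = []
    readline = io.StringIO(source).readline
    try:
        for tok in tokenize.generate_tokens(readline):
            pairs.append((tokenize.tok_name[tok.type], tok.string))
    except (tokenize.TokenError, SyntaxError):
        # The tokenizer reports the unclosed parenthesis at end of input;
        # the pairs gathered before that point are kept.
        pass
    return pairs


if __name__ == "__main__":
    print("AST available:", can_build_ast(PARTIAL_SOURCE))
    for type_name, text in token_types(PARTIAL_SOURCE):
        print(type_name, repr(text))
```

Pairs such as these follow the left-to-right order of the source, which is what makes them usable as targets for an auxiliary type-prediction task during training; how PyCoder actually encodes and weights that task is described in the full paper, not here.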
