Neural Code Completion with Anonymized Variable Names
Source code processing relies heavily on methods widely used in natural language processing (NLP), but code has specific properties that must be taken into account to achieve higher quality. One such property is that renaming variables does not change the semantics of the code. In this work, we develop a recurrent architecture that processes code with all variable names anonymized, i.e., replaced with unique placeholders. The proposed architecture outperforms standard NLP baselines on the code completion task by a large margin in the anonymized setting, and improves the base model in the non-anonymized setting when ensembled with it.
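To make the anonymized setting concrete, here is a minimal sketch of one way variable names could be replaced with unique placeholders. It is an illustration under stated assumptions, not the preprocessing used in the paper: the function name `anonymize_variables`, the `var0, var1, ...` placeholder scheme, and the choice of Python's `ast` module (with `ast.unparse`, Python 3.9+) are all hypothetical choices made for this example.

```python
import ast


def anonymize_variables(source: str) -> str:
    """Replace each variable bound in `source` with a placeholder (var0, var1, ...).

    Hypothetical helper for illustration only. Placeholders are numbered in
    order of first occurrence during AST traversal.
    """
    tree = ast.parse(source)

    # Pass 1: collect names that the snippet itself binds (assignment targets,
    # loop variables, function parameters). Names that are only read, such as
    # builtins or externally defined objects, are left untouched.
    bound = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)

    mapping = {}

    def placeholder(name: str) -> str:
        # Assign the next free placeholder the first time a name is seen.
        if name not in mapping:
            mapping[name] = f"var{len(mapping)}"
        return mapping[name]

    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node):
            if node.id in bound:
                node.id = placeholder(node.id)
            return node

        def visit_arg(self, node):
            self.generic_visit(node)
            node.arg = placeholder(node.arg)
            return node

    # Pass 2: rewrite the tree and turn it back into source text.
    return ast.unparse(Renamer().visit(tree))


# Two snippets that differ only in variable names map to the same text.
snippet_a = "total = 0\nfor item in items:\n    total = total + item\n"
snippet_b = "acc = 0\nfor x in items:\n    acc = acc + x\n"
assert anonymize_variables(snippet_a) == anonymize_variables(snippet_b)
```

The sketch deliberately ignores details a real pipeline would need to handle, such as scoping (the same name in two functions gets the same placeholder here), keyword arguments at call sites, and attribute names. It only shows the property the abstract relies on: programs that differ solely in variable naming become identical after anonymization.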