A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

10/23/2020
by   Nadezhda Chirkova, et al.

There is growing interest in applying deep learning models to source code processing tasks. A major obstacle to applying deep learning in software engineering is that source code often contains many rare identifiers, which leads to huge vocabularies. We propose a simple yet effective method based on identifier anonymization to handle out-of-vocabulary (OOV) identifiers. Because the method is a preprocessing step, it is easy to implement. We show that the proposed OOV anonymization method significantly improves the performance of the Transformer on two code processing tasks: code completion and bug fixing.
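
The abstract does not spell out the anonymization procedure, so the following is only a minimal sketch of what such a preprocessing step could look like. It assumes that in-vocabulary identifiers are kept as-is and that each OOV identifier is replaced by a placeholder numbered by order of first occurrence within a snippet. The function name `anonymize_oov`, the `VAR<n>` placeholder format, and the pre-tokenized input are all hypothetical choices made for illustration, not details taken from the paper.

```python
from typing import Dict, Iterable, List, Set


def anonymize_oov(tokens: Iterable[str], identifiers: Set[str], vocab: Set[str]) -> List[str]:
    """Replace each out-of-vocabulary identifier with a per-snippet placeholder.

    Hypothetical sketch: the placeholder scheme (VAR1, VAR2, ...) is an assumption.
    """
    mapping: Dict[str, str] = {}  # OOV identifier -> placeholder, local to this snippet
    out: List[str] = []
    for tok in tokens:
        if tok in identifiers and tok not in vocab:
            # The same OOV identifier always maps to the same placeholder,
            # so repeated uses of a name remain linked after anonymization.
            if tok not in mapping:
                mapping[tok] = f"VAR{len(mapping) + 1}"
            out.append(mapping[tok])
        else:
            # In-vocabulary identifiers and non-identifier tokens pass through.
            out.append(tok)
    return out


# Toy example: "total" is in the vocabulary, the other two identifiers are not.
tokens = ["total", "=", "my_rare_counter", "+", "my_rare_counter", "*", "zq_tmp"]
identifiers = {"total", "my_rare_counter", "zq_tmp"}
vocab = {"total", "=", "+", "*"}
print(anonymize_oov(tokens, identifiers, vocab))
```

Under these assumptions the model only ever sees a small, fixed set of placeholder tokens in place of rare names, while repeated occurrences of the same rare identifier still share one token.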

