Modeling Vocabulary for Big Code Machine Learning

04/03/2019
by Hlib Babii, et al.

When building machine learning models that operate on source code, several decisions have to be made about how to model the source-code vocabulary. These decisions can have a large impact: some make it impossible to train models at all, while others significantly affect performance, particularly for Neural Language Models. Yet these decisions are often not fully described. This paper lists important modeling choices for source-code vocabulary and explores their impact on the resulting vocabulary on a large-scale corpus of 14,436 projects. We show that a subset of these decisions is decisive, making it possible to train accurate Neural Language Models quickly on a large corpus of 10,106 projects.
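
To make the idea of a vocabulary modeling choice concrete, the following is a minimal sketch of one such choice, assuming identifiers are split on underscores and camelCase boundaries. The abstract does not name the specific choices studied, so the splitting rule, the helper names `split_identifier` and `vocabulary`, and the toy token list are all illustrative assumptions, not the paper's method.

```python
import re
from collections import Counter

def split_identifier(token):
    """Hypothetical helper: split a token on underscores and camelCase boundaries."""
    parts = []
    for chunk in token.split("_"):
        # Acronym runs (HTTP), capitalized or lowercase words (User, name), digit runs.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    # Tokens with no letters or digits (e.g. punctuation) are kept unchanged.
    return [p.lower() for p in parts] or [token]

def vocabulary(tokens, split=False):
    """Count vocabulary entries, optionally after splitting identifiers."""
    counts = Counter()
    for tok in tokens:
        counts.update(split_identifier(tok) if split else [tok])
    return counts

# Toy token stream (an assumption for illustration, not data from the paper).
tokens = [
    "getUserName", "setUserName", "getUserId", "setUserId",
    "getName", "setName", "getId", "setId",
    "user_name", "user_id",
]

full = vocabulary(tokens)              # one entry per distinct identifier
sub = vocabulary(tokens, split=True)   # entries are shared subwords

print(len(full), len(sub))  # prints "10 5"
```

On this toy input, ten distinct identifiers collapse to five shared subwords, which illustrates why such a choice can change vocabulary size; the actual effect on a real corpus depends on the corpus and on the exact rules used.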

