Using LSTMs to Model the Java Programming Language

08/26/2019
by   Brendon Boldt, et al.

Recurrent neural networks (RNNs), specifically long short-term memory (LSTM) networks, can model natural language effectively. This research investigates the ability of these same LSTMs to perform next-"word" prediction on the Java programming language. Java source code from four different repositories undergoes a transformation that preserves the logical structure of the code while removing its specifics, such as variable names and literal values. These datasets, together with an additional English-language corpus, are used to train and test standard LSTMs on predicting the next element in a sequence. The results suggest that LSTMs can model Java code effectively, achieving perplexities under 22 and accuracies above 0.47, an improvement over the same LSTMs' performance on English, which yielded a perplexity of 85 and an accuracy of 0.27. This research has applications in areas such as syntactic template suggestion and automated bug patching.
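The transformation described above, which keeps keywords and punctuation but abstracts away identifiers and literals, can be sketched roughly as follows. This is an illustrative approximation and not the authors' actual pipeline; the regex tokenizer and the placeholder tokens (`<ID>`, `<NUM>`, `<STR>`) are assumptions made here for demonstration.

```python
import re

# Java reserved words are kept verbatim so the logical structure survives.
JAVA_KEYWORDS = {
    "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
    "class", "const", "continue", "default", "do", "double", "else", "enum",
    "extends", "final", "finally", "float", "for", "goto", "if", "implements",
    "import", "instanceof", "int", "interface", "long", "native", "new",
    "package", "private", "protected", "public", "return", "short", "static",
    "strictfp", "super", "switch", "synchronized", "this", "throw", "throws",
    "transient", "try", "void", "volatile", "while",
}

# A simplified lexer: string/char literals, numbers, words, then symbols.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'          # string literals
    r"|'(?:\\.|[^'\\])'"          # char literals
    r"|\d[\w.]*"                  # numeric literals
    r"|[A-Za-z_]\w*"              # identifiers and keywords
    r"|[{}()\[\];,.]|[^\s\w]+"    # punctuation and operators
)

def abstract_tokens(java_source: str) -> list[str]:
    """Tokenize Java source, replacing identifiers and literals with
    generic placeholders while keeping keywords and punctuation."""
    tokens = []
    for tok in TOKEN_RE.findall(java_source):
        if tok in JAVA_KEYWORDS:
            tokens.append(tok)
        elif tok[0] in ('"', "'"):
            tokens.append("<STR>")
        elif tok[0].isdigit():
            tokens.append("<NUM>")
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("<ID>")
        else:
            tokens.append(tok)
    return tokens

print(abstract_tokens('int x = 42;'))  # → ['int', '<ID>', '=', '<NUM>', ';']
```

Sequences of placeholder tokens like these, rather than raw source text, would then form the vocabulary over which an LSTM performs next-element prediction.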


research
04/17/2018

EBG: A Lazy Functional Programming Language Implemented on the Java Virtual Machine

This technical report describes the implementation of a lazy functional ...
research
10/25/2019

Machine Translation from Natural Language to Code using Long-Short Term Memory

Making computer programming language more understandable and easy for th...
research
03/21/2018

Exploring the Naturalness of Buggy Code with Recurrent Neural Networks

Statistical language models are powerful tools which have been used for ...
research
03/19/2023

Towards a Dataset of Programming Contest Plagiarism in Java

In this paper, we describe and present the first dataset of source code ...
research
09/15/2017

Learning Intrinsic Sparse Structures within Long Short-Term Memory

Model compression is significant for the wide adoption of Recurrent Neur...
research
03/31/2019

Exploring the Generality of a Java-based Loop Action Model for the Quorum Programming Language

Many algorithmic steps require more than one statement to implement, but...
research
11/06/2018

Evaluating the Ability of LSTMs to Learn Context-Free Grammars

While long short-term memory (LSTM) neural net architectures are designe...
