
Exploring the Naturalness of Buggy Code with Recurrent Neural Networks

03/21/2018
by Jack Lanchantin, et al.

Statistical language models are powerful tools that have been used for many tasks within natural language processing. Recently, they have also been applied to other sequential data such as source code. Ray et al. (2015) showed that it is possible to train an n-gram source code language model and use it to predict buggy lines of code by identifying "unnatural" lines via their entropy with respect to the language model. In this work, we propose using a more advanced language modeling technique, Long Short-Term Memory (LSTM) recurrent neural networks, to model source code and classify buggy lines based on entropy. We show that our method slightly outperforms an n-gram model on the buggy line classification task as measured by AUC.
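To make the entropy-based scoring concrete, here is a minimal sketch (not the authors' implementation) of how a token-level LSTM language model can assign a per-line cross-entropy score, with higher entropy indicating a less "natural" line. The model dimensions, tokenization, and training corpus are hypothetical placeholders; in practice the model would be trained on a large corpus of presumed-correct source code before scoring lines.

```python
# Sketch: score source-code lines by cross-entropy under an LSTM language model.
# Assumes tokens have already been mapped to integer ids (tokenizer not shown).
import math
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer token ids
        hidden, _ = self.lstm(self.embed(tokens))
        return self.out(hidden)  # (batch, seq_len, vocab_size) logits

def line_entropy(model, token_ids):
    """Average per-token cross-entropy (in bits) of one source line.
    Higher values suggest the line is less 'natural' under the model."""
    model.eval()
    with torch.no_grad():
        ids = torch.tensor(token_ids).unsqueeze(0)   # (1, T)
        logits = model(ids[:, :-1])                   # predict each next token
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            ids[:, 1:].reshape(-1),
        )
    return loss.item() / math.log(2)  # convert nats to bits
```

After training on non-buggy code, each line's tokens are passed to line_entropy, and the highest-entropy lines are flagged as candidate bugs; the AUC reported in the abstract would then be computed by comparing these rankings against known buggy lines.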
