Neural Networks for Modeling Source Code Edits

04/04/2019
by Rui Zhao, et al.

Programming languages are emerging as a challenging and interesting domain for machine learning. A core task, which has received significant attention in recent years, is building generative models of source code. However, to our knowledge, previous generative models have always been framed in terms of generating static snapshots of code. In this work, we instead treat source code as a dynamic object and tackle the problem of modeling the edits that software developers make to source code files. This requires extracting intent from previous edits and leveraging it to generate subsequent edits. We develop several neural networks and use synthetic data to test their ability to learn challenging edit patterns that require strong generalization. We then collect and train our models on a large-scale dataset of Google source code, consisting of millions of fine-grained edits from thousands of Python developers. From the modeling perspective, our main conclusion is that a new composition of attentional and pointer network components provides the best overall performance and scalability. From the application perspective, our results provide preliminary evidence of the feasibility of developing tools that learn to predict future edits.

research
01/02/2014

Structured Generative Models of Natural Source Code

We study the problem of building generative models of natural source cod...
research
02/02/2018

Best Practices for a Future Open Code Policy: Experiences and Vision of the Astrophysics Source Code Library

We are members of the Astrophysics Source Code Library's Advisory Commit...
research
04/15/2019

Semantic Source Code Models Using Identifier Embeddings

The emergence of online open source repositories in the recent years has...
research
06/11/2021

Assessing the Effectiveness of Syntactic Structure to Learn Code Edit Representations

In recent times, it has been shown that one can use code as data to aid ...
research
10/20/2019

Processing Large Datasets of Fined Grained Source Code Changes

In the era of Big Code, when researchers seek to study an increasingly l...
research
07/17/2023

A Lightweight Framework for High-Quality Code Generation

In recent years, the use of automated source code generation utilizing t...
research
10/31/2018

Learning to Represent Edits

We introduce the problem of learning distributed representations of edit...
