Assessing the Effectiveness of Syntactic Structure to Learn Code Edit Representations

06/11/2021
by Syed Arbaaz Qureshi, et al.

Recent work has shown that code can be treated as data to support applications such as automatic commit message generation, automatic generation of pull request descriptions, and automatic program repair. Take, for instance, the problem of commit message generation. Treating source code as a sequence of tokens, state-of-the-art techniques generate commit messages using neural machine translation models. However, they tend to ignore the syntactic structure of programming languages. Previous work, namely code2seq, has used structural information from the Abstract Syntax Tree (AST) to represent source code and to automatically generate method names. In this paper, we build upon this state-of-the-art approach and modify it to represent source code edits. We determine the effect of using such syntactic structure on the problem of classifying code edits. Inspired by the code2seq approach, we evaluate how structural information from the AST, i.e., paths between AST leaf nodes, can help with the task of code edit classification on two datasets of fine-grained syntactic edits. Our experiments show that adding syntactic structure does not result in any improvement over less sophisticated methods. The results suggest that techniques such as code2seq, while promising, have a long way to go before they can be generically applied to learning code edit representations. We hope that these results will benefit other researchers and inspire them to work further on this problem.
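To make "paths between AST leaf nodes" concrete, the following is a minimal sketch of extracting such paths from a code snippet. It is an illustration only, not the paper's implementation: it uses Python's standard `ast` module as a stand-in parser, treats `Name` and `Constant` nodes as the terminals, and the helper names `terminal_trails`, `token`, and `leaf_to_leaf_paths` are hypothetical.

```python
import ast
from itertools import combinations


def terminal_trails(source):
    """Collect root-to-terminal node trails.

    Assumption for this sketch: terminals are Name and Constant nodes.
    """
    trails = []

    def visit(node, trail):
        trail = trail + [node]
        if isinstance(node, (ast.Name, ast.Constant)):
            trails.append(trail)
            return  # do not descend below a terminal
        for child in ast.iter_child_nodes(node):
            visit(child, trail)

    visit(ast.parse(source), [])
    return trails


def token(node):
    """Surface token of a terminal node."""
    return node.id if isinstance(node, ast.Name) else repr(node.value)


def leaf_to_leaf_paths(source):
    """Return (start token, node-type path, end token) for each terminal pair.

    Each path climbs from the first terminal to the lowest common
    ancestor and then descends to the second terminal.
    """
    paths = []
    for a, b in combinations(terminal_trails(source), 2):
        # Length of the shared root prefix; a[k - 1] is the common ancestor.
        k = 0
        while k < min(len(a), len(b)) and a[k] is b[k]:
            k += 1
        up = [type(n).__name__ for n in reversed(a[k:])]
        top = type(a[k - 1]).__name__
        down = [type(n).__name__ for n in b[k:]]
        paths.append((token(a[-1]), up + [top] + down, token(b[-1])))
    return paths


for start, path, end in leaf_to_leaf_paths("x = y + 1"):
    print(start, path, end)
```

One plausible way to apply this to an edit, again as an assumption and not a description of the paper's model, is to extract the path sets of the code before and after the change and feed both to the classifier.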

research
08/04/2018

code2seq: Generating Sequences from Structured Representations of Code

The ability to generate natural language sequences from source code snip...
research
10/31/2018

Learning to Represent Edits

We introduce the problem of learning distributed representations of edit...
research
05/27/2020

A Structural Model for Contextual Code Changes

We address the problem of predicting edit completions based on a learned...
research
06/19/2019

Automatic Source Code Summarization with Extended Tree-LSTM

Neural machine translation models are used to automatically generate a d...
research
04/04/2019

Neural Networks for Modeling Source Code Edits

Programming languages are emerging as a challenging and interesting doma...
research
03/26/2018

A General Path-Based Representation for Predicting Program Properties

Predicting program properties such as names or expression types has a wi...
research
01/28/2021

Learning Structural Edits via Incremental Tree Transformations

While most neural generative models generate outputs in a single pass, t...
