Automatic Source Code Summarization with Extended Tree-LSTM

06/19/2019
by Yusuke Shido, et al.

Neural machine translation models are used to automatically generate documentation from source code, since this task can be regarded as machine translation. Source code summarization, which generates a natural-language summary of given source code, is one component of automatic document generation. This suggests that techniques used in neural machine translation, such as Long Short-Term Memory (LSTM), can be used for source code summarization. However, there is a considerable difference between source code and natural language: source code is inherently structured, with loops, conditional branches, and so on. This makes it difficult to apply existing machine translation models to source code directly. Abstract syntax trees (ASTs) capture these structural properties and play an important role in recent machine learning studies on source code. Tree-LSTM has been proposed as a generalization of LSTMs for tree-structured data. However, there is a critical issue when applying it to ASTs: it cannot simultaneously handle nodes with an arbitrary number of children and the order of those children, and ASTs generally contain such nodes. To address this issue, we propose an extension of Tree-LSTM, which we call Multi-way Tree-LSTM, and apply it to source code summarization. In computational experiments, our proposal achieved better results than several state-of-the-art techniques.
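To make the structural point concrete, the following minimal sketch uses Python's standard `ast` module to parse a small function and count how many children each AST node has. It is only an illustration and is not taken from the paper: the sample function `clamp` and the helper `child_counts` are hypothetical names introduced here, and the paper's own datasets and parser may differ.

```python
import ast

# Hypothetical sample input; any small function would do.
SOURCE = """
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
"""


def child_counts(tree):
    """Map each AST node type name to the set of child counts observed."""
    counts = {}
    for node in ast.walk(tree):
        n_children = len(list(ast.iter_child_nodes(node)))
        counts.setdefault(type(node).__name__, set()).add(n_children)
    return counts


tree = ast.parse(SOURCE)
counts = child_counts(tree)

# The statements in the function body are ordered children of one node.
func = tree.body[0]
body_kinds = [type(stmt).__name__ for stmt in func.body]

for name, seen in sorted(counts.items()):
    print(f"{name}: child counts {sorted(seen)}")
print("function body order:", body_kinds)
```

Even in this short function, different nodes have different numbers of children, and the order of the statements in the body determines what the function does. A tree encoder for ASTs therefore needs to accept a variable number of children while still distinguishing their positions.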


