Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting

03/14/2021
by   Chen Lin, et al.
0

Automatic code summarization frees software developers from the heavy burden of manual commenting and benefits software development and maintenance. Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been incorporated to guide the generation of code summaries. However, existing AST based methods suffer from the difficulty of training and generate inadequate code summaries. In this paper, we present the Block-wise Abstract Syntax Tree Splitting method (BASTS for short), which fully utilizes the rich tree-form syntax structure in ASTs, for improving code summarization. BASTS splits the code of a method based on the blocks in the dominator tree of the Control Flow Graph, and generates a split AST for each code split. Each split AST is then modeled by a Tree-LSTM using a pre-training strategy to capture local non-linear syntax encoding. The learned syntax encoding is combined with code encoding, and fed into Transformer to generate high-quality code summaries. Comprehensive experiments on benchmarks have demonstrated that BASTS significantly outperforms state-of-the-art approaches in terms of various evaluation metrics. To facilitate reproducibility, our implementation is available at https://github.com/XMUDM/BASTS.

READ FULL TEXT
research
08/30/2021

CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees

Code summarization aims to generate concise natural language description...
research
12/02/2021

AST-Transformer: Encoding Abstract Syntax Trees Efficiently for Code Summarization

Code summarization aims to generate brief natural language descriptions ...
research
01/19/2022

GAP-Gen: Guided Automatic Python Code Generation

Automatic code generation from natural language descriptions can be high...
research
08/10/2023

AST-MHSA : Code Summarization using Multi-Head Self-Attention

Code summarization aims to generate concise natural language description...
research
08/12/2022

Towards Code Summarization of APIs Using NLP Techniques

Each programming language comes with official documentation to guide dev...
research
05/26/2021

TreeBERT: A Tree-Based Pre-Trained Model for Programming Language

Source code can be parsed into the abstract syntax tree (AST) based on d...
research
04/14/2020

Code Completion using Neural Attention and Byte Pair Encoding

In this paper, we aim to do code completion based on implementing a Neur...

Please sign up or login with your details

Forgot password? Click here to reset