CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees

08/30/2021
by Ensheng Shi, et al.
Xi'an Jiaotong University

Code summarization aims to generate concise natural language descriptions of source code, which can help improve program comprehension and maintenance. Recent studies show that syntactic and structural information extracted from abstract syntax trees (ASTs) is conducive to summary generation. However, existing approaches fail to fully capture the rich information in ASTs because ASTs are often large and deep. In this paper, we propose a novel model, CAST, that hierarchically splits and reconstructs ASTs. First, we hierarchically split a large AST into a set of subtrees and use a recursive neural network to encode each subtree. Then, we aggregate the subtree embeddings by reconstructing the split AST, yielding a representation of the complete AST. Finally, the AST representation, together with a source code embedding obtained from a vanilla code token encoder, is used to generate the summary. Extensive experiments on benchmark datasets, including an ablation study and a human evaluation, demonstrate the effectiveness of CAST. To facilitate reproducibility, our code and data are available at https://anonymous.4open.science/r/CAST/.
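The split, encode, and reconstruct pipeline described above can be illustrated with a small Python sketch. This is a toy under stated assumptions, not CAST's implementation:

- It parses Python with the standard `ast` module.
- It uses a hypothetical splitting rule: every compound statement, meaning one with a `body`, starts a new subtree.
- It replaces the learned recursive neural network with fixed hash-based node embeddings, combined by summation and `tanh`.
- It omits the code token encoder and the summary decoder.

All names (`DIM`, `embed_label`, `is_split_point`, `split`, `encode_subtree`, `aggregate`, `encode_ast`) are hypothetical.

```python
import ast
import hashlib
import math

DIM = 8  # hypothetical embedding size


def embed_label(label):
    """Fixed pseudo-embedding for a node type (stand-in for a learned embedding table)."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]


def is_split_point(node):
    """Hypothetical splitting rule: each compound statement starts a new subtree."""
    return isinstance(node, ast.stmt) and hasattr(node, "body")


def split(tree):
    """Mark subtree roots and record which subtree is nested in which (the tree of subtrees)."""
    cut = {id(tree)} | {id(n) for n in ast.walk(tree) if is_split_point(n)}
    children_of = {}

    def visit(node, owner):
        for child in ast.iter_child_nodes(node):
            if id(child) in cut:
                children_of.setdefault(id(owner), []).append(child)
                visit(child, child)
            else:
                visit(child, owner)

    visit(tree, tree)
    return cut, children_of


def encode_subtree(node, cut):
    """Toy recursive encoder: h(n) = tanh(e(n) + sum of h(child)), stopping at other subtree roots."""
    h = embed_label(type(node).__name__)
    for child in ast.iter_child_nodes(node):
        if id(child) in cut:
            continue  # this child roots its own subtree and is encoded separately
        h = [a + b for a, b in zip(h, encode_subtree(child, cut))]
    return [math.tanh(x) for x in h]


def aggregate(root, cut, children_of):
    """Reconstruction step: combine a subtree's vector with those of its nested subtrees, bottom-up."""
    h = encode_subtree(root, cut)
    for child in children_of.get(id(root), []):
        h = [a + b for a, b in zip(h, aggregate(child, cut, children_of))]
    return [math.tanh(x) for x in h]


def encode_ast(source):
    """Return (AST vector, number of subtrees) for a Python source string."""
    tree = ast.parse(source)
    cut, children_of = split(tree)
    return aggregate(tree, cut, children_of), len(cut)


if __name__ == "__main__":
    src = """
def total_positive(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += x
    return total
"""
    vec, n_subtrees = encode_ast(src)
    print(n_subtrees, len(vec))  # 4 8 (subtrees rooted at Module, FunctionDef, For, If)
```

In this sketch, each call to `encode_subtree` stops at the next split point, so no single pass has to walk the full depth of the AST. The second pass, `aggregate`, runs over the much smaller tree of subtrees and stands in for the reconstruction step. In a trained model, both passes would use learned parameters rather than fixed hashes.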

Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting (03/14/2021)
Automatic code summarization frees software developers from the heavy bu...

AST-Transformer: Encoding Abstract Syntax Trees Efficiently for Code Summarization (12/02/2021)
Code summarization aims to generate brief natural language descriptions ...

CodeSum: Translate Program Language to Natural Language (08/06/2017)
During software maintenance, programmers spend a lot of time on code com...

Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning (02/24/2020)
Code summarization generates brief natural language description given a ...

CodeGen-Test: An Automatic Code Generation Model Integrating Program Test Information (02/14/2022)
Automatic code generation is to generate the program code according to t...

Code Representation Learning with Prüfer Sequences (11/14/2021)
An effective and efficient encoding of the source code of a computer pro...

CoCoSum: Contextual Code Summarization with Multi-Relational Graph Neural Network (07/05/2021)
Source code summaries are short natural language descriptions of code sn...