A Neural Model for Generating Natural Language Summaries of Program Subroutines

02/05/2019
by Alexander LeClair, et al.

Source code summarization, the task of creating natural language descriptions of source code behavior, is a rapidly growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven approaches based on neural machine translation have largely overtaken template-based systems. However, nearly all of these techniques depend on programs having good internal documentation: without clear identifier names, the models fail to produce good summaries. In this paper, we present a neural model that combines words from code with code structure from an abstract syntax tree (AST). Unlike previous approaches, our model processes each data source as a separate input, which allows it to learn code structure independently of the text in code. This helps our approach produce coherent summaries in many cases even when no internal documentation is provided. We evaluate our technique on a dataset we created from 2.1 million Java methods. We find improvement over two baseline techniques from the software engineering (SE) literature and one from the natural language processing (NLP) literature.
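
As a rough illustration of the separate-input idea, and not the authors' implementation, the sketch below derives two independent sequences from one subroutine: the words that appear in the code, and the AST node types with all identifiers left out. It assumes Python's built-in `ast` module as a stand-in for a Java parser and a plain pre-order traversal as the flattening; the function names `code_words` and `ast_structure` are hypothetical.

```python
import ast
import re


def code_words(source: str) -> list[str]:
    """Text input: identifier-like tokens, split on snake_case and camelCase, lowercased."""
    words = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        for part in ident.split("_"):
            pieces = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part)
            words += [p.lower() for p in pieces]
    return words


def ast_structure(source: str) -> list[str]:
    """Structure input: pre-order sequence of AST node type names, no identifiers."""
    out = []

    def visit(node):
        out.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return out


# The same subroutine with descriptive names and with meaningless names.
documented = "def total_price(items):\n    return sum(item.price for item in items)\n"
obfuscated = "def f(a):\n    return b(c.d for c in a)\n"

print(code_words(documented))
print(code_words(obfuscated))
print(ast_structure(documented) == ast_structure(obfuscated))
```

Renaming every identifier changes the word sequence but leaves the structure sequence untouched, which is why a model that receives the structure as its own input still has a usable signal when the names carry no meaning.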

research
04/06/2020

Improved Code Summarization via a Graph Neural Network

Automatic source code summarization is the task of generating natural la...
research
03/31/2020

DeepSumm – Deep Code Summaries using Neural Transformer Architecture

Source code summarizing is a task of writing short, natural language des...
research
07/04/2021

A Topic Guided Pointer-Generator Model for Generating Natural Language Code Summaries

Code summarization is the task of generating natural language descriptio...
research
04/10/2020

Improved Automatic Summarization of Subroutines via Attention to File Context

Software documentation largely consists of short, natural language summa...
research
12/21/2019

Automatically Extracting Subroutine Summary Descriptions from Unstructured Comments

Summary descriptions of subroutines are short (usually one-sentence) nat...
research
11/10/2021

Data-Driven AI Model Signal-Awareness Enhancement and Introspection

AI modeling for source code understanding tasks has been making signific...
research
12/11/2018

Generating Summaries for Methods of Event-Driven Programs: an Android Case Study

Developers often dedicate a great amount of time to program comprehensio...
