Automatically Extracting Subroutine Summary Descriptions from Unstructured Comments

12/21/2019
by Zachary Eberhart, et al.

Summary descriptions of subroutines are short (usually one-sentence) natural language explanations of a subroutine's behavior and purpose in a program. These summaries are ubiquitous in documentation, and many tools, such as JavaDocs and Doxygen, generate documentation built around them. And yet, extracting summaries from unstructured source code repositories remains a difficult research problem: it is very difficult to generate clean, structured documentation unless programmers have annotated the summaries. This is a problem for large repositories of legacy code, where it is cost-prohibitive to retroactively annotate summaries in dozens or hundreds of old programs. It is likewise a problem for creators of automatic documentation generation algorithms, since these algorithms usually must learn from large annotated datasets, which do not exist for many programming languages. In this paper, we present two approaches for annotating summaries in unstructured code comments: a semi-automated approach via crowdsourcing and a fully automated approach. We present experiments validating the approaches, and provide recommendations and cost estimates for automatically annotating large repositories.
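To illustrate why unstructured comments are hard to mine, the sketch below implements a naive "first sentence" heuristic that approximates the classic Javadoc rule: the summary ends at the first period followed by whitespace (or end of text), or at the first block tag such as `@param`. This is not the paper's crowdsourced or automated approach; it is an assumed baseline for illustration, and the function names `strip_comment_markers` and `first_sentence_summary` are hypothetical.

```python
import re


def strip_comment_markers(comment: str) -> str:
    """Remove /** */, /* */, // and leading '*' markers, then join lines with spaces."""
    body = []
    for raw in comment.splitlines():
        line = raw.strip()
        # Drop opening delimiters.
        if line.startswith("/**"):
            line = line[3:]
        elif line.startswith("/*"):
            line = line[2:]
        elif line.startswith("//"):
            line = line[2:]
        # Drop a closing delimiter.
        if line.endswith("*/"):
            line = line[:-2]
        line = line.strip()
        # Drop the leading '*' used on continuation lines of block comments.
        if line.startswith("*"):
            line = line.lstrip("*").strip()
        if line:
            body.append(line)
    return " ".join(body)


def first_sentence_summary(comment: str) -> str:
    """Hypothetical Javadoc-like heuristic: text up to the first '. ' or block tag."""
    text = strip_comment_markers(comment)
    # Stop at the first block tag such as @param or @return.
    tag = re.search(r"(^|\s)@\w+", text)
    if tag:
        text = text[: tag.start()]
    # The first period followed by whitespace or end of text ends the summary.
    end = re.search(r"\.(\s|$)", text)
    return text[: end.start() + 1].strip() if end else text.strip()


structured = """/**
 * Returns the index of the first match. Runs in linear time.
 * @param key the value to find
 */"""
unstructured = "// Parses the config file, e.g. app.ini, into a dict."

print(first_sentence_summary(structured))    # a clean summary
print(first_sentence_summary(unstructured))  # truncated at "e.g."
```

On a well-formed doc comment the heuristic recovers a usable summary, but on a free-form comment the abbreviation "e.g." is mistaken for a sentence boundary, yielding "Parses the config file, e.g." Failures of this kind are one reason summaries in unstructured comments often need human annotation or a learned extractor.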
