EditSum: A Retrieve-and-Edit Framework for Source Code Summarization

08/26/2023
by Jia Li, et al.

Existing studies show that code summaries help developers understand and maintain source code. Unfortunately, these summaries are often missing or outdated in software projects. Code summarization aims to automatically generate natural language descriptions of source code. Code summaries are highly structured and follow repetitive patterns. Besides these patternized words, a code summary also contains important keywords, which are the key to reflecting the functionality of the code. However, state-of-the-art approaches predict these keywords poorly, so the generated summaries lose informativeness. To alleviate this problem, this paper proposes a novel retrieve-and-edit approach named EditSum for code summarization. Specifically, EditSum first retrieves a similar code snippet from a pre-defined corpus and treats its summary as a prototype summary from which to learn the pattern. Then, EditSum automatically edits the prototype to combine the pattern in the prototype with the semantic information of the input code. Our motivation is that the retrieved prototype provides a good starting point for post-generation, because the summaries of similar code snippets often share the same pattern. The post-editing process then reuses the patternized words in the prototype and generates keywords based on the semantic information of the input code. We conduct experiments on a large-scale Java corpus, and the experimental results demonstrate that EditSum outperforms state-of-the-art approaches by a substantial margin. The human evaluation also indicates that the summaries generated by EditSum are more informative and useful. We also verify that EditSum performs well on predicting both the patternized words and the keywords.
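The retrieval step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which similarity measure EditSum uses, so this sketch substitutes plain Jaccard overlap on identifier sub-tokens, and the names `tokenize`, `jaccard`, and `retrieve_prototype` as well as the two-entry corpus are hypothetical. The neural editing step is not implemented.

```python
import re


def tokenize(code):
    """Split code into a set of lowercase sub-tokens (camelCase and snake_case aware)."""
    tokens = set()
    for word in re.findall(r"[A-Za-z]+", code):
        # Break identifiers such as getUserName into get / User / Name.
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word):
            tokens.add(part.lower())
    return tokens


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed metric, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_prototype(query_code, corpus):
    """Return the summary of the corpus snippet most similar to query_code.

    corpus is a list of (code, summary) pairs; the returned summary plays
    the role of the prototype that a separate edit model would then revise.
    """
    query_tokens = tokenize(query_code)
    best_code, best_summary = max(
        corpus, key=lambda pair: jaccard(query_tokens, tokenize(pair[0]))
    )
    return best_summary


# Hypothetical toy corpus of (Java snippet, summary) pairs.
corpus = [
    ("public String getUserName() { return userName; }",
     "Returns the name of the user."),
    ("public void setTimeout(int timeout) { this.timeout = timeout; }",
     "Sets the timeout value."),
]

query = "public String getUserEmail() { return userEmail; }"
prototype = retrieve_prototype(query, corpus)
print(prototype)
```

In this toy case the retrieved prototype already carries the pattern of a getter summary ("Returns the ... of the user."). An edit step in the spirit of the abstract would keep those patternized words and replace the keyword "name" with one derived from the input code, such as "email".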

research
06/15/2022

An Extractive-and-Abstractive Framework for Source Code Summarization

(Source) Code summarization aims to automatically generate summaries/com...
research
09/19/2023

Revisiting and Improving Retrieval-Augmented Deep Assertion Generation

Unit testing validates the correctness of the unit under test and has be...
research
09/14/2022

Automatic Comment Generation via Multi-Pass Deliberation

Deliberation is a common and natural behavior in human daily life. For e...
research
12/04/2018

A Retrieve-and-Edit Framework for Predicting Structured Outputs

For the task of generating complex outputs such as source code, editing ...
research
09/19/2019

How to Write Summaries with Patterns? Learning towards Abstractive Summarization through Prototype Editing

Under special circumstances, summaries should conform to a particular st...
research
07/27/2021

Yet Another Combination of IR- and Neural-based Comment Generation

Code comment generation techniques aim to generate natural language desc...
research
12/21/2019

Automatically Extracting Subroutine Summary Descriptions from Unstructured Comments

Summary descriptions of subroutines are short (usually one-sentence) nat...
