An Empirical Study on Using Large Language Models for Multi-Intent Comment Generation

04/22/2023
by Mingyang Geng, et al.

Code comment generation aims to produce natural language descriptions of a code snippet to support developers' program comprehension activities. Despite being studied for a long time, existing approaches face a bottleneck: given a code snippet, they can generate only one comment, while developers usually need information from diverse perspectives, such as what the functionality of the code snippet is and how to use it. To tackle this limitation, this study empirically investigates the feasibility of utilizing large language models (LLMs) to generate comments that fulfill developers' diverse intents. Our intuition is based on two facts: (1) code and its paired comments are used during the pre-training of LLMs to build the semantic connection between natural language and programming languages, and (2) comments in real-world projects, which are collected for pre-training, usually reflect different developer intents. We thus postulate that LLMs can already understand code from different perspectives after pre-training. Indeed, experiments on two large-scale datasets support this insight: by adopting the in-context learning paradigm and giving adequate prompts to the LLM (e.g., providing it with ten or more examples), the LLM can significantly outperform a state-of-the-art supervised learning approach in generating comments with multiple intents. Results also show that customized strategies for constructing the prompts and post-processing strategies for reranking the results can both boost the LLM's performance, which sheds light on future research directions for using LLMs for comment generation.
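For illustration, the in-context learning setup described above can be sketched as follows: a prompt is built from k demonstration triples (code, intent, comment) followed by the query snippet and its target intent, and the LLM completes the missing comment. The Example class, the intent labels, and the prompt template below are illustrative assumptions for this sketch, not the authors' exact pipeline.

```python
# Minimal sketch of few-shot prompt construction for multi-intent comment
# generation. Intent labels, template wording, and data structures are
# assumptions made for illustration only.

from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    code: str     # code snippet
    intent: str   # e.g., "what", "why", "how-to-use" (assumed intent labels)
    comment: str  # reference comment written for that intent


def build_prompt(examples: List[Example], query_code: str, query_intent: str) -> str:
    """Concatenate k demonstration triples, then the unseen query with an empty comment slot."""
    parts = []
    for ex in examples:
        parts.append(
            f"Code:\n{ex.code}\n"
            f"Intent: {ex.intent}\n"
            f"Comment: {ex.comment}\n"
        )
    # The query repeats the same template but leaves the comment empty,
    # so the model is expected to complete it.
    parts.append(f"Code:\n{query_code}\nIntent: {query_intent}\nComment:")
    return "\n".join(parts)


if __name__ == "__main__":
    demos = [
        Example(
            code="def add(a, b):\n    return a + b",
            intent="what",
            comment="Returns the sum of two numbers.",
        ),
        Example(
            code="def add(a, b):\n    return a + b",
            intent="how-to-use",
            comment="Call add(x, y) with two numeric arguments.",
        ),
    ]
    prompt = build_prompt(demos, "def area(r):\n    return 3.14159 * r * r", "what")
    print(prompt)  # this prompt would then be sent to an LLM completion endpoint
```

Reranking the sampled completions, as mentioned in the abstract, would be a separate post-processing step applied to the model's outputs; it is not shown here.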


Related research

01/27/2022 · Reasoning Like Program Executors
Reasoning over natural language is a long-standing goal for the research...

09/12/2023 · Unveiling the potential of large language models in generating semantic and cross-language clones
Semantic and Cross-language code clone generation may be useful for code...

09/22/2017 · Code Attention: Translating Code to Comments by Exploiting Domain Features
Appropriate comments of code snippets provide insight for code functiona...

03/17/2022 · CodeReviewer: Pre-Training for Automating Code Review Activities
Code review is an essential part to software development lifecycle since...

02/14/2023 · Developer-Intent Driven Code Comment Generation
Existing automatic code comment generators mainly focus on producing a g...

08/03/2023 · Comparing scalable strategies for generating numerical perspectives
Numerical perspectives help people understand extreme and unfamiliar num...

08/24/2022 · Diverse Title Generation for Stack Overflow Posts with Multiple Sampling Enhanced Transformer
Stack Overflow is one of the most popular programming communities where ...
