Semantic Similarity Loss for Neural Source Code Summarization

08/14/2023
by   Chia-Yi Su, et al.

This paper presents an improved loss function for neural source code summarization. Code summarization is the task of writing natural language descriptions of source code. Neural code summarization refers to automated techniques for generating these descriptions using neural networks. Almost all current approaches involve neural networks, either as standalone models or as part of pretrained large language models, e.g., GPT, Codex, LLaMA. Yet almost all also use a categorical cross-entropy (CCE) loss function for network optimization. Two problems with CCE are that 1) it computes loss over each word prediction one at a time, rather than evaluating a whole sentence, and 2) it requires a perfect prediction, leaving no room for partial credit for synonyms. We propose and evaluate a loss function to alleviate these problems. In essence, we propose to use a semantic similarity metric to calculate loss over the whole output sentence prediction per training batch, rather than just loss for each word. We also propose to combine our loss with traditional CCE for each word, which streamlines the training process compared to baselines. We evaluate our approach over several baselines and report an improvement in the vast majority of conditions.
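The idea of pairing per-word CCE with a sentence-level similarity term can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy two-dimensional embeddings, the mean-vector cosine similarity standing in for a semantic similarity metric, the additive combination rule, and the `weight` parameter are all hypothetical. The example shows a synonym ("gets" for "returns") and an unrelated word receiving the same CCE, while the sentence-level term separates them.

```python
import math

def cross_entropy(probs, target_index):
    """Categorical cross-entropy for one word: -log p(target word)."""
    return -math.log(probs[target_index])

def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sentence_similarity(pred_words, ref_words, embeddings):
    """Hypothetical stand-in metric: cosine of the mean word embeddings."""
    pred_vec = mean_vector([embeddings[w] for w in pred_words])
    ref_vec = mean_vector([embeddings[w] for w in ref_words])
    return cosine(pred_vec, ref_vec)

def mean_cce(word_probs, target_indices):
    """Average per-word CCE over the sentence."""
    losses = [cross_entropy(p, t) for p, t in zip(word_probs, target_indices)]
    return sum(losses) / len(losses)

def combined_loss(word_probs, target_indices, pred_words, ref_words,
                  embeddings, weight=1.0):
    """Assumed combination rule: mean CCE plus weight * (1 - similarity)."""
    sim = sentence_similarity(pred_words, ref_words, embeddings)
    return mean_cce(word_probs, target_indices) + weight * (1.0 - sim)

# Toy vocabulary order: returns=0, gets=1, the=2, name=3 (illustrative only).
embeddings = {
    "returns": [1.0, 0.1],
    "gets": [0.9, 0.2],   # deliberately close to "returns"
    "the": [0.1, 0.1],
    "name": [0.0, 1.0],
}
reference = ["returns", "the", "name"]
targets = [0, 2, 3]

# Both predictions give the correct first word probability 0.2,
# so their per-word CCE is identical.
probs_synonym = [[0.2, 0.7, 0.05, 0.05], [0.05, 0.05, 0.85, 0.05], [0.05, 0.05, 0.05, 0.85]]
probs_wrong = [[0.2, 0.05, 0.05, 0.7], [0.05, 0.05, 0.85, 0.05], [0.05, 0.05, 0.05, 0.85]]

loss_synonym = combined_loss(probs_synonym, targets, ["gets", "the", "name"],
                             reference, embeddings)
loss_wrong = combined_loss(probs_wrong, targets, ["name", "the", "name"],
                           reference, embeddings)
print("synonym prediction:", loss_synonym)
print("unrelated prediction:", loss_wrong)
```

In this sketch the CCE term alone cannot tell the two predictions apart, while the sentence-level term gives the synonym a lower loss. A real training setup would need a differentiable formulation and a learned similarity model, neither of which this toy example attempts.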


research
04/04/2022

Semantic Similarity Metrics for Evaluating Source Code Summarization

Source code summarization involves creating brief descriptions of source...
research
05/16/2023

Towards Modeling Human Attention from Eye Movements for Neural Source Code Summarization

Neural source code summarization is the task of generating natural langu...
research
01/07/2021

Action Word Prediction for Neural Source Code Summarization

Source code summarization is the task of creating short, natural languag...
research
03/04/2023

Demystifying What Code Summarization Models Learned

Study patterns that models have learned has long been a focus of pattern...
research
03/28/2023

Label Smoothing Improves Neural Source Code Summarization

Label smoothing is a regularization technique for neural networks. Norma...
research
09/16/2019

Automatic Generation of Pull Request Descriptions

Enabled by the pull-based development model, developers can easily contr...
research
02/11/2016

Variations of the Similarity Function of TextRank for Automated Summarization

This article presents new alternatives to the similarity function for th...
