Code to Comment Translation: A Comparative Study on Model Effectiveness Errors

06/15/2021
by   Junayed Mahmud, et al.
0

Automated source code summarization is a popular software engineering research topic wherein machine translation models are employed to "translate" code snippets into relevant natural language descriptions. Most evaluations of such models are conducted using automatic reference-based metrics. However, given the relatively large semantic gap between programming languages and natural language, we argue that this line of research would benefit from a qualitative investigation into the various error modes of current state-of-the-art models. Therefore, in this work, we perform both a quantitative and qualitative comparison of three recently proposed source code summarization models. In our quantitative evaluation, we compare the models based on the smoothed BLEU-4, METEOR, and ROUGE-L machine translation metrics, and in our qualitative evaluation, we perform a manual open-coding of the most common errors committed by the models when compared to ground truth captions. Our investigation reveals new insights into the relationship between metric-based performance and model prediction errors grounded in an empirically derived error taxonomy that can be used to drive future research efforts

READ FULL TEXT
research
04/04/2022

Semantic Similarity Metrics for Evaluating Source Code Summarization

Source code summarization involves creating brief descriptions of source...
research
05/23/2022

Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages

Back-translation is widely known for its effectiveness for neural machin...
research
01/03/2023

An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation

Existing automated techniques for software documentation typically attem...
research
10/03/2020

Code to Comment "Translation": Data, Metrics, Baselining Evaluation

The relationship of comments to code, and in particular, the task of gen...
research
06/19/2019

Automatic Source Code Summarization with Extended Tree-LSTM

Neural machine translation models are used to automatically generate a d...
research
06/12/2019

Does BLEU Score Work for Code Migration?

Statistical machine translation (SMT) is a fast-growing sub-field of com...
research
12/12/2022

Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators

AI-based code generators are an emerging solution for automatically writ...

Please sign up or login with your details

Forgot password? Click here to reset