Evaluating Commit Message Generation: To BLEU Or Not To BLEU?

04/20/2022
by Samanta Dey, et al.

Commit messages play an important role in several software engineering tasks, such as program comprehension and understanding program evolution. However, programmers often neglect to write good commit messages. Hence, several Commit Message Generation (CMG) tools have been proposed. We observe that recent state-of-the-art CMG tools are evaluated with simple, easy-to-compute automated metrics such as BLEU4 or its variants. Advances in the field of Machine Translation (MT) have exposed several weaknesses of BLEU4 and its variants, and have produced several other metrics for evaluating Natural Language Generation (NLG) tools. In this work, we examine the suitability of various MT metrics for the CMG task. Based on the insights from our experiments, we propose a new metric variant tailored specifically to the CMG task, and we re-evaluate the state-of-the-art CMG tools on it. We believe that our work fixes an important gap in the understanding of evaluation metrics for CMG research.
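To make the abstract's concern concrete, the sketch below shows how sentence-level BLEU-4 is typically computed for a generated commit message against a reference. This is a minimal illustration using NLTK with whitespace tokenization and hypothetical example strings; the paper's exact metric variant, tokenizer, and smoothing choices are not specified in the abstract.

```python
# Minimal sketch: sentence-level BLEU-4 for commit messages.
# Assumes NLTK is installed; the reference/candidate strings are hypothetical.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "fix null pointer exception in user login handler".split()
candidate = "fix null pointer error in login handler".split()

# Smoothing matters for short texts like commit messages: without it,
# a single missing 4-gram match can drive the whole score to zero.
smoother = SmoothingFunction().method1
score = sentence_bleu(
    [reference],                        # list of reference token lists
    candidate,                          # candidate token list
    weights=(0.25, 0.25, 0.25, 0.25),   # uniform weights over 1- to 4-grams
    smoothing_function=smoother,
)
print(f"BLEU-4: {score:.4f}")
```

This brittleness on short, single-sentence outputs is one reason the MT community has questioned BLEU-4 and why different smoothing or normalization variants can rank the same CMG tools differently.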


