Contextualized Code Representation Learning for Commit Message Generation

07/14/2020
by Lun Yiu Nie, et al.

Automatic generation of high-quality commit messages for code commits can substantially facilitate developers' work and coordination. However, the semantic gap between source code and natural language poses a major challenge for the task. Several approaches have been proposed to alleviate this challenge, but none explicitly incorporates code contextual information during commit message generation. Specifically, existing research adopts static embeddings for code tokens, which map a token to the same vector regardless of its context. In this paper, we propose a novel Contextualized code representation learning method for commit message Generation (CoreGen). CoreGen first learns contextualized code representations that exploit the contextual information behind code commit sequences. The learned representations of code commits, built upon the Transformer, are then transferred to downstream commit message generation. Experiments on the benchmark dataset demonstrate the superior effectiveness of our model over the baseline models, with an improvement of 28.18 in BLEU-4 score. We also highlight future opportunities in training contextualized code representations on a larger code corpus as a solution to low-resource settings, and in adapting the pretrained code representations to other downstream code-to-text generation tasks.
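The contrast between static and contextualized token embeddings described in the abstract can be illustrated with a small toy sketch. It is not CoreGen's architecture or training procedure: the vocabulary, the random vectors, and the single unparameterized self-attention pass in `contextual_embed` are all assumptions made purely for illustration. The point is only that a static lookup returns the same vector for `return` in both lines of a diff, while a context-mixing step does not.

```python
import math
import random

random.seed(0)
DIM = 4

# Hypothetical toy vocabulary; the vectors are random, not learned.
VOCAB = ["+", "-", "return", "x", "y"]
STATIC = {tok: [random.gauss(0, 1) for _ in range(DIM)] for tok in VOCAB}


def static_embed(tokens):
    """Static embedding: a plain table lookup, independent of context."""
    return [STATIC[t] for t in tokens]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def contextual_embed(tokens):
    """One unparameterized self-attention pass (illustrative assumption).

    Each output vector is a softmax-weighted average of all input
    vectors in the sequence, so it depends on the neighbouring tokens.
    """
    vecs = static_embed(tokens)
    out = []
    for q in vecs:
        scores = [dot(q, k) / math.sqrt(DIM) for k in vecs]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * v[i] for w, v in zip(weights, vecs)) / total
             for i in range(DIM)]
        )
    return out


def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


# Two lines of a made-up diff; "return" sits at index 1 in both.
removed = ["-", "return", "x"]
added = ["+", "return", "y"]

static_gap = max_abs_diff(static_embed(removed)[1], static_embed(added)[1])
context_gap = max_abs_diff(contextual_embed(removed)[1],
                           contextual_embed(added)[1])

print("static gap:", static_gap)
print("contextual gap:", context_gap)
```

Here `static_gap` is exactly zero because both lookups hit the same table entry, whereas `context_gap` is nonzero because the surrounding tokens differ between the removed and added lines.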

research
12/06/2019

ATOM: Commit Message Generation Based on Abstract Syntax Tree and Hybrid Ranking

Commit messages record code changes (e.g., feature modifications and bug...
research
10/11/2022

COMBO: Pre-Training Representations of Binary Code Using Contrastive Learning

Compiled software is delivered as executable binary code. Developers wri...
research
02/08/2023

CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back

Representing code changes as numeric feature vectors, i.e., code change ...
research
05/29/2021

CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model

Commit message is a document that summarizes source code changes in natu...
research
03/05/2022

ECMG: Exemplar-based Commit Message Generation

Commit messages concisely describe the content of code diffs (i.e., code...
research
02/10/2023

CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code

Since the rise of neural models of code that can generate long expressio...
research
10/03/2022

ContraGen: Effective Contrastive Learning For Causal Language Model

Despite exciting progress in large-scale language generation, the expres...
