CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model

05/29/2021
by   Tae-Hwan Jung, et al.

A commit message is a document that summarizes source code changes in natural language. A good commit message clearly describes those changes, which improves collaboration between developers. Our work therefore aims to develop a model that writes commit messages automatically. To this end, we release a dataset of 345K pairs of code modifications and commit messages in six programming languages (Python, PHP, Go, Java, JavaScript, and Ruby). As in neural machine translation (NMT), we feed the code modification to the encoder and the commit message to the decoder, and we evaluate the generated commit messages with BLEU-4. We also propose two training methods that improve commit message generation: (1) a preprocessing method for feeding the code modification to the encoder, and (2) initialization from weights suited to the code domain, which reduces the gap in contextual representation between programming language (PL) and natural language (NL). Training code, dataset, and pre-trained weights are available at https://github.com/graykode/commit-autosuggestions
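
The abstract names two concrete steps without giving details: preprocessing a code modification into encoder input, and scoring a generated message with BLEU-4. The sketch below is only an illustration of what such steps could look like, not the authors' implementation. The function names (`split_diff`, `build_encoder_input`, `bleu4`), the `[SEP]` separator, whitespace tokenization, and the add-one smoothing in the BLEU variant are all assumptions made for this example.

```python
import math
from collections import Counter


def split_diff(diff_text):
    """Separate a unified diff into added and deleted code lines (hypothetical helper)."""
    added, deleted = [], []
    for line in diff_text.splitlines():
        # Skip file headers such as '--- a/file.py' and '+++ b/file.py'.
        # Caveat: a changed line whose code itself starts with '--' or '++'
        # would also be skipped by this simple check.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added.append(line[1:].strip())
        elif line.startswith("-"):
            deleted.append(line[1:].strip())
    return added, deleted


def build_encoder_input(diff_text, sep="[SEP]"):
    """Join added and deleted tokens with a separator (assumed input layout)."""
    added, deleted = split_diff(diff_text)
    return " ".join(added).split() + [sep] + " ".join(deleted).split()


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference, with add-one smoothing (assumed variant)."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, 5):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clipped overlap: each candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        log_precision += math.log((overlap + 1) / (total + 1)) / 4
    # Brevity penalty for candidates shorter than the reference.
    if len(candidate) > len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(log_precision)


if __name__ == "__main__":
    diff = "--- a/f.py\n+++ b/f.py\n@@ -1 +1 @@\n-x = 1\n+x = 2\n"
    print(build_encoder_input(diff))
    print(bleu4("fix parser bug".split(), "fix bug in parser".split()))
```

Keeping added and deleted lines in separate segments lets an encoder distinguish what was introduced from what was removed. Scores from this smoothed, single-reference BLEU variant are not comparable to the figures reported in the paper.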


research
06/27/2023

Constructing Multilingual Code Search Dataset Using Neural Machine Translation

Code search is a task to find programming codes that semantically match ...
research
04/17/2017

A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes

We propose a model to automatically describe changes introduced in the s...
research
04/20/2020

Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

Open-domain code generation aims to generate code in a general-purpose p...
research
08/04/2018

code2seq: Generating Sequences from Structured Representations of Code

The ability to generate natural language sequences from source code snip...
research
07/14/2020

Contextualized Code Representation Learning for Commit Message Generation

Automatic generation of high-quality commit messages for code commits ca...
research
11/26/2019

Generating Commit Messages from Git Diffs

Commit messages aid developers in their understanding of a continuously ...
research
03/25/2023

Combining Contexts from Multiple Sources for Documentation-Specific Code Example Generation

Code example is a crucial part of good documentation. It helps the devel...
