ComSum: Commit Messages Summarization and Meaning Preservation

08/23/2021
by Leshem Choshen, et al.

We present ComSum, a data set of 7 million commit messages for text summarization. When documenting a commit (a change to software code), developers post both a message and its summary. We gather and filter these pairs to curate a data set of developers' work summaries. Along with its growing size, practicality, and challenging language domain, the data set benefits from the active field of empirical software engineering. Because commits follow a typology, we propose to evaluate outputs not only by ROUGE but also by their meaning preservation.
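To make the two evaluation ideas concrete, here is a minimal sketch in Python. It is not the authors' implementation. `rouge1_f1` is a simplified unigram-overlap F1 with whitespace tokenization, not the official ROUGE toolkit. `toy_commit_type` is a hypothetical keyword rule standing in for a real commit-type classifier, and the labels "corrective" and "other" are assumptions chosen for illustration. Meaning preservation is approximated here as "the generated summary gets the same commit type as the reference".

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased, whitespace-split tokens."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Multiset intersection counts each shared token at most min(ref, cand) times.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def toy_commit_type(text: str) -> str:
    """Hypothetical stand-in for a commit-type classifier (keyword rule only)."""
    words = set(text.lower().split())
    if words & {"fix", "fixes", "fixed", "bug"}:
        return "corrective"
    return "other"


def meaning_preserved(reference: str, candidate: str, classify=toy_commit_type) -> bool:
    """Assumed proxy: meaning is preserved when both texts receive the same type."""
    return classify(reference) == classify(candidate)


if __name__ == "__main__":
    ref = "fix null pointer in parser"
    for cand in ("fix parser crash", "update parser"):
        print(cand, rouge1_f1(ref, cand), meaning_preserved(ref, cand))
```

The two candidates in the usage example share words with the reference, yet only the first keeps the "corrective" label, which illustrates why an overlap score alone may not capture whether a summary preserves the commit's meaning.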


