Processing Large Datasets of Fine-Grained Source Code Changes

10/20/2019
by Stanislav Levin, et al.

In the era of Big Code, when researchers study an increasingly large number of repositories to support their findings, the data processing stage may require manipulating millions of records or more. In this work we focus on studies involving fine-grained, AST-level source code changes. We present how we extended the CodeDistillery source code mining framework with data manipulation capabilities aimed at easing the processing of large datasets of fine-grained source code changes. These capabilities allow researchers to automate much of their repository mining process and to streamline the data acquisition and processing phases. They have been used successfully in a number of studies, in the course of which tens of millions of fine-grained source code changes were processed.
