Mining Software Repositories with a Collaborative Heuristic Repository

by Hlib Babii, et al.

Many software engineering studies and tasks rely on categorizing software engineering (SE) artifacts. In practice, this is done either by defining simple but often imprecise heuristics, or by manually labelling the artifacts. Unfortunately, errors in these categorizations affect the tasks that rely on them. To improve the precision of these categorizations, we propose gathering heuristics in a collaborative heuristic repository, to which researchers can contribute a large number of diverse heuristics for a variety of tasks on a variety of SE artifacts. These heuristics are then leveraged by state-of-the-art weak supervision techniques to train high-quality classifiers, thus improving the categorizations. We present an initial version of the heuristic repository, which we applied to the concrete task of commit classification.
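
To make the idea concrete, the following is a minimal sketch of how several imprecise heuristics might be combined into weak labels for commit messages. It is not the authors' implementation: the heuristic functions, keyword lists, and label names are hypothetical, and a plain majority vote stands in for the state-of-the-art weak supervision techniques mentioned above, which would typically weight heuristics by their estimated accuracy.

```python
from collections import Counter

ABSTAIN = None  # a heuristic returns this when it has no opinion

# Hypothetical heuristics. Each inspects a commit message and returns a
# label or ABSTAIN. Substring matching is deliberately crude (for example,
# "prefix" also contains "fix"), which mirrors the imprecision of
# hand-written heuristics.
def mentions_fix(message):
    text = message.lower()
    return "bugfix" if any(w in text for w in ("fix", "bug", "crash")) else ABSTAIN

def mentions_add(message):
    text = message.lower()
    return "feature" if any(w in text for w in ("add", "implement", "support")) else ABSTAIN

def mentions_refactor(message):
    text = message.lower()
    return "refactoring" if any(w in text for w in ("refactor", "rename", "cleanup")) else ABSTAIN

HEURISTICS = [mentions_fix, mentions_add, mentions_refactor]

def weak_label(message, heuristics=HEURISTICS):
    """Combine heuristic votes by majority; abstain when no vote or a tie."""
    votes = [h(message) for h in heuristics]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]

if __name__ == "__main__":
    for msg in ("Fix crash on startup", "Add support for YAML config", "Update README"):
        print(msg, "->", weak_label(msg))
```

In a setup like this, the weak labels would serve as training data for a downstream classifier, and contributing more heuristics to the shared repository would give the vote more signal to work with.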




