Evaluating a bot detection model on git commit messages

03/22/2021
by Mehdi Golzadeh, et al.

Detecting bots in distributed software development activity is important for preventing bias in large-scale socio-technical empirical analyses. In previous work, we proposed a classification model that detects bots in GitHub repositories based on the pull request and issue comments of GitHub accounts. The current study generalises the approach to git contributors, based on their commit messages. We train and evaluate the classification model on a large dataset of 6,922 git contributors. The original model, based on pull request and issue comments, obtained a precision of 0.77 on this dataset. Retraining the classification model on git commit messages increased the precision to 0.80. As a proof of concept, we implemented this model in BoDeGiC, an open source command-line tool to detect bots in git repositories.
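To make the reported metric concrete, here is a minimal sketch of how precision can be computed for the bot class from a set of labelled contributors. The function name, the label strings, and the five sample labels are illustrative assumptions, not taken from the study or from BoDeGiC. The abstract does not state whether the reported precision is per-class or averaged across classes, so this shows only the per-class definition.

```python
def precision(y_true, y_pred, positive="bot"):
    """Fraction of contributors predicted as `positive` that truly are `positive`."""
    # True positives: predicted positive and actually positive.
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # All contributors the classifier flagged as positive.
    pred_pos = sum(1 for p in y_pred if p == positive)
    # Return 0.0 when nothing was flagged, to avoid dividing by zero.
    return true_pos / pred_pos if pred_pos else 0.0


# Hypothetical labels for five contributors (not data from the study).
y_true = ["bot", "human", "bot", "human", "human"]
y_pred = ["bot", "bot", "bot", "human", "human"]

print(precision(y_true, y_pred))
```

In this toy example, three contributors are flagged as bots and two of them really are, so the precision is two thirds. A higher precision means fewer human contributors are wrongly flagged as bots.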
