Evaluating a bot detection model on git commit messages

03/22/2021
by Mehdi Golzadeh et al.

Detecting the presence of bots in distributed software development activity is important to avoid bias in large-scale socio-technical empirical analyses. In previous work, we proposed a classification model to detect bots in GitHub repositories based on the pull request and issue comments of GitHub accounts. The current study generalises the approach to git contributors based on their commit messages. We train and evaluate the classification model on a large dataset of 6,922 git contributors. On this dataset, the original model based on pull request and issue comments obtained a precision of 0.77. Retraining the classification model on git commit messages increased the precision to 0.80. As a proof of concept, we implemented this model in BoDeGiC, an open source command-line tool to detect bots in git repositories.
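To make the setting concrete, the Python sketch below shows how a per-contributor classifier over commit messages could be evaluated with precision (TP / (TP + FP)), the metric reported above. It is a toy illustration under stated assumptions. The repetitiveness feature, the 0.5 threshold, and the contributor names are all hypothetical. The threshold rule only stands in for the trained classification model used in BoDeGiC and does not reproduce it.

```python
from typing import Dict, List


def repetitiveness(messages: List[str]) -> float:
    """Hypothetical feature (assumption, not the paper's feature set):
    share of a contributor's commit messages that duplicate another message
    after lower-casing and collapsing whitespace. Returns 0.0 if empty."""
    if not messages:
        return 0.0
    normalised = [" ".join(m.lower().split()) for m in messages]
    return 1.0 - len(set(normalised)) / len(normalised)


def classify(messages: List[str], threshold: float = 0.5) -> str:
    """Toy rule standing in for a trained model: flag a contributor
    as a bot when repetitiveness strictly exceeds the threshold."""
    return "bot" if repetitiveness(messages) > threshold else "human"


def precision(predicted: Dict[str, str], actual: Dict[str, str],
              positive: str = "bot") -> float:
    """Precision for the positive class: TP / (TP + FP)."""
    tp = sum(1 for c, p in predicted.items()
             if p == positive and actual[c] == positive)
    fp = sum(1 for c, p in predicted.items()
             if p == positive and actual[c] != positive)
    return tp / (tp + fp) if tp + fp else 0.0


# Hypothetical contributors and their commit messages.
commits = {
    "ci-bot": ["Update translations"] * 4,
    "alice": ["Fix off-by-one in parser", "Add README section", "Refactor tests"],
    "bob": ["wip", "wip", "wip", "wip", "fix typo"],
}
# Hypothetical ground-truth labels.
labels = {"ci-bot": "bot", "alice": "human", "bob": "human"}

predicted = {name: classify(msgs) for name, msgs in commits.items()}
print(predicted)                     # bob is a human misclassified as a bot
print(precision(predicted, labels))  # 1 TP, 1 FP
```

Here the human contributor "bob" writes highly repetitive messages and is wrongly flagged, so precision drops to 0.5. This is the kind of false positive that a precision figure such as 0.77 or 0.80 summarises over a whole dataset.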
