Incivility Detection in Open Source Code Review and Issue Discussions

06/27/2022
by Isabella Ferreira, et al.

Given the democratic nature of open source development, code review and issue discussions may be uncivil. Incivility, defined as features of discussion that convey an unnecessarily disrespectful tone, can have negative consequences for open source communities. To prevent or minimize these consequences, open source platforms have included mechanisms for removing uncivil language from discussions. However, such approaches require manual inspection, which can be overwhelming given the large number of discussions. To help open source communities deal with this problem, in this paper we compare six classical machine learning models with BERT for detecting incivility in open source code review and issue discussions. We also assess whether adding contextual information improves the models' performance and how well the models perform in a cross-platform setting. We found that BERT performs better than the classical machine learning models, with a best F1-score of 0.95. Furthermore, the classical machine learning models tend to underperform when detecting non-technical and civil discussions. Our results show that adding contextual information to BERT did not improve its performance and that none of the analyzed classifiers had an outstanding performance in a cross-platform setting. Finally, we provide insights into the tones that the classifiers misclassify.
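The abstract reports results as F1-scores and notes that some models struggle with particular classes. A minimal sketch of how per-class precision, recall, and F1 can be computed for a civil/uncivil classifier is shown below. It is not the authors' evaluation code: the label names `"civil"` and `"uncivil"`, the helper names `per_class_scores` and `macro_f1`, and the sample predictions are hypothetical, and the abstract does not state which averaging scheme (binary, macro, or weighted) the paper uses.

```python
def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} from parallel lists of gold and predicted labels."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly predicted as label
        fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly predicted as label
        fn = sum(1 for t, p in pairs if t == label and p != label)  # label instances that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values (one possible averaging scheme)."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical gold labels and classifier predictions for six comments.
y_true = ["uncivil", "civil", "civil", "uncivil", "civil", "civil"]
y_pred = ["uncivil", "uncivil", "civil", "uncivil", "civil", "civil"]

scores = per_class_scores(y_true, y_pred, labels=["civil", "uncivil"])
for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"macro F1: {macro_f1(scores):.3f}")
```

Reporting F1 per class, rather than only a single aggregate, makes it visible when a model does well overall but poorly on one class, which is the kind of weakness the abstract describes for the classical models on civil discussions.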



research
07/30/2022

Adding Context to Source Code Representations for Deep Learning

Deep learning models have been successfully applied to a variety of soft...
research
07/22/2023

CloudScent: a model for code smell analysis in open-source cloud

The low cost and rapid provisioning capabilities have made open-source c...
research
07/08/2021

Data-Driven Extract Method Recommendations: A Study at ING

The sound identification of refactoring opportunities is still an open p...
research
06/07/2018

A Simple NLP-based Approach to Support Onboarding and Retention in Open-Source Communities

Successful open source communities are constantly looking for members an...
research
01/26/2022

LAGOON: An Analysis Tool for Open Source Communities

This paper presents LAGOON – an open source platform for understanding t...
research
03/08/2017

Assessing Code Authorship: The Case of the Linux Kernel

Code authorship is a key information in large-scale open source systems....
research
05/29/2020

OSDG – Open-Source Approach to Classify Text Data by UN Sustainable Development Goals (SDGs)

Sustainable Development Goals (SDGs) bring together the diverse developm...