On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow

04/26/2017
by   Markus Borg, et al.
0

Abundant data is the key to successful machine learning. However, supervised learning requires annotated data that are often hard to obtain. In a classification task with limited resources, Active Learning (AL) promises to guide annotators to examples that bring the most value for a classifier. AL can be successfully combined with self-training, i.e., extending a training set with the unlabelled examples for which a classifier is the most certain. We report our experiences on using AL in a systematic manner to train an SVM classifier for Stack Overflow posts discussing performance of software components. We show that the training examples deemed as the most valuable to the classifier are also the most difficult for humans to annotate. Despite carefully evolved annotation criteria, we report low inter-rater agreement, but we also propose mitigation strategies. Finally, based on one annotator's work, we show that self-training can improve the classification accuracy. We conclude the paper by discussing implication for future text miners aspiring to use AL and self-training.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/09/2023

Active Learning for Abstractive Text Summarization

Construction of human-curated annotated datasets for abstractive text su...
research
01/04/2023

MoBYv2AL: Self-supervised Active Learning for Image Classification

Active learning(AL) has recently gained popularity for deep learning(DL)...
research
05/24/2023

Active Learning for Natural Language Generation

The field of text generation suffers from a severe shortage of labeled d...
research
01/25/2023

Toward Realistic Evaluation of Deep Active Learning Algorithms in Image Classification

Active Learning (AL) aims to reduce the labeling burden by interactively...
research
04/12/2021

Active learning for medical code assignment

Machine Learning (ML) is widely used to automatically extract meaningful...
research
09/02/2020

ALEX: Active Learning based Enhancement of a Model's Explainability

An active learning (AL) algorithm seeks to construct an effective classi...
research
11/14/2019

Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision

Humans do not acquire perceptual abilities in the way we train machines....

Please sign up or login with your details

Forgot password? Click here to reset