Evaluating Software User Feedback Classifiers on Unseen Apps, Datasets, and Metadata

12/27/2021
by Peter Devine, et al.

Listening to users' requirements is crucial to building and maintaining high-quality software. Online software user feedback has been shown to contain large amounts of information useful to requirements engineering (RE). Previous studies have created machine learning classifiers for parsing this feedback for development insight. While these classifiers report generally good performance when evaluated on a test set, questions remain as to how well they extend to unseen data in various forms. This study evaluates the performance of machine learning classifiers on feedback for two common classification tasks (classifying bug reports and feature requests). Using seven datasets from prior research studies, we investigate the performance of classifiers when evaluated on feedback from different apps than those contained in the training set and when evaluated on completely different datasets (coming from different feedback platforms and/or labelled by different researchers). We also measure the difference in performance when platform-specific metadata is used as a feature in classification. We demonstrate that classification performance is similar on feedback from unseen apps compared to seen apps in the majority of cases tested. However, the classifiers do not perform well on unseen datasets. We show that multi-dataset training or zero-shot classification approaches can somewhat mitigate this performance decrease. Finally, we find that using metadata as features in classifying bug reports and feature requests does not lead to a statistically significant improvement in the majority of datasets tested. We discuss the implications of these results for developing user feedback classification models to analyse and extract software requirements.
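
The unseen-app evaluation described in the abstract can be illustrated with a minimal sketch of a leave-one-app-out split, in which all feedback from one app is held out of training and used only for testing. This is an illustration under stated assumptions, not the authors' pipeline: the record layout `(app, text, label)`, the function names `split_by_app` and `leave_one_app_out`, and the toy data are all hypothetical.

```python
def split_by_app(records, held_out_apps):
    """Split (app, text, label) records so held-out apps never appear in training."""
    held_out = set(held_out_apps)
    train = [r for r in records if r[0] not in held_out]
    test = [r for r in records if r[0] in held_out]
    return train, test


def leave_one_app_out(records):
    """Yield (app, train, test) once per app, with that app fully held out."""
    apps = sorted({r[0] for r in records})
    for app in apps:
        train, test = split_by_app(records, [app])
        yield app, train, test


# Hypothetical toy feedback: (app name, feedback text, label).
records = [
    ("app_a", "crashes every time I log in", "bug"),
    ("app_a", "please add a dark mode", "feature"),
    ("app_b", "sync fails after the update", "bug"),
    ("app_b", "love it", "other"),
    ("app_c", "would like export to PDF", "feature"),
]

for app, train, test in leave_one_app_out(records):
    # A classifier would be fitted on `train` and scored on `test` here.
    print(app, len(train), len(test))
```

Splitting by app rather than by individual feedback item keeps every item from a test app out of the training data, which is what separates an unseen-app evaluation from an ordinary random test split. For real experiments, scikit-learn's `LeaveOneGroupOut` offers a comparable grouped split.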


