A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels

09/23/2021
by Robert J. Joyce, et al.

In some problem spaces, the high cost of obtaining ground truth labels necessitates the use of lower-quality reference datasets. It is difficult to benchmark model performance using these datasets, as evaluation results may be biased. We propose a supplement to using reference labels, which we call an approximate ground truth refinement (AGTR). Using an AGTR, we prove that bounds on specific metrics used to evaluate clustering algorithms and multi-class classifiers can be computed without reference labels. We also introduce a procedure that uses an AGTR to identify inaccurate evaluation results produced from datasets of dubious quality. Creating an AGTR requires domain knowledge, and malware family classification is a task with robust domain-knowledge approaches that support the construction of an AGTR. We demonstrate our AGTR evaluation framework by applying it to a popular malware labeling tool, both to diagnose over-fitting in prior testing and to evaluate changes whose impact could not be meaningfully quantified under previous data.
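To make the idea of metric bounds concrete, the following is a minimal sketch and not the authors' implementation. It assumes a "refinement" is a grouping in which every group lies entirely inside one true class, and it assumes max-overlap definitions of clustering precision and recall; the abstract does not say which metrics or which bound directions the paper uses. All function names and the toy data are hypothetical. Under these assumptions, precision measured against the refinement cannot exceed precision against the true labels, and recall measured against the refinement cannot fall below recall against the true labels.

```python
from collections import Counter, defaultdict


def _overlaps(a, b):
    """For each group in `a`, count how its items are spread over groups in `b`."""
    counts = defaultdict(Counter)
    for item, group_a in a.items():
        counts[group_a][b[item]] += 1
    return counts


def precision(pred, ref):
    """Assumed max-overlap precision: each predicted cluster is credited with
    its largest overlap with any single reference group."""
    counts = _overlaps(pred, ref)
    return sum(max(c.values()) for c in counts.values()) / len(pred)


def recall(pred, ref):
    """Assumed max-overlap recall: each reference group is credited with
    its largest overlap with any single predicted cluster."""
    return precision(ref, pred)


# Hypothetical toy data: six samples, two true families.
truth = {"a": "fam1", "b": "fam1", "c": "fam1",
         "d": "fam2", "e": "fam2", "f": "fam2"}
# A refinement: every group sits inside exactly one true family.
refinement = {"a": "r1", "b": "r1", "c": "r2",
              "d": "r3", "e": "r3", "f": "r4"}
# Output of some clustering algorithm under evaluation.
pred = {"a": "p1", "b": "p1", "d": "p1", "c": "p2", "e": "p3", "f": "p3"}

# Computable without true labels:
precision_lower = precision(pred, refinement)
recall_upper = recall(pred, refinement)

# Shown only to check the bounds; unavailable in the setting described above.
true_precision = precision(pred, truth)
true_recall = recall(pred, truth)

print(precision_lower, "<=", true_precision)
print(recall_upper, ">=", true_recall)
```

The sketch uses an exact refinement. An approximate one, as the name AGTR suggests, may violate the refinement property for some samples, in which case these inequalities would hold only approximately.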

research
04/02/2019

MalPaCA: Malware Packet Sequence Clustering and Analysis

Malware family characterization is a challenging problem because ground-...
research
10/22/2020

Malware Traffic Classification: Evaluation of Algorithms and an Automated Ground-truth Generation Pipeline

Identifying threats in a network traffic flow which is encrypted is uniq...
research
09/24/2018

Statistical Estimation of Malware Detection Metrics in the Absence of Ground Truth

The accurate measurement of security metrics is a critical research prob...
research
12/02/2022

Evaluation of FEM and MLFEM AI-explainers in Image Classification tasks with reference-based and no-reference metrics

The most popular methods and algorithms for AI are, for the vast majorit...
research
10/08/2012

Semisupervised Classifier Evaluation and Recalibration

How many labeled examples are needed to estimate a classifier's performa...
research
08/25/2021

Applying Semi-Automated Hyperparameter Tuning for Clustering Algorithms

When approaching a clustering problem, choosing the right clustering alg...
research
02/12/2022

Detecting False Alarms from Automatic Static Analysis Tools: How Far are We?

Automatic static analysis tools (ASATs), such as Findbugs, have a high f...
