Classifier Calibration: with implications to threat scores in cybersecurity

02/09/2021
by Waleed A. Yousef, et al.

This paper explores the calibration of a classifier's output score in binary classification problems. A calibrator is a function that maps the arbitrary score of a testing observation onto [0,1] to provide an estimate of the posterior probability of belonging to one of the two classes. Calibration is important for two reasons: first, it provides a meaningful score, namely the posterior probability; second, it puts the scores of different classifiers on the same scale for comparable interpretation. The paper presents three main contributions: (1) introducing multi-score calibration, where more than one classifier provides a score for a single observation; (2) introducing the idea that the classifier scores fed to a calibration process are nothing but features to a classifier, and hence proposing to extend the scores to higher dimensions to boost the calibrator's performance; (3) conducting a massive simulation study, on the order of 24,000 experiments, that spans different configurations, in addition to experiments on two real datasets from the cybersecurity domain. The results show that there is no overall winner among the different calibrators and configurations. However, general advice for practitioners includes the following: Platt's calibrator <cit.>, a version of logistic regression that reduces bias for small sample sizes, performs stably and acceptably across all experiments; the proposed multi-score calibration outperforms single-score calibration in the majority of experiments, including the two real datasets; and extending the scores helps in some experiments.
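The contrast between single-score and multi-score calibration can be sketched in a few lines of code. The snippet below is a minimal illustration under assumptions, not the paper's implementation: plain scikit-learn logistic regression stands in for Platt's bias-corrected variant, the two base classifiers, the synthetic data, and the split sizes are all illustrative choices.

```python
# Minimal sketch: single-score (Platt-style) vs. multi-score calibration.
# Assumes scikit-learn; classifiers, data, and names are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)

# Two base classifiers producing arbitrary, uncalibrated scores.
svm = LinearSVC(max_iter=5000).fit(X_train, y_train)
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

def scores(clf, X):
    """Return a one-dimensional score per observation for either classifier."""
    if hasattr(clf, "decision_function"):
        return clf.decision_function(X)
    return clf.predict_proba(X)[:, 1]

# Single-score calibration: logistic regression fit on one classifier's score.
s_cal = scores(svm, X_cal).reshape(-1, 1)
single = LogisticRegression().fit(s_cal, y_cal)

# Multi-score calibration: the scores of both classifiers are stacked as
# features to a single logistic-regression calibrator.
S_cal = np.column_stack([scores(svm, X_cal), scores(rf, X_cal)])
multi = LogisticRegression().fit(S_cal, y_cal)

# Both calibrators map raw scores onto [0,1] as posterior-probability estimates.
p_single = single.predict_proba(scores(svm, X_test).reshape(-1, 1))[:, 1]
S_test = np.column_stack([scores(svm, X_test), scores(rf, X_test)])
p_multi = multi.predict_proba(S_test)[:, 1]
```

The design point being illustrated is that the calibrator sees the classifier scores simply as features, so moving from one score to several (or to nonlinear expansions of a score) is only a change of the calibrator's input matrix.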
