Prudence When Assuming Normality: an advice for machine learning practitioners

07/30/2019
by Waleed A. Yousef, et al.

In a binary classification problem, the feature vector (predictor) is the input to a scoring function that produces a decision value (score), which is compared to a chosen threshold to produce the final class prediction (output). Assuming that the output of the scoring function is normally distributed is important in many applications. However, this assumption is sometimes severely violated even when the feature vector itself follows the simple multinormal distribution. This article proves this result mathematically with a counterexample and advises practitioners to avoid blind assumptions of normality. The article also provides a set of experiments that illustrate some of the expected, well-behaved results for the Area Under the ROC Curve (AUC) when the feature vector is multinormal. The message of the article is therefore not to avoid the normal assumption for either the input feature vector or the output scoring function; rather, prudence is needed when adopting either one.
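
The central claim, that a multinormal feature vector does not guarantee a normally distributed score, can be illustrated numerically. The sketch below is not the paper's counterexample. It assumes two classes with identity covariance and compares two scores. The first is a linear score h(x) = w·x with w = μ1 - μ0, which is Gaussian because it is a linear map of a Gaussian vector. The second is a hypothetical quadratic score h(x) = ||x||², which is chi-square distributed and therefore skewed. The sketch also estimates the AUC with the Mann-Whitney statistic and compares the linear score's AUC with the closed form Φ(Δ/√2), where Δ is the Mahalanobis distance between the class means. That closed form holds for the linear score w = Σ⁻¹(μ1 - μ0) when both classes share the covariance Σ. All helper names are illustrative.

```python
import math
import random
from statistics import NormalDist


def sample_multinormal(rng, mean, n):
    """Draw n vectors from N(mean, I); identity covariance is an assumption of this sketch."""
    return [[m + rng.gauss(0.0, 1.0) for m in mean] for _ in range(n)]


def linear_score(x, w):
    """Linear scoring function h(x) = w.x; Gaussian whenever x is multinormal."""
    return sum(wi * xi for wi, xi in zip(w, x))


def quadratic_score(x):
    """Hypothetical quadratic scoring function h(x) = ||x||^2 (chi-square, not normal)."""
    return sum(xi * xi for xi in x)


def predict(score, threshold):
    """Final class prediction: 1 if the decision value exceeds the chosen threshold."""
    return 1 if score > threshold else 0


def skewness(values):
    """Sample skewness; close to 0 for normally distributed values."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def empirical_auc(neg, pos):
    """Mann-Whitney estimate of AUC = P(score_pos > score_neg), ties counted as 1/2."""
    labeled = sorted([(s, 0) for s in neg] + [(s, 1) for s in pos])
    rank_sum_pos = 0.0
    i = 0
    while i < len(labeled):
        # Group tied scores and give each member the average 1-based rank.
        j = i
        while j < len(labeled) and labeled[j][0] == labeled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(lab for _, lab in labeled[i:j])
        i = j
    n_pos, n_neg = len(pos), len(neg)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


rng = random.Random(0)  # fixed seed for reproducibility
mu0, mu1 = [0.0, 0.0], [1.0, 0.5]  # illustrative class means
x0 = sample_multinormal(rng, mu0, 5000)
x1 = sample_multinormal(rng, mu1, 5000)

w = [b - a for a, b in zip(mu0, mu1)]  # equals Sigma^{-1}(mu1 - mu0) because Sigma = I
lin0 = [linear_score(x, w) for x in x0]
lin1 = [linear_score(x, w) for x in x1]
quad0 = [quadratic_score(x) for x in x0]

delta = math.dist(mu0, mu1)  # Mahalanobis distance reduces to Euclidean when Sigma = I
auc_theory = NormalDist().cdf(delta / math.sqrt(2))
auc_linear = empirical_auc(lin0, lin1)

# The linear score's skewness should be near 0; the quadratic score for class 0
# is chi-square with 2 degrees of freedom, whose theoretical skewness is 2.
print(f"skewness, linear score:    {skewness(lin0):+.3f}")
print(f"skewness, quadratic score: {skewness(quad0):+.3f}")
print(f"AUC, linear score: empirical {auc_linear:.3f}, closed form {auc_theory:.3f}")
```

The rank-based AUC avoids comparing every positive-negative pair explicitly, so it scales as O(n log n) rather than O(n²). For a general shared covariance Σ, one would whiten the features (or use w = Σ⁻¹(μ1 - μ0) and the full Mahalanobis distance) before applying the closed form.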
