The choice of scaling technique matters for classification performance

12/23/2022
by Lucas B. V. de Amorim, et al.

Dataset scaling, also known as normalization, is an essential preprocessing step in a machine learning pipeline. It adjusts attribute scales so that they all vary within the same range. This transformation is known to improve the performance of classification models, but there are several scaling techniques to choose from, and the choice is generally not made carefully. In this paper, we run a broad experiment comparing the impact of 5 scaling techniques on the performance of 20 classification algorithms, including both monolithic and ensemble models, applied to 82 publicly available datasets with varying imbalance ratios. Results show that the choice of scaling technique matters for classification performance, and that the performance difference between the best and the worst scaling technique is relevant and statistically significant in most cases. They also indicate that choosing an inadequate technique can be more detrimental to classification performance than not scaling the data at all. We also show that the performance variation of an ensemble model across scaling techniques tends to be dictated by that of its base model. Finally, we discuss the relationship between a model's sensitivity to the choice of scaling technique and its performance, and provide insights into its applicability in different model deployment scenarios. Full results and source code for the experiments in this paper are available in a GitHub repository: https://github.com/amorimlb/scaling_matters
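To make "adjusting attribute scales" concrete, here is a minimal Python sketch of two widely used scaling techniques: min-max scaling and standardization (z-score). The abstract does not name the five techniques compared in the paper, so choosing these two is an assumption for illustration. The function names and the toy `age` and `income` attributes are hypothetical and are not taken from the authors' repository.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Map values linearly onto [0, 1]; a constant attribute maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def standardize(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Two hypothetical attributes on very different scales (illustrative data only).
age = [20, 30, 40, 50, 60]
income = [20000, 30000, 40000, 50000, 1000000]

age_mm = min_max_scale(age)
income_mm = min_max_scale(income)
income_std = standardize(income)

print("age, min-max:     ", age_mm)
print("income, min-max:  ", [round(v, 4) for v in income_mm])
print("income, z-score:  ", [round(v, 4) for v in income_std])
```

Both functions put the attributes on comparable scales, but they treat the outlying income value differently: min-max scaling compresses the other four incomes close to 0, while standardization keeps them at moderate negative values. Differences of this kind are one plausible reason the choice of technique can affect a classifier.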

research
08/01/2022

Weighted Scaling Approach for Metabolomics Data Analysis

Systematic variation is a common issue in metabolomics data analysis. Th...
research
12/14/2022

Reproducible scaling laws for contrastive language-image learning

Scaling up neural networks has led to remarkable performance across a wi...
research
07/15/2022

ScaleNet: Searching for the Model to Scale

Recently, community has paid increasing attention on model scaling and c...
research
05/19/2022

CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network

In this paper, we propose a Classification Confidence Network (CLCNet) t...
research
12/28/2020

Adaptive Threshold for Better Performance of the Recognition and Re-identification Models

Choosing a decision threshold is one of the challenging job in any class...
research
03/25/2023

Ensemble-based Blackbox Attacks on Dense Prediction

We propose an approach for adversarial attacks on dense prediction model...
research
06/12/2020

Reinforced Data Sampling for Model Diversification

With the rising number of machine learning competitions, the world has w...
