A Novel Data Pre-processing Technique: Making Data Mining Robust to Different Units and Scales of Measurement

11/08/2021
by Arbind Agrahari Baniya, et al.

Many existing data mining algorithms use feature values directly in their models, which makes them sensitive to the units and scales used to measure or represent the data. Pre-processing based on rank transformation has been suggested as a potential solution to this issue. However, rank-transformed data is uniformly distributed, which may not be very useful in many data mining applications. In this paper, we present a more effective alternative based on ranks over multiple sub-samples of the data. We call the proposed pre-processing technique ARES (Average Rank over an Ensemble of Sub-samples). Our empirical results with widely used data mining algorithms for classification and anomaly detection on a wide range of data sets suggest that ARES yields more consistent task-specific outcomes across algorithms and data sets. In addition, it produces better or competitive outcomes most of the time compared to the most widely used min-max normalisation and the traditional rank transformation.
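
The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading of "average rank over an ensemble of sub-samples", not the authors' implementation. The function name `ares_transform`, the parameters `n_subsamples` and `subsample_size`, their default values, and the rank definition (the count of sub-sample values less than or equal to the point) are all assumptions made for illustration.

```python
import numpy as np

def ares_transform(X, n_subsamples=10, subsample_size=8, seed=0):
    """Hypothetical sketch: per-feature average rank over random sub-samples.

    Assumed rank definition: the number of sub-sample values <= x.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    size = min(subsample_size, n)
    out = np.zeros((n, d))
    for _ in range(n_subsamples):
        # Draw one sub-sample of row indices without replacement.
        idx = rng.choice(n, size=size, replace=False)
        for j in range(d):
            sub = np.sort(X[idx, j])
            # Rank of every value of feature j within this sub-sample.
            out[:, j] += np.searchsorted(sub, X[:, j], side="right")
    # Average the ranks over the ensemble of sub-samples.
    return out / n_subsamples

# Changing the units of each feature should leave the transform unchanged,
# because ranks depend only on the ordering of values.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 3))
X_rescaled = X * np.array([1000.0, 0.01, 2.54])

A = ares_transform(X)
B = ares_transform(X_rescaled)
print(np.array_equal(A, B))
```

Because each rank depends only on the order of values within a feature, any strictly increasing change of units gives the same output for the same sub-samples. Averaging over several small sub-samples, rather than ranking against the full data set, is what keeps the result from collapsing to the uniform distribution of a plain rank transformation.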
