Matrix sketching for supervised classification with imbalanced classes

12/02/2019
by Roberta Falcone, et al.

Matrix sketching is a recently developed data compression technique. An input matrix A is efficiently approximated by a smaller matrix B that preserves most of the properties of A up to a guaranteed approximation ratio, so that numerical operations on big data sets become faster. Sketching algorithms generally use random projections to compress the original data set, and this stochastic generation process makes them amenable to statistical analysis. The statistical properties of sketching algorithms have been widely studied in the context of multiple linear regression. In this paper we propose matrix sketching as a tool for rebalancing class sizes in supervised classification with imbalanced classes. It is well known that class imbalance may lead to poor classification performance, especially for the minority class.
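To make the idea concrete, the following is a minimal sketch of one common sketching scheme, a Gaussian random projection, and not necessarily the algorithm used in the paper. It compresses the rows of a majority-class matrix down to the minority-class size. The function name `gaussian_sketch`, the synthetic data, and the choice of sketch size are all assumptions made for illustration.

```python
import numpy as np


def gaussian_sketch(A, k, rng):
    """Return the k x d sketch B = S @ A of an n x d matrix A.

    S is a k x n matrix with i.i.d. N(0, 1/k) entries, so that
    E[S.T @ S] = I and therefore E[B.T @ B] = A.T @ A.
    """
    n = A.shape[0]
    S = rng.normal(loc=0.0, scale=1.0 / np.sqrt(k), size=(k, n))
    return S @ A


rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 2000 majority rows, 100 minority rows.
X_major = rng.normal(size=(2000, 5))
X_minor = rng.normal(loc=1.0, size=(100, 5))

# Assumption: sketch the majority class down to the minority class size.
B = gaussian_sketch(X_major, k=X_minor.shape[0], rng=rng)

# Compare the Gram matrices of the sketch and the original data.
gram_full = X_major.T @ X_major
rel_err = np.linalg.norm(B.T @ B - gram_full) / np.linalg.norm(gram_full)

print(B.shape)
print(f"relative Gram error: {rel_err:.3f}")
```

The sketched rows are random linear combinations of the original rows rather than a subsample, which is what distinguishes this approach from plain undersampling. The Gram matrix error shrinks as the sketch size k grows.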

research
07/05/2021

Statistical Theory for Imbalanced Binary Classification

Within the vast body of statistical theory developed for binary classifi...
research
04/06/2021

Survey of Imbalanced Data Methodologies

Imbalanced data set is a problem often found and well-studied in financi...
research
10/09/2022

An Instance Selection Algorithm for Big Data in High imbalanced datasets based on LSH

Training of Machine Learning (ML) models in real contexts often deals wi...
research
09/08/2019

Self-paced Ensemble for Highly Imbalanced Massive Data Classification

Many real-world applications reveal difficulties in learning classifiers...
research
09/08/2019

Training Effective Ensemble on Imbalanced Data by Self-paced Harmonizing Classification Hardness

Many real-world applications reveal difficulties in learning classifiers...
research
11/05/2021

Divide-and-Conquer Hard-thresholding Rules in High-dimensional Imbalanced Classification

In binary classification, imbalance refers to situations in which one cl...
research
11/25/2019

A Self-Adaptive Synthetic Over-Sampling Technique for Imbalanced Classification

Traditionally, in supervised machine learning, (a significant) part of t...
