MISSION: Ultra Large-Scale Feature Selection using Count-Sketches

06/12/2018
by Amirali Aghazadeh, et al.

Feature selection is an important challenge in machine learning. It plays a crucial role in the explainability of the machine-driven decisions that are rapidly permeating modern society. Unfortunately, the explosion in the size and dimensionality of real-world datasets poses a severe challenge to standard feature selection algorithms. Today, it is not uncommon for datasets to have billions of dimensions. At such scale, even storing the feature vector is impossible, causing most existing feature selection methods to fail. Workarounds like feature hashing, a standard approach to large-scale machine learning, help with computational feasibility, but at the cost of losing the interpretability of the features. In this paper, we present MISSION, a novel framework for ultra large-scale feature selection that performs stochastic gradient descent while maintaining an efficient representation of the features in memory using a Count-Sketch data structure. MISSION retains the simplicity of feature hashing without sacrificing the interpretability of the features, while using only O(log^2(p)) working memory. We demonstrate that MISSION accurately and efficiently performs feature selection on real-world, large-scale datasets with billions of dimensions.
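
The idea described in the abstract, running stochastic gradient descent while keeping the weights in a Count-Sketch rather than in a dense vector, can be illustrated with a small sketch. The code below is not the authors' implementation: the class `CountSketch`, the function `mission_style_sgd`, the squared-loss objective, the multiply-mod-prime hash family, the default sketch size, and the dictionary used to track the k largest weights (a heap would be the more natural choice at scale) are all assumptions made for illustration.

```python
import numpy as np


class CountSketch:
    """Fixed-size table that approximates a very long weight vector."""

    def __init__(self, depth=5, width=256, seed=0):
        rng = np.random.default_rng(seed)
        self.prime = 2_147_483_647  # Mersenne prime 2**31 - 1
        self.width = width
        self.table = np.zeros((depth, width))
        self.rows = np.arange(depth)
        # One (bucket hash, sign hash) pair per row. The multiply-mod-prime
        # family is an illustrative choice, assumed here.
        self.a = rng.integers(1, self.prime, size=depth)
        self.b = rng.integers(0, self.prime, size=depth)
        self.c = rng.integers(1, self.prime, size=depth)
        self.d = rng.integers(0, self.prime, size=depth)

    def _locate(self, i):
        # Feature indices are assumed to be non-negative and below 2**31
        # so the products stay inside int64.
        buckets = ((self.a * i + self.b) % self.prime) % self.width
        signs = 1 - 2 * (((self.c * i + self.d) % self.prime) % 2)
        return buckets, signs

    def add(self, i, delta):
        """Add delta to the (implicit) weight of feature i."""
        buckets, signs = self._locate(i)
        self.table[self.rows, buckets] += signs * delta

    def query(self, i):
        """Estimate the weight of feature i as the median over rows."""
        buckets, signs = self._locate(i)
        return float(np.median(signs * self.table[self.rows, buckets]))


def mission_style_sgd(samples, k, lr=0.1, epochs=1, sketch=None):
    """SGD on squared loss with weights stored in a Count-Sketch.

    samples: list of (x, y), where x is a sparse dict {feature_index: value}.
    Returns a dict holding the k features with the largest estimated |weight|.
    """
    if sketch is None:
        sketch = CountSketch()
    top = {}  # feature index -> estimated weight (at most k after pruning)
    for _ in range(epochs):
        for x, y in samples:
            # Predict using only the currently selected features.
            pred = sum(top.get(i, 0.0) * v for i, v in x.items())
            err = pred - y
            for i, v in x.items():
                sketch.add(i, -lr * err * v)  # gradient step inside the sketch
                top[i] = sketch.query(i)      # refresh the estimate
            if len(top) > k:
                keep = sorted(top, key=lambda j: abs(top[j]), reverse=True)[:k]
                top = {j: top[j] for j in keep}
    return top
```

In this sketch the memory footprint is set by `depth * width + k` rather than by the number of dimensions p, and the selected features keep their original indices, which is the interpretability property that plain feature hashing gives up.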

research
10/26/2020

BEAR: Sketching BFGS Algorithm for Ultra-High Dimensional Feature Selection in Sublinear Memory

We consider feature selection for applications in machine learning where...
research
09/27/2014

Large-scale Online Feature Selection for Ultra-high Dimensional Sparse Data

Feature selection with large-scale high-dimensional data is important ye...
research
10/13/2016

An Information Theoretic Feature Selection Framework for Big Data under Apache Spark

With the advent of extremely high dimensional datasets, dimensionality r...
research
10/27/2021

A Self-adaptive Weighted Differential Evolution Approach for Large-scale Feature Selection

Recently, many evolutionary computation methods have been developed to s...
research
03/30/2018

Online Regression with Model Selection

Online learning algorithms have a wide variety of applications in large ...
research
04/05/2023

Selecting Features by their Resilience to the Curse of Dimensionality

Real-world datasets are often of high dimension and affected by the curs...
research
05/29/2020

Unsupervised Feature Selection via Multi-step Markov Transition Probability

Feature selection is a widely used dimension reduction technique to sele...
