Random Subspace with Trees for Feature Selection Under Memory Constraints

09/04/2017
by Antonio Sutera, et al.

Dealing with datasets of very high dimension is a major challenge in machine learning. In this paper, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsamples of variables, each mixing variables already identified as relevant by previous models with variables randomly selected among the other variables. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite-sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach.
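
The sequential procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name any parameters, so the memory budget `q`, the share `alpha` of slots reserved for already-selected variables, the iteration and tree counts, and the tolerance `tol` are all assumptions. So are the use of totally randomized trees and entropy-based importances. The toy dataset enumerates every binary input combination to mimic the infinite-sample setting, in which irrelevant variables receive zero importance.

```python
import itertools
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return -sum(p * math.log2(p) for p in (p1, 1.0 - p1) if p > 0.0)


def grow_tree(rows, y, features, rng, importances, n_total):
    """Grow a totally randomized tree and accumulate weighted entropy decreases."""
    if not features or len(set(y)) <= 1:
        return  # nothing left to split on, or the node is pure
    f = rng.choice(features)  # split variable drawn uniformly at random
    rest = [g for g in features if g != f]
    left = [i for i, r in enumerate(rows) if r[f] == 0]
    right = [i for i, r in enumerate(rows) if r[f] == 1]
    if not left or not right:
        grow_tree(rows, y, rest, rng, importances, n_total)  # f is constant here
        return
    y_left = [y[i] for i in left]
    y_right = [y[i] for i in right]
    children = (len(left) * entropy(y_left) + len(right) * entropy(y_right)) / len(y)
    importances[f] += len(y) / n_total * (entropy(y) - children)
    grow_tree([rows[i] for i in left], y_left, rest, rng, importances, n_total)
    grow_tree([rows[i] for i in right], y_right, rest, rng, importances, n_total)


def sequential_random_subspace(rows, y, q, alpha, n_iter, n_trees, seed=0, tol=1e-12):
    """Hypothetical sketch: q, alpha, n_iter, n_trees and tol are assumed names."""
    rng = random.Random(seed)
    p = len(rows[0])
    relevant = set()
    for _ in range(n_iter):
        # Reserve part of the budget for variables found relevant so far.
        n_keep = min(len(relevant), int(alpha * q))
        kept = rng.sample(sorted(relevant), n_keep)
        # Fill the remaining slots at random (assumption: drawn from all
        # variables not already kept for this iteration).
        pool = [j for j in range(p) if j not in kept]
        subset = kept + rng.sample(pool, q - n_keep)
        # The trees only ever read the q columns listed in `subset`.
        importances = {j: 0.0 for j in subset}
        for _ in range(n_trees):
            grow_tree(rows, y, subset, rng, importances, len(y))
        relevant.update(j for j, imp in importances.items() if imp > tol)
    return relevant


# Toy problem: 8 binary variables, only x0, x1 and x2 influence the output.
p = 8
rows = [list(bits) for bits in itertools.product((0, 1), repeat=p)]
y = [(r[0] ^ r[1]) | r[2] for r in rows]
selected = sequential_random_subspace(rows, y, q=3, alpha=0.5, n_iter=300, n_trees=10)
print(sorted(selected))
```

In this toy problem, x0 and x1 carry no information individually and can only be detected when both land in the same subset of three variables. This is the kind of dependence scenario in which keeping previously selected variables in memory matters.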


