Using Feature Grouping as a Stochastic Regularizer for High-Dimensional Noisy Data

07/31/2018
by Sergul Aydore, et al.

The use of complex models with many parameters is challenging in high-dimensional, small-sample problems: such models overfit rapidly. These situations are common when data collection is expensive, as in neuroscience, biology, or geology. Dedicated regularization can be crafted to tame overfitting, typically via structured penalties, but rich penalties require mathematical expertise and entail large computational costs. Stochastic regularizers such as dropout are easier to implement: they prevent overfitting through random perturbations and, used inside a stochastic optimizer, add little computational cost. We propose a structured stochastic regularization that relies on feature grouping. Using a fast clustering algorithm, we define a family of feature groups that capture feature covariations. We then randomly select among these groups inside a stochastic gradient descent loop. This procedure acts as a structured regularizer for high-dimensional correlated data without additional computational cost, and it has a denoising effect. We demonstrate the performance of our approach for logistic regression both on a sample-limited face-image dataset with varying additive noise and on a typical high-dimensional learning problem, brain-image classification.
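The core idea can be sketched with a minimal example: train logistic regression with SGD while, at each step, randomly keeping a subset of feature groups and zeroing out the rest (with inverted-dropout rescaling). This sketch assumes a synthetic group assignment for illustration; in the approach described above, the groups would instead come from a fast clustering algorithm applied to the features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data: features come in groups that covary.
# NOTE: the hard-coded group assignment below is a stand-in for the
# clustering step described in the abstract.
n, p, n_groups = 200, 30, 6
groups = np.repeat(np.arange(n_groups), p // n_groups)  # feature -> group id
latent = rng.normal(size=(n, n_groups))
X = latent[:, groups] + 0.3 * rng.normal(size=(n, p))   # group-correlated features
w_true = rng.normal(size=n_groups)[groups]
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression trained with SGD; each step keeps a random subset
# of feature groups and zeroes out the others (group-wise dropout).
w = np.zeros(p)
keep_prob, lr = 0.7, 0.1
for step in range(2000):
    i = rng.integers(n)
    keep = rng.random(n_groups) < keep_prob  # sample groups to keep
    mask = keep[groups] / keep_prob          # inverted-dropout scaling
    x = X[i] * mask                          # perturbed sample
    grad = (sigmoid(x @ w) - y[i]) * x       # logistic-loss gradient
    w -= lr * grad

acc = np.mean((sigmoid(X @ w) > 0.5) == y)  # training accuracy
```

Because whole groups of correlated features are dropped together, the perturbation respects the covariance structure of the data, unlike plain per-feature dropout, and it adds only a cheap masking step to each SGD iteration.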

