Optimizing generalization on the train set: a novel gradient-based framework to train parameters and hyperparameters simultaneously

06/11/2020
by Karim Lounici, et al.

Generalization is a central problem in Machine Learning. Most prediction methods require careful calibration of hyperparameters, carried out on a hold-out validation set, in order to generalize well. The main goal of this paper is to present a new approach, based on a new measure of risk, that allows us to develop fully automatic procedures for generalization. We illustrate the relevance of this framework on the regression problem. The main advantages of this approach are: (i) it can simultaneously train the model and perform regularization in a single run of a gradient-based optimizer on all available data, without any prior hyperparameter tuning; (ii) the framework can tackle several additional objectives simultaneously (e.g., correlation, sparsity) via the introduction of regularization parameters. Notably, our approach turns hyperparameter tuning, as well as feature selection (a combinatorial discrete optimization problem), into a continuous optimization problem solvable via classical gradient-based methods; (iii) the computational complexity of our methods is O(npK), where n, p, and K denote the number of observations, features, and gradient-descent iterations, respectively. In our experiments, our methods run significantly faster than benchmark methods while achieving equivalent prediction scores. Our procedures are implemented in PyTorch (code is available for replication).
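The sketch below illustrates, in PyTorch, the mechanical pattern the abstract describes: hyperparameters (a ridge penalty and soft feature-selection gates) are re-expressed as continuous, differentiable tensors and updated jointly with the regression weights in a single gradient-based loop. This is not the authors' code; in particular, the paper's new risk measure is not given in the abstract, so surrogate_risk below is a plain MSE placeholder standing in for it, and the parameter names are illustrative assumptions.

import torch

def surrogate_risk(y_pred, y):
    # Placeholder for the paper's generalization-oriented criterion (not reproduced here).
    return torch.mean((y_pred - y) ** 2)

def fit(X, y, n_iter=1000, lr=1e-2):
    n, p = X.shape
    beta = torch.zeros(p, requires_grad=True)          # regression weights (model parameters)
    log_lam = torch.zeros(1, requires_grad=True)        # ridge penalty, learned on a log scale
    gate_logits = torch.zeros(p, requires_grad=True)    # continuous relaxation of feature selection

    # One optimizer updates parameters and hyperparameters together, on all available data.
    opt = torch.optim.Adam([beta, log_lam, gate_logits], lr=lr)
    for _ in range(n_iter):                              # K iterations -> O(npK) overall cost
        opt.zero_grad()
        gates = torch.sigmoid(gate_logits)               # soft 0/1 feature inclusion
        y_pred = X @ (gates * beta)
        lam = torch.exp(log_lam)
        loss = surrogate_risk(y_pred, y) + lam * torch.sum((gates * beta) ** 2)
        loss.backward()
        opt.step()
    return beta.detach(), torch.sigmoid(gate_logits).detach(), torch.exp(log_lam).item()

# Example usage on synthetic data:
# X, y = torch.randn(100, 20), torch.randn(100)
# beta, gates, lam = fit(X, y)

With a plain training MSE the penalty would simply be driven toward zero; the role of the paper's risk measure is precisely to supply a criterion whose gradient makes such hyperparameters meaningful, which is why it is only stubbed out here.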


