Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space

09/09/2019
by   Vlad Pushkarov, et al.
0

Hyperparameter optimization is both a practical issue and an interesting theoretical problem in training of deep architectures. Despite many recent advances the most commonly used methods almost universally involve training multiple and decoupled copies of the model, in effect sampling the hyperparameter space. We show that at a negligible additional computational cost, results can be improved by sampling nonlocal paths instead of points in hyperparameter space. To this end we interpret hyperparameters as controlling the level of correlated noise in training, which can be mapped to an effective temperature. The usually independent instances of the model are coupled and allowed to exchange their hyperparameters throughout the training using the well established parallel tempering technique of statistical physics. Each simulation corresponds then to a unique path, or history, in the joint hyperparameter/model-parameter space. We provide empirical tests of our method, in particular for dropout and learning rate optimization. We observed faster training and improved resistance to overfitting and showed a systematic decrease in the absolute validation error, improving over benchmark results.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/05/2018

Combination of Hyperband and Bayesian Optimization for Hyperparameter Optimization in Deep Learning

Deep learning has achieved impressive results on many problems. However,...
research
12/30/2021

Self-tuning networks:

Hyperparameter optimization can be formulated as a bilevel optimization ...
research
12/11/2019

Bayesian Hyperparameter Optimization with BoTorch, GPyTorch and Ax

Deep learning models are full of hyperparameters, which are set manually...
research
05/28/2023

A Three-regime Model of Network Pruning

Recent work has highlighted the complex influence training hyperparamete...
research
07/31/2016

Hyperparameter Transfer Learning through Surrogate Alignment for Efficient Deep Neural Network Training

Recently, several optimization methods have been successfully applied to...
research
11/28/2022

Mathematically Modeling the Lexicon Entropy of Emergent Language

We formulate a stochastic process, FiLex, as a mathematical model of lex...
research
10/18/2019

Fully Parallel Hyperparameter Search: Reshaped Space-Filling

Space-filling designs such as scrambled-Hammersley, Latin Hypercube Samp...

Please sign up or login with your details

Forgot password? Click here to reset