DeepAI AI Chat
Log In Sign Up

Stage-based Hyper-parameter Optimization for Deep Learning

by   Ahnjae Shin, et al.
Seoul National University

As deep learning techniques advance more than ever, hyper-parameter optimization is the new major workload in deep learning clusters. Although hyper-parameter optimization is crucial in training deep learning models for high model performance, effectively executing such a computation-heavy workload still remains a challenge. We observe that numerous trials issued from existing hyper-parameter optimization algorithms share common hyper-parameter sequence prefixes, which implies that there are redundant computations from training the same hyper-parameter sequence multiple times. We propose a stage-based execution strategy for efficient execution of hyper-parameter optimization algorithms. Our strategy removes redundancy in the training process by splitting the hyper-parameter sequences of trials into homogeneous stages, and generating a tree of stages by merging the common prefixes. Our preliminary experiment results show that applying stage-based execution to hyper-parameter optimization algorithms outperforms the original trial-based method, saving required GPU-hours and end-to-end training time by up to 6.60 times and 4.13 times, respectively.


Hippo: Taming Hyper-parameter Optimization of Deep Learning with Stage Trees

Hyper-parameter optimization is crucial for pushing the accuracy of a de...

On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice

Machine learning algorithms have been used widely in various application...

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Since deep neural networks were developed, they have made huge contribut...

Bayesian Optimization for Selecting Efficient Machine Learning Models

The performance of many machine learning models depends on their hyper-p...

Optimization-Derived Learning with Essential Convergence Analysis of Training and Hyper-training

Recently, Optimization-Derived Learning (ODL) has attracted attention fr...

Characterizing the hyper-parameter space of LSTM language models for mixed context applications

Applying state of the art deep learning models to novel real world datas...

Varuna: Scalable, Low-cost Training of Massive Deep Learning Models

Systems for training massive deep learning models (billions of parameter...