On the Tunability of Optimizers in Deep Learning

10/25/2019 ∙ by Prabhu Teja Sivaprasad, et al. ∙ 8

There is no consensus yet on the question whether adaptive gradient methods like Adam are easier to use than non-adaptive optimization methods like SGD. In this work, we fill in the important, yet ambiguous concept of `ease-of-use' by defining an optimizer's tunability: How easy is it to find good hyperparameter configurations using automatic random hyperparameter search? We propose a practical and universal quantitative measure for optimizer tunability that can form the basis for a fair optimizer benchmark. Evaluating a variety of optimizers on an extensive set of standard datasets and architectures, we find that Adam is the most tunable for the majority of problems, especially with a low budget for hyperparameter tuning.



There are no comments yet.


page 16

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.