A Generalizable Approach to Learning Optimizers

06/02/2021 ∙ by Diogo Almeida, et al. ∙ 15

A core issue with learning to optimize neural networks has been the lack of generalization to real world problems. To address this, we describe a system designed from a generalization-first perspective, learning to update optimizer hyperparameters instead of model parameters directly using novel features, actions, and a reward function. This system outperforms Adam at all neural network tasks including on modalities not seen during training. We achieve 2x speedups on ImageNet, and a 2.5x speedup on a language modeling task using over 5 orders of magnitude more compute than the training tasks.



There are no comments yet.


page 17

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.