Mixing ADAM and SGD: a Combined Optimization Method

by   Nicola Landro, et al.

Optimization methods (optimizers) get special attention for the efficient training of neural networks in the field of deep learning. In literature there are many papers that compare neural models trained with the use of different optimizers. Each paper demonstrates that for a particular problem an optimizer is better than the others but as the problem changes this type of result is no longer valid and we have to start from scratch. In our paper we propose to use the combination of two very different optimizers but when used simultaneously they can overcome the performances of the single optimizers in very different problems. We propose a new optimizer called MAS (Mixing ADAM and SGD) that integrates SGD and ADAM simultaneously by weighing the contributions of both through the assignment of constant weights. Rather than trying to improve SGD or ADAM we exploit both at the same time by taking the best of both. We have conducted several experiments on images and text document classification, using various CNNs, and we demonstrated by experiments that the proposed MAS optimizer produces better performance than the single SGD or ADAM optimizers. The source code and all the results of the experiments are available online at the following link https://gitlab.com/nicolalandro/multi_optimizer


page 1

page 5

page 6


AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs

The stochastic gradient descent (SGD) optimizers are generally used to t...

Dual Averaging is Surprisingly Effective for Deep Learning Optimization

First-order stochastic optimization methods are currently the most widel...

Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks

Solving a problem with a deep learning model requires researchers to opt...

EAdam Optimizer: How ε Impact Adam

Many adaptive optimization methods have been proposed and used in deep l...

Orthogonalized SGD and Nested Architectures for Anytime Neural Networks

We propose a novel variant of SGD customized for training network archit...

Ranger21: a synergistic deep learning optimizer

As optimizers are critical to the performances of neural networks, every...

Descending through a Crowded Valley – Benchmarking Deep Learning Optimizers

Choosing the optimizer is among the most crucial decisions of deep learn...

Please sign up or login with your details

Forgot password? Click here to reset