MTAdam: Automatic Balancing of Multiple Training Loss Terms

06/25/2020
by Itzik Malkiel, et al.

When training neural models, it is common to combine multiple loss terms. Balancing these terms requires considerable human effort and is computationally demanding. Moreover, the optimal trade-off between the loss terms can change as training progresses, especially for adversarial terms. In this work, we generalize the Adam optimization algorithm to handle multiple loss terms. The guiding principle is that, for every layer, the gradient magnitudes of the terms should be balanced. To this end, Multi-Term Adam (MTAdam) computes the derivative of each loss term separately, infers the first and second moments per parameter and loss term, and calculates a first moment for the per-layer magnitude of the gradients arising from each loss. This magnitude is used to continuously balance the gradients across all layers, in a manner that both varies from one layer to the next and changes dynamically over time. Our results show that training with the new method leads to fast recovery from suboptimal initial loss weighting and to training outcomes that match conventional training with the prescribed hyperparameters of each method.
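The abstract describes the mechanism but not the exact update rule. Below is a minimal PyTorch sketch of one MTAdam-like step, under several stated assumptions: each parameter tensor stands in for a "layer", the first loss term anchors the target gradient magnitude, every loss depends on all parameters, and the per-term Adam updates are summed after rescaling. The function name mtadam_step and the hyperparameter defaults are illustrative, not the authors' implementation.

```python
import torch

def mtadam_step(params, losses, state, lr=1e-3, betas=(0.9, 0.999),
                beta_mag=0.999, eps=1e-8):
    """One MTAdam-style step (sketch; not the authors' code)."""
    if not state:
        state["t"] = 0
        state["m"] = [[torch.zeros_like(p) for p in params] for _ in losses]
        state["v"] = [[torch.zeros_like(p) for p in params] for _ in losses]
        state["mag"] = [[torch.zeros(1) for _ in params] for _ in losses]
    state["t"] += 1
    t = state["t"]
    b1, b2 = betas

    # Gradients of each loss term, computed separately per the abstract
    grads = [torch.autograd.grad(loss, params, retain_graph=True)
             for loss in losses]

    with torch.no_grad():
        for j, p in enumerate(params):  # j indexes "layers" (assumption)
            # First moment (EMA) of each term's gradient magnitude per layer
            for i in range(len(losses)):
                g_norm = grads[i][j].norm()
                state["mag"][i][j].mul_(beta_mag).add_((1 - beta_mag) * g_norm)
            # Assumption: term 0 anchors the target magnitude (bias-corrected)
            ref = state["mag"][0][j] / (1 - beta_mag ** t)
            update = torch.zeros_like(p)
            for i in range(len(losses)):
                mag_hat = state["mag"][i][j] / (1 - beta_mag ** t)
                # Rescale term i so its layer magnitude matches the anchor's
                g = grads[i][j] * (ref / (mag_hat + eps))
                m, v = state["m"][i][j], state["v"][i][j]
                m.mul_(b1).add_((1 - b1) * g)       # per-term first moment
                v.mul_(b2).add_((1 - b2) * g * g)   # per-term second moment
                m_hat = m / (1 - b1 ** t)
                v_hat = v / (1 - b2 ** t)
                update += m_hat / (v_hat.sqrt() + eps)
            p -= lr * update

# Toy usage: two quadratic pulls with a deliberately bad 1e4 weighting;
# the magnitude balancing drives w toward the midpoint despite the scale gap.
w = torch.zeros(3, requires_grad=True)
state = {}
for _ in range(200):
    losses = [((w - 1) ** 2).sum(), 1e4 * ((w + 1) ** 2).sum()]
    mtadam_step([w], losses, state)
```

Treating each parameter tensor as a layer keeps the sketch short; in a real network the magnitude statistics would be grouped per module, and the anchor term could equally be a running average across terms.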

Related research

10/19/2021  Multi-Objective Loss Balancing for Physics-Informed Deep Learning
Physics Informed Neural Networks (PINN) are algorithms from deep learnin...

08/23/2023  A Scale-Invariant Task Balancing Approach for Multi-Task Learning
Multi-task learning (MTL), a learning paradigm to learn multiple related...

10/14/2020  Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
The vast majority of deep models use multiple gradient signals, typicall...

11/04/2019  Persistency of Excitation for Robustness of Neural Networks
When an online learning algorithm is used to estimate the unknown parame...

06/24/2020  Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness
Evaluating the robustness of a defense model is a challenging task in ad...

06/07/2021  Correcting Momentum in Temporal Difference Learning
A common optimization tool used in deep reinforcement learning is moment...

04/25/2023  Stable and low-precision training for large-scale vision-language models
We introduce new methods for 1) accelerating and 2) stabilizing training...
