GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

11/07/2017
by Zhao Chen, et al.

Deep multitask networks, in which one neural network produces multiple predictive outputs, are more scalable and often better regularized than their single-task counterparts. Such advantages can potentially lead to gains in both speed and performance, but multitask networks are also difficult to train without finding the right balance between tasks. We present a novel gradient normalization (GradNorm) technique which automatically balances the multitask loss function by directly tuning the gradients to equalize task training rates. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter α. Thus, what was once a tedious search process which incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we hope to demonstrate that direct gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning.
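
The abstract only sketches the mechanism, so here is a minimal, hypothetical PyTorch reconstruction of the idea: per-task gradient norms, measured at the last shared layer, are pulled toward their common mean scaled by each task's relative training rate raised to the power α, and the resulting auxiliary loss updates only the task weights. Everything below (the toy two-task network, the synthetic data, the learning rates, and the helper gradnorm_update) is an illustrative assumption based on the abstract, not the authors' released code.

```python
import torch
import torch.nn as nn


class MultiTaskNet(nn.Module):
    """Toy multitask model: one shared trunk, one linear head per task."""

    def __init__(self, in_dim=10, hidden=32, n_tasks=2):
        super().__init__()
        self.shared = nn.Linear(in_dim, hidden)  # last shared layer
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_tasks)])

    def forward(self, x):
        h = torch.relu(self.shared(x))
        return [head(h) for head in self.heads]


def gradnorm_update(W, weights, losses, initial_losses, alpha, weight_opt):
    """One GradNorm-style update of the task weights w_i (a sketch).

    G_i    = ||grad_W (w_i * L_i)||            per-task gradient norm at W
    r_i    = (L_i / L_i(0)) / mean_j(L_j / L_j(0))  relative training rate
    L_grad = sum_i |G_i - mean(G) * r_i**alpha|     (target is detached)
    """
    G = torch.stack([
        torch.autograd.grad(w * L, W, retain_graph=True, create_graph=True)[0].norm()
        for w, L in zip(weights, losses)
    ])
    L_tilde = torch.stack([L.detach() / L0 for L, L0 in zip(losses, initial_losses)])
    r = L_tilde / L_tilde.mean()
    target = (G.mean() * r.pow(alpha)).detach()
    grad_loss = (G - target).abs().sum()

    # Gradients flow only into the task weights, not the network parameters.
    weights.grad = torch.autograd.grad(grad_loss, weights, retain_graph=True)[0]
    weight_opt.step()
    with torch.no_grad():  # renormalize so the weights sum to the task count
        weights.mul_(len(losses) / weights.sum())


# Usage on synthetic data (all values hypothetical).
torch.manual_seed(0)
model = MultiTaskNet()
weights = nn.Parameter(torch.ones(2))
alpha = 1.5  # the single asymmetry hyperparameter from the abstract
model_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
weight_opt = torch.optim.Adam([weights], lr=1e-2)

x = torch.randn(64, 10)
targets = [torch.randn(64, 1), torch.randn(64, 1)]
initial_losses = None

for step in range(200):
    losses = [nn.functional.mse_loss(p, t) for p, t in zip(model(x), targets)]
    if initial_losses is None:
        initial_losses = [L.detach() for L in losses]
    gradnorm_update(model.shared.weight, weights, losses, initial_losses,
                    alpha, weight_opt)
    total = sum(w * L for w, L in zip(weights.detach(), losses))
    model_opt.zero_grad()
    total.backward()
    model_opt.step()
```

Note that the target norms are detached so the auxiliary loss moves only the task weights, which are renormalized each step to sum to the number of tasks; α controls how aggressively slower-training tasks are boosted relative to faster ones.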

Related research

05/20/2020
Multitask Learning with Single Gradient Step Update for Task Balancing
Multitask learning is a methodology to boost generalization performance ...

10/30/2019
Generalization in multitask deep neural classifiers: a statistical physics approach
A proper understanding of the striking generalization abilities of deep ...

06/06/2023
FAMO: Fast Adaptive Multitask Optimization
One of the grand enduring goals of AI is to create generalist agents tha...

10/14/2020
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
The vast majority of deep models use multiple gradient signals, typicall...

06/22/2022
Dynamic Restrained Uncertainty Weighting Loss for Multitask Learning of Vocal Expression
We propose a novel Dynamic Restrained Uncertainty Weighting Loss to expe...

09/02/2018
Multitask Learning for Fundamental Frequency Estimation in Music
Fundamental frequency (f0) estimation from polyphonic music includes the...

04/05/2016
Deep Cross Residual Learning for Multitask Visual Recognition
Residual learning has recently surfaced as an effective means of constru...
