Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout

by Zhao Chen, et al.

The vast majority of deep models use multiple gradient signals, typically corresponding to a sum of multiple loss terms, to update a shared set of trainable weights. However, these multiple updates can impede optimal training by pulling the model in conflicting directions. We present Gradient Sign Dropout (GradDrop), a probabilistic masking procedure that samples gradients at an activation layer based on their level of sign consistency. GradDrop is implemented as a simple deep layer that can be used in any deep net and synergizes with other gradient-balancing approaches. We show that GradDrop outperforms state-of-the-art multiloss methods in traditional multitask and transfer learning settings, and we discuss how GradDrop reveals links between optimal multiloss training and gradient stochasticity.
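The masking idea in the abstract can be illustrated with a short sketch. The snippet below is a simplified, NumPy-only illustration (not the paper's reference implementation): for each activation it measures how consistently the per-task gradients agree in sign, then stochastically keeps only the positive-signed or only the negative-signed gradients, so conflicting updates are dropped rather than averaged. The function name `graddrop` and the exact purity formula used here are assumptions made for illustration.

```python
import numpy as np

def graddrop(grads, rng=None):
    """Illustrative sketch of Gradient Sign Dropout.

    grads: array of shape (num_tasks, dim) holding each task's
    gradient w.r.t. the same shared activations.
    Returns a single combined gradient of shape (dim,).
    """
    rng = np.random.default_rng() if rng is None else rng
    grads = np.asarray(grads, dtype=float)

    # Sign-consistency score per activation, in [0, 1]:
    # 1 when all task gradients are positive, 0 when all are negative,
    # 0.5 when positive and negative magnitudes exactly cancel.
    total = grads.sum(axis=0)
    denom = np.abs(grads).sum(axis=0) + 1e-12  # avoid division by zero
    purity = 0.5 * (1.0 + total / denom)

    # With probability `purity`, keep only positive-signed gradients
    # at that activation; otherwise keep only negative-signed ones.
    keep_positive = rng.random(grads.shape[1]) < purity
    mask = np.where(keep_positive, grads > 0, grads < 0)
    return (grads * mask).sum(axis=0)
```

When all tasks agree in sign, the mask keeps everything and the result reduces to the ordinary summed gradient; disagreement makes the outcome stochastic, which is the gradient-stochasticity connection the abstract alludes to.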

