Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization

06/15/2023
by Ramnath Kumar, et al.

We develop a re-weighted gradient descent technique for boosting the performance of deep neural networks. Our algorithm involves importance weighting of data points during each optimization step. Our approach is inspired by distributionally robust optimization with f-divergences, which is known to result in models with improved generalization guarantees. Our re-weighting scheme is simple, computationally efficient, and can be combined with popular optimization algorithms such as SGD and Adam. Empirically, we demonstrate our approach's superiority on various tasks, including vanilla classification, classification with label imbalance, noisy labels, domain adaptation, and tabular representation learning. Notably, we obtain improvements of +0.7% and +1.44% over state-of-the-art methods on the DomainBed and tabular benchmarks, respectively. Moreover, our algorithm boosts the performance of BERT on the GLUE benchmark by +1.94%. These results demonstrate the effectiveness of the proposed approach, indicating its potential for improving performance in diverse domains.
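
As a minimal sketch of the idea (not the paper's exact algorithm): the KL-divergence instance of f-divergence DRO yields per-example weights w_i proportional to exp(loss_i / tau), i.e., a softmax over the per-example losses, so harder examples are up-weighted at each step. The PyTorch snippet below illustrates this; the function name `reweighted_loss` and the temperature `tau` are illustrative assumptions, not names from the paper.

```python
import torch
import torch.nn.functional as F

def reweighted_loss(logits, targets, tau=1.0):
    """Importance-weighted batch loss, a sketch of KL-DRO re-weighting.

    Weights w_i proportional to exp(loss_i / tau) up-weight hard examples;
    `tau` is an illustrative temperature, not a value from the paper.
    """
    per_example = F.cross_entropy(logits, targets, reduction="none")
    # Softmax over the (detached) per-example losses is the worst-case
    # distribution of KL-regularized DRO; detaching treats the weights
    # as constants, so gradients flow only through the losses themselves.
    weights = torch.softmax(per_example.detach() / tau, dim=0)
    return (weights * per_example).sum()
```

Because the re-weighting only replaces the usual mean over the batch, this loss drops into a standard training loop unchanged and works with any optimizer, e.g. `loss = reweighted_loss(model(x), y); loss.backward(); opt.step()`.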

Related research

03/11/2019 · Gradient Descent based Optimization Algorithms for Deep Learning Models Training
In this paper, we aim at providing an introduction to the gradient desce...

05/13/2014 · Accelerating Minibatch Stochastic Gradient Descent using Stratified Sampling
Stochastic Gradient Descent (SGD) is a popular optimization method which...

01/21/2023 · Genetically Modified Wolf Optimization with Stochastic Gradient Descent for Optimising Deep Neural Networks
When training Convolutional Neural Networks (CNNs) there is a large emph...

10/30/2019 · Lsh-sampling Breaks the Computation Chicken-and-egg Loop in Adaptive Stochastic Gradient Estimation
Stochastic Gradient Descent or SGD is the most popular optimization algo...

05/25/2022 · Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently
Driven by the empirical success and wide use of deep neural networks, un...

04/07/2020 · Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning
This paper investigates the stochastic optimization problem with a focus...

12/03/2019 · Improving upon NBA point-differential rankings
For some time, point-differential has been thought to be a better predic...
