Balancing Average and Worst-case Accuracy in Multitask Learning

10/12/2021
by   Paul Michel, et al.

When training and evaluating machine learning models on a large number of tasks, it is important to look not only at average task accuracy, which may be biased by easy or redundant tasks, but also at worst-case accuracy (i.e. the performance on the task with the lowest accuracy). In this work, we show how to use techniques from the distributionally robust optimization (DRO) literature to improve worst-case performance in multitask learning. We highlight several failure cases of DRO when applied off-the-shelf and present an improved method, Lookahead-DRO (L-DRO), which mitigates these issues. The core idea of L-DRO is to anticipate the interaction between tasks during training in order to choose a dynamic re-weighting of the various task losses that (i) leads to minimal worst-case loss and (ii) keeps training on as many tasks as possible. After demonstrating the efficacy of L-DRO in a small, controlled synthetic setting, we evaluate it on two realistic benchmarks: a multitask version of the CIFAR-100 image classification dataset and a large-scale multilingual language modeling experiment. Our empirical results show that L-DRO achieves a better trade-off between average and worst-case accuracy, with little computational overhead, compared to several strong baselines.
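To make the re-weighting idea concrete, here is a minimal sketch of the generic DRO-style update the abstract builds on: task weights are adapted multiplicatively so that tasks with higher loss receive more weight, driving the model toward minimizing the worst-case task loss. This is a standard exponentiated-gradient scheme, not the L-DRO algorithm itself (L-DRO additionally "looks ahead" at how an update on one task affects the others); the function and parameter names are illustrative.

```python
import numpy as np

def reweight_tasks(weights, task_losses, eta=0.5):
    """One exponentiated-gradient step on the task distribution:
    multiplicatively upweight tasks with high loss, then renormalize."""
    w = weights * np.exp(eta * task_losses)
    return w / w.sum()

# Toy run: three tasks, where task 2 is persistently the hardest.
rng = np.random.default_rng(0)
w = np.ones(3) / 3  # start from the uniform (average-loss) weighting
for step in range(50):
    # Stand-in for the per-task losses observed at this training step.
    losses = np.array([0.2, 0.3, 0.9]) + 0.05 * rng.standard_normal(3)
    w = reweight_tasks(w, losses)

print(np.round(w, 3))  # weight concentrates on the hardest task
```

A known failure mode of this plain scheme, and part of the motivation for L-DRO, is that the weights can collapse onto a single hard task, starving the remaining tasks of training signal.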


Related research

06/16/2020 - Algorithms with Predictions
We introduce algorithms that use predictions from machine learning appli...

05/26/2021 - A data-driven approach to beating SAA out-of-sample
While solutions of Distributionally Robust Optimization (DRO) problems c...

04/09/2022 - The Two Dimensions of Worst-case Training and the Integrated Effect for Out-of-domain Generalization
Training with an emphasis on "hard-to-learn" components of the data has ...

07/09/2021 - Training Over-parameterized Models with Non-decomposable Objectives
Many modern machine learning applications come with complex and nuanced ...

07/28/2020 - Distributionally Robust Losses for Latent Covariate Mixtures
While modern large-scale datasets often consist of heterogeneous subpopu...

05/28/2023 - HyperTime: Hyperparameter Optimization for Combating Temporal Distribution Shifts
In this work, we propose a hyperparameter optimization method named Hype...

04/27/2022 - Worst-Case Dynamic Power Distribution Network Noise Prediction Using Convolutional Neural Network
Worst-case dynamic PDN noise analysis is an essential step in PDN sign-o...
