Rich Feature Construction for the Optimization-Generalization Dilemma

by Jianyu Zhang et al.

There is often a dilemma between ease of optimization and robust out-of-distribution (OoD) generalization. For instance, many OoD methods rely on penalty terms whose optimization is challenging: the penalties are either too strong to optimize reliably or too weak to achieve their goals. To escape this dilemma, we propose to first construct a rich representation (RFC) containing a palette of potentially useful features, ready to be used by even simple models. On the one hand, a rich representation provides a good initialization for the optimizer. On the other hand, it also provides an inductive bias that helps OoD generalization. RFC is constructed in a succession of training episodes. During each step of the discovery phase, we craft a multi-objective optimization criterion and its associated datasets in a manner that prevents the network from using the features constructed in the previous iterations. During the synthesis phase, we use knowledge distillation to force the network to simultaneously develop all the features identified during the discovery phase. RFC consistently helps six OoD methods achieve top performance on the challenging invariant training benchmark ColoredMNIST (Arjovsky et al., 2020). Furthermore, on the realistic Camelyon17 task, our method helps both OoD and empirical risk minimization (ERM) methods outperform earlier comparable results by at least 5%, reduces standard deviation by at least 4.1%, and makes hyperparameter tuning and model selection more reliable.
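The abstract describes RFC only at a high level, so the following is a toy, numpy-only sketch of its two-phase structure under our own assumptions, not the authors' implementation. The "multi-objective criterion" is stood in for by a worst-group (DRO-style) logistic loss over groups built from where earlier models were right or wrong, and synthesis distills every discovered model into one shared linear representation with one output head per discovered model. All function names (`train_worst_group`, `discovery`, `distill_loss`, `synthesis`) and the toy data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Clip logits so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def bce(p, t):
    # Element-wise binary cross-entropy; t may be hard (0/1) or soft targets.
    eps = 1e-12
    return -(t * np.log(p + eps) + (1.0 - t) * np.log(1.0 - p + eps))


def train_worst_group(X, y, groups, steps=2000, lr=0.5):
    """Hypothetical stand-in for the multi-objective criterion:
    min_w max_g mean_loss(g), optimized by stepping on the worst group."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        losses = [bce(p[g], y[g]).mean() for g in groups]
        worst = groups[int(np.argmax(losses))]
        w -= lr * X[worst].T @ (p[worst] - y[worst]) / len(worst)
    return w


def discovery(X, y, rounds=2):
    """Discovery phase (sketch): after each round, split every group by
    whether the newest model classified its examples correctly."""
    models, groups = [], [np.arange(len(y))]  # round 1 is plain ERM
    for _ in range(rounds):
        w = train_worst_group(X, y, groups)
        models.append(w)
        correct = (sigmoid(X @ w) > 0.5) == (y == 1)
        groups = [g[m] for g in groups for m in (correct[g], ~correct[g]) if m.any()]
    return models


def distill_loss(X, W, heads, targets):
    return bce(sigmoid(X @ W @ heads), targets).mean()


def synthesis(X, teachers, steps=3000, lr=0.1, seed=0):
    """Synthesis phase (sketch): distill all teachers into a shared
    representation X @ W with one logistic head (column) per teacher."""
    rng = np.random.default_rng(seed)
    k = len(teachers)
    W = rng.normal(scale=0.1, size=(X.shape[1], k))    # shared features
    heads = rng.normal(scale=0.1, size=(k, k))         # one column per teacher
    targets = sigmoid(X @ np.stack(teachers, axis=1))  # soft labels, shape (n, k)
    initial = distill_loss(X, W, heads, targets)
    for _ in range(steps):
        H = X @ W
        G = (sigmoid(H @ heads) - targets) / targets.size  # dLoss/dlogits
        grad_heads, grad_W = H.T @ G, X.T @ (G @ heads.T)
        heads -= lr * grad_heads
        W -= lr * grad_W
    return W, heads, initial, distill_loss(X, W, heads, targets)


# Toy data (illustrative only): two noisy features of different strength plus a bias column.
rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n).astype(float)
s = 2 * y - 1
X = np.column_stack([s + rng.normal(size=n), 0.5 * s + rng.normal(size=n), np.ones(n)])

models = discovery(X, y, rounds=2)
W, heads, loss_before, loss_after = synthesis(X, models)
print(len(models), W.shape, heads.shape, loss_after < loss_before)
```

The sketch only shows how the discovery rounds feed the synthesis step. Reproducing anything like the paper's setup would at least require deep networks in place of the linear maps and a proper robust optimizer in place of the single worst-group gradient step.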
