
Distributionally Robust Data Join

by Pranjal Awasthi, et al.

Suppose we are given two datasets: a labeled dataset and an unlabeled dataset that also has additional auxiliary features not present in the first dataset. What is the most principled way to use these datasets together to construct a predictor? The answer should depend on whether these datasets are generated by the same or different distributions over their mutual feature sets, and how similar the test distribution will be to either of those distributions. In many applications, the two datasets will likely follow different distributions, but both may be close to the test distribution. We introduce the problem of building a predictor which minimizes the maximum loss over all probability distributions over the original features, auxiliary features, and binary labels that lie within Wasserstein distance r_1 of the empirical distribution over the labeled dataset and within Wasserstein distance r_2 of that of the unlabeled dataset. This can be thought of as a generalization of distributionally robust optimization (DRO), which allows for two data sources, one of which is unlabeled and may contain auxiliary features.
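To make the min-max objective concrete, the sketch below illustrates a standard building block: for a Lipschitz loss and a linear predictor, the worst-case expected loss over a single Wasserstein ball of radius r around the empirical distribution admits the well-known upper bound "empirical loss + r times the norm of the weight vector." The two-ball ambiguity set in the paper is the intersection of two such balls, so the sup over it is bounded by the smaller of the two single-ball bounds. This is an illustrative toy, not the paper's algorithm; the function names and the use of pseudo-labels for the unlabeled dataset are assumptions for the sketch.

```python
import numpy as np

def hinge(z):
    # hinge loss, a 1-Lipschitz convex surrogate for the 0-1 loss
    return np.maximum(0.0, 1.0 - z)

def wasserstein_robust_loss(theta, X, y, r):
    """Upper bound on sup over Q with W(Q, P_n) <= r of E_Q[hinge(y <x, theta>)],
    using the classical dual bound: empirical hinge loss + r * ||theta||_2.
    (Toy sketch; the paper's setting also perturbs labels and auxiliary features.)"""
    margins = y * (X @ theta)
    return hinge(margins).mean() + r * np.linalg.norm(theta)

def two_ball_upper_bound(theta, X1, y1, r1, X2, y2_pseudo, r2):
    """The ambiguity set is the INTERSECTION of two Wasserstein balls, so the
    worst case over it is at most the smaller of the two single-ball bounds.
    y2_pseudo are assumed pseudo-labels for the unlabeled dataset (illustration only)."""
    b1 = wasserstein_robust_loss(theta, X1, y1, r1)
    b2 = wasserstein_robust_loss(theta, X2, y2_pseudo, r2)
    return min(b1, b2)

if __name__ == "__main__":
    theta = np.array([1.0, -0.5])
    X = np.array([[1.0, 2.0], [0.5, -1.0]])
    y = np.array([1.0, -1.0])
    # With r = 0 the bound reduces to the plain empirical loss;
    # a positive radius adds the norm penalty.
    print(wasserstein_robust_loss(theta, X, y, 0.0))
    print(wasserstein_robust_loss(theta, X, y, 0.3))
```

Minimizing such an upper bound over theta recovers the familiar connection between Wasserstein DRO and norm regularization; the paper's contribution is handling two ambiguity sets over different (overlapping) feature spaces at once.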



