Extending the WILDS Benchmark for Unsupervised Adaptation

12/09/2021
by Shiori Sagawa, et al.

Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data. However, existing distribution shift benchmarks for unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. To maintain consistency, the labeled training, validation, and test sets, as well as the evaluation metrics, are exactly the same as in the original WILDS benchmark. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on WILDS 2.0 is limited. To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.
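The abstract mentions self-training among the benchmarked families of methods that leverage unlabeled data. As a rough illustration of the idea (not the paper's implementation), the following sketch uses a toy nearest-centroid classifier on synthetic 2-D data with a shifted target distribution: fit on labeled source data, pseudo-label the unlabeled target points the model is most confident about, then refit on the union. All names and data here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_centroids(X, y):
    """Toy 'model': one mean vector per class (classes 0 and 1)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    """Assign each point to its nearest class centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def confidence(centroids, X):
    """Margin between distances to the two centroids (larger = more confident)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.abs(d[:, 0] - d[:, 1])

# Synthetic labeled source data and unlabeled target data;
# the target blobs are shifted to simulate distribution shift.
n = 200
X_src = np.vstack([rng.normal(-2, 1, (n, 2)), rng.normal(2, 1, (n, 2))])
y_src = np.array([0] * n + [1] * n)
X_tgt = np.vstack([rng.normal(-1.5, 1, (n, 2)), rng.normal(2.5, 1, (n, 2))])
y_tgt = np.array([0] * n + [1] * n)  # held out, used only for evaluation

# Step 1: fit on labeled source data only.
model = fit_centroids(X_src, y_src)

# Step 2: pseudo-label the target points the model is most confident about.
conf = confidence(model, X_tgt)
keep = conf > np.median(conf)  # keep the top half by margin
pseudo = predict(model, X_tgt)

# Step 3: refit on source labels plus confident pseudo-labels.
X_aug = np.vstack([X_src, X_tgt[keep]])
y_aug = np.concatenate([y_src, pseudo[keep]])
model2 = fit_centroids(X_aug, y_aug)

acc_src_only = (predict(model, X_tgt) == y_tgt).mean()
acc_self_train = (predict(model2, X_tgt) == y_tgt).mean()
print(f"source-only: {acc_src_only:.3f}, self-trained: {acc_self_train:.3f}")
```

On this easy toy problem both models do well; the paper's point is that on realistic shifts like those in WILDS 2.0, gains from such methods are far less reliable.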

Related research

- Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time (11/25/2022)
- WILDS: A Benchmark of in-the-Wild Distribution Shifts (12/14/2020)
- Out-distribution aware Self-training in an Open World Setting (12/21/2020)
- Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction (08/17/2022)
- Exploiting Mixed Unlabeled Data for Detecting Samples of Seen and Unseen Out-of-Distribution Classes (10/13/2022)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance (01/11/2022)
- Discrete Key-Value Bottleneck (07/22/2022)
