Improving Baselines in the Wild

12/31/2021
by Kazuki Irie, et al.

We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies that are robust to domain shifts. Our experiments yield several observations that we believe are of general interest for any future work on WILDS. Our study focuses on two datasets: iWildCam and FMoW. We show that (1) conducting separate cross-validation for each evaluation metric is crucial for both datasets, (2) a weak correlation between validation and test performance might make model development difficult for iWildCam, (3) minor changes in the training hyper-parameters improve the baseline by a relatively large margin (mainly on FMoW), and (4) there is a strong correlation between certain domains and certain target labels (mainly on iWildCam). To the best of our knowledge, no prior work on these datasets has reported these observations despite their obvious importance. Our code is public.
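To illustrate observation (1), the following is a minimal sketch of selecting a separate checkpoint for each evaluation metric rather than reusing one checkpoint for all of them. It is not the authors' code: the function name `select_best_per_metric`, the metric names `accuracy` and `macro_f1`, and all scores are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List


def select_best_per_metric(
    history: List[Dict[str, float]], metrics: List[str]
) -> Dict[str, int]:
    """Return, for each metric, the index of the checkpoint with the
    highest validation score on that metric (first index wins ties)."""
    best: Dict[str, int] = {}
    for metric in metrics:
        best[metric] = max(range(len(history)), key=lambda i: history[i][metric])
    return best


# Hypothetical validation scores, one dict per saved checkpoint.
# These numbers are made up and are not results from the paper.
val_history = [
    {"accuracy": 0.70, "macro_f1": 0.35},
    {"accuracy": 0.74, "macro_f1": 0.33},
    {"accuracy": 0.72, "macro_f1": 0.38},
]

best = select_best_per_metric(val_history, ["accuracy", "macro_f1"])
print(best)
```

In this toy example the two metrics prefer different checkpoints, so reporting both metrics from a single checkpoint would understate one of them. Each metric's test score would then be read from its own selected checkpoint.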


