SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts

08/30/2021
by Masanari Kimura, et al.

Many machine learning algorithms assume that the training data and the test data follow the same distribution. However, this assumption is often violated in real-world machine learning problems. In this paper, we propose SHIFT15M, a dataset for properly evaluating models in situations where the data distribution changes between training and testing. The SHIFT15M dataset has several useful properties: (i) Multiobjective. Each instance in the dataset has several numerical values that can be used as target variables. (ii) Large-scale. The SHIFT15M dataset consists of 15 million fashion images. (iii) Coverage of types of dataset shift. SHIFT15M contains multiple dataset shift problem settings (e.g., covariate shift or target shift). SHIFT15M also makes it possible to evaluate a model under dataset shifts of varying magnitude by adjusting the magnitude of the shift. In addition, we provide software to handle SHIFT15M in a very simple way: https://github.com/st-tech/zozo-shift15m.
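To make the idea of evaluating a model under shifts of increasing magnitude concrete, here is a minimal sketch on synthetic data. It does not use SHIFT15M or the zozo-shift15m software. The helper names (true_fn, fit_line, sample, evaluate_under_shift), the quadratic ground truth, and the Gaussian inputs are all assumptions chosen for illustration. A straight line is fitted on inputs centred at 0 and then scored on test inputs whose mean is moved by a chosen magnitude, which is one simple form of covariate shift.

```python
import random
import statistics


def true_fn(x):
    # Hypothetical nonlinear ground truth; the labelling rule is the same
    # for training and test data, only the input distribution moves.
    return x * x


def fit_line(xs, ys):
    # Ordinary least squares for y = a * x + b.
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mse(model, xs, ys):
    # Mean squared error of the fitted line on (xs, ys).
    a, b = model
    return statistics.fmean((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def sample(mean, n, rng):
    # Inputs drawn from a unit-variance Gaussian centred at `mean`.
    xs = [rng.gauss(mean, 1.0) for _ in range(n)]
    ys = [true_fn(x) for x in xs]
    return xs, ys


def evaluate_under_shift(magnitudes, n=2000, seed=0):
    # Train once on unshifted inputs, then test at each shift magnitude.
    rng = random.Random(seed)
    train_x, train_y = sample(0.0, n, rng)
    model = fit_line(train_x, train_y)
    results = {}
    for magnitude in magnitudes:
        test_x, test_y = sample(magnitude, n, rng)
        results[magnitude] = mse(model, test_x, test_y)
    return results


errors = evaluate_under_shift([0.0, 1.0, 2.0, 3.0])
for magnitude, error in errors.items():
    print(f"shift magnitude {magnitude}: test MSE {error:.2f}")
```

In this toy setting the test error grows as the test inputs move away from the training region, because the linear model is only a reasonable approximation near the data it was fitted on. A benchmark with an adjustable shift magnitude is meant to expose this kind of degradation in a controlled way.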


research
08/29/2023

Biquality Learning: a Framework to Design Algorithms Dealing with Closed-Set Distribution Shifts

Training machine learning models from data with weak supervision and dat...
research
06/28/2021

Ensembling Shift Detectors: an Extensive Empirical Evaluation

The term dataset shift refers to the situation where the data used to tr...
research
08/11/2020

BREEDS: Benchmarks for Subpopulation Shift

We develop a methodology for assessing the robustness of models to subpo...
research
07/21/2021

Preventing dataset shift from breaking machine-learning biomarkers

Machine learning brings the hope of finding new biomarkers extracted fro...
research
06/05/2023

Inference under constrained distribution shifts

Large-scale administrative or observational datasets are increasingly us...
research
10/04/2022

Data drift correction via time-varying importance weight estimator

Real-world deployment of machine learning models is challenging when dat...
research
05/17/2022

A unified framework for dataset shift diagnostics

Most machine learning (ML) methods assume that the data used in the trai...
