MixMOOD: A systematic approach to class distribution mismatch in semi-supervised learning using deep dataset dissimilarity measures

06/14/2020
by   Saul Calderon-Ramirez, et al.
1

In this work, we propose MixMOOD - a systematic approach to mitigate effect of class distribution mismatch in semi-supervised deep learning (SSDL) with MixMatch. This work is divided into two components: (i) an extensive out of distribution (OOD) ablation test bed for SSDL and (ii) a quantitative unlabelled dataset selection heuristic referred to as MixMOOD. In the first part, we analyze the sensitivity of MixMatch accuracy under 90 different distribution mismatch scenarios across three multi-class classification tasks. These are designed to systematically understand how OOD unlabelled data affects MixMatch performance. In the second part, we propose an efficient and effective method, called deep dataset dissimilarity measures (DeDiMs), to compare labelled and unlabelled datasets. The proposed DeDiMs are quick to evaluate and model agnostic. They use the feature space of a generic Wide-ResNet and can be applied prior to learning. Our test results reveal that supposed semantic similarity between labelled and unlabelled data is not a good heuristic for unlabelled data selection. In contrast, strong correlation between MixMatch accuracy and the proposed DeDiMs allow us to quantitatively rank different unlabelled datasets ante hoc according to expected MixMatch accuracy. This is what we call MixMOOD. Furthermore, we argue that the MixMOOD approach can aid to standardize the evaluation of different semi-supervised learning techniques under real world scenarios involving out of distribution data.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/20/2021

More Than Meets The Eye: Semi-supervised Learning Under Non-IID Data

A common heuristic in semi-supervised deep learning (SSDL) is to select ...
research
03/01/2022

Semi-supervised Deep Learning for Image Classification with Distribution Mismatch: A Survey

Deep learning methodologies have been employed in several different fiel...
research
08/22/2020

Data Programming using Semi-Supervision and Subset Selection

The paradigm of data programming <cit.> has shown a lot of promise in us...
research
05/21/2019

Semi-Supervised Learning with Scarce Annotations

While semi-supervised learning (SSL) algorithms provide an efficient way...
research
08/23/2023

Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch

Semi-Supervised Learning (SSL) under class distribution mismatch aims to...
research
06/18/2012

Semi-Supervised Learning of Class Balance under Class-Prior Change by Distribution Matching

In real-world classification problems, the class balance in the training...

Please sign up or login with your details

Forgot password? Click here to reset