Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework

10/04/2021
by Tengteng Huang, et al.

Model smoothing is of central importance for obtaining a reliable teacher model in the student-teacher framework, where the teacher generates surrogate supervision signals to train the student. A popular model smoothing method is the Temporal Moving Average (TMA), which continuously averages the teacher parameters with the up-to-date student parameters. In this paper, we propose "Spatial Ensemble", a novel model smoothing mechanism in parallel with TMA. Spatial Ensemble randomly picks a small fragment of the student model and uses it to directly replace the corresponding fragment of the teacher model. Consequently, it stitches fragments of different historical student models into a single teacher, yielding the "Spatial Ensemble" effect. Spatial Ensemble achieves comparable student-teacher learning performance on its own and is complementary to the temporal moving average. Their integration, named Spatial-Temporal Smoothing, brings a general (sometimes significant) improvement to the student-teacher learning framework across a variety of state-of-the-art methods. For example, with the self-supervised method BYOL, it yields a +0.9 top-1 accuracy improvement on ImageNet, while with the semi-supervised approach FixMatch, it increases top-1 accuracy by around +6 when only a few training labels are available. Code and models are available at: https://github.com/tengteng95/Spatial_Ensemble.
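The two smoothing rules described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes parameters are flat lists of floats, treats each single element as a "fragment", and the names `tma_update`, `spatial_ensemble_update`, `spatial_temporal_update`, `momentum`, `replace_prob`, and `keep_prob` are all hypothetical. The combined rule in particular is only one plausible way to integrate the two mechanisms; the abstract does not specify how the integration is done.

```python
import random


def tma_update(teacher, student, momentum=0.99):
    """Temporal Moving Average: blend every teacher parameter with the
    corresponding up-to-date student parameter."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]


def spatial_ensemble_update(teacher, student, replace_prob=0.01, rng=random):
    """Spatial Ensemble (sketch): copy a randomly chosen small fragment of
    the student into the teacher and leave the rest of the teacher as is.
    Assumption: a fragment is a single element, chosen independently with
    probability replace_prob."""
    return [s if rng.random() < replace_prob else t
            for t, s in zip(teacher, student)]


def spatial_temporal_update(teacher, student, momentum=0.99,
                            keep_prob=0.5, rng=random):
    """Hypothetical combination: each element is either left untouched
    (with probability keep_prob) or updated by the moving average."""
    return [t if rng.random() < keep_prob
            else momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    teacher = [0.0] * 8
    student = [1.0] * 8
    print(tma_update(teacher, student, momentum=0.9))
    print(spatial_ensemble_update(teacher, student, replace_prob=0.25, rng=rng))
    print(spatial_temporal_update(teacher, student, momentum=0.9, rng=rng))
```

Because the spatial rule copies student values verbatim, repeated updates leave the teacher holding elements taken from student snapshots at different training steps, which is the stitching effect the abstract describes. The moving average, by contrast, changes every element by a small amount at each step.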

research
06/14/2021

Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition

In this paper, we introduce the Kaizen framework that uses a continuousl...
research
07/19/2020

Self-similarity Student for Partial Label Histopathology Image Segmentation

Delineation of cancerous regions in gigapixel whole slide images (WSIs) ...
research
07/13/2020

Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection

This paper focuses on the problem of Semi-Supervised Object Detection (S...
research
07/28/2023

Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification

The whole slide image (WSI) classification is often formulated as a mult...
research
06/07/2022

DiMS: Distilling Multiple Steps of Iterative Non-Autoregressive Transformers

The computational benefits of iterative non-autoregressive transformers ...
research
04/01/2018

Substitute Teacher Networks: Learning with Almost No Supervision

Learning through experience is time-consuming, inefficient and often bad...
research
07/05/2022

ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning

Modern deep learning has achieved great success in various fields. Howev...
