MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization

10/27/2019
by Rongcheng Lin, et al.

In this paper, we present and discuss a deep mixture model with online knowledge distillation (MOD) for large-scale video temporal concept localization, which ranked 3rd in the 3rd YouTube-8M Video Understanding Challenge. Specifically, we find that, by enabling knowledge sharing through online distillation, fine-tuning a mixture model on a smaller dataset can achieve better evaluation performance. Based on this observation, in our final solution, we trained and fine-tuned 12 NeXtVLAD models in parallel with a 2-layer online distillation structure. The experimental results show that the proposed distillation structure effectively avoids overfitting and exhibits superior generalization performance. The code is publicly available at: https://github.com/linrongc/solution_youtube8m_v3
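The abstract describes the architectural idea but includes no code. Below is a minimal sketch, assuming a PyTorch-style multi-label setup, of how a mixture of sub-models can share knowledge through online distillation, with the ensemble prediction acting as a soft teacher for each expert. The names `MixtureWithOnlineDistillation` and `mod_loss`, the simple linear experts, and the weight `alpha` are illustrative assumptions, not the authors' method; their actual NeXtVLAD-based implementation is in the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureWithOnlineDistillation(nn.Module):
    """Mixture of per-expert classifiers whose averaged (ensemble) logits
    serve as an online teacher for every expert (co-distillation sketch)."""
    def __init__(self, feature_dim, num_classes, num_experts=4):
        super().__init__()
        # Placeholder experts for illustration; the paper uses NeXtVLAD sub-models.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU(),
                          nn.Linear(256, num_classes))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        logits = [expert(x) for expert in self.experts]    # per-expert logits
        ensemble = torch.stack(logits, dim=0).mean(dim=0)  # soft "teacher" logits
        return logits, ensemble

def mod_loss(logits, ensemble, targets, alpha=0.5):
    """Per-expert and ensemble classification losses plus a distillation term
    pulling each expert toward the detached ensemble prediction."""
    teacher = torch.sigmoid(ensemble).detach()             # multi-label soft targets
    loss = F.binary_cross_entropy_with_logits(ensemble, targets)
    for l in logits:
        loss = loss + F.binary_cross_entropy_with_logits(l, targets)
        loss = loss + alpha * F.binary_cross_entropy_with_logits(l, teacher)
    return loss
```

Detaching the teacher stops gradients from flowing through the distillation target, a common choice in online (co-)distillation so that each expert is pulled toward the ensemble without the ensemble being dragged toward weaker experts.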

