DistInit: Learning Video Representations without a Single Labeled Video

01/26/2019
by Rohit Girdhar, et al.

Video recognition models have progressed significantly over the past few years, evolving from shallow classifiers trained on hand-crafted features to deep spatiotemporal networks. However, the labeled video data required to train such models has not kept up with the ever-increasing depth and sophistication of these networks. In this work we propose an alternative approach to learning video representations that requires no semantically labeled videos, and instead leverages the years of effort in collecting and labeling large and clean still-image datasets. We do so by using state-of-the-art models pre-trained on image datasets as "teachers" to train video models in a distillation framework. We demonstrate that our method learns truly spatiotemporal features, despite being trained only using supervision from still-image networks. Moreover, it learns good representations across different input modalities, using completely uncurated raw video data sources and with different 2D teacher models. Our method obtains strong transfer performance, outperforming standard techniques for bootstrapping video architectures from image-based models, and is competitive with state-of-the-art approaches for video action recognition.
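
The abstract names the teacher-student setup but does not give the loss or say how per-frame teacher outputs are combined into a clip-level target. As a minimal sketch only, assuming the image teacher's per-frame class distributions are averaged over a clip and the video student is trained to match that average under a KL divergence, the objective could look like the following. The function names, the averaging step, and the `temperature` parameter are illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_clip_target(frame_logits, temperature=1.0):
    """Average per-frame teacher distributions into one clip-level soft target.

    Assumption: frames are weighted equally; the paper may combine them differently.
    """
    per_frame = [softmax(f, temperature) for f in frame_logits]
    num_frames = len(per_frame)
    num_classes = len(per_frame[0])
    return [sum(p[k] for p in per_frame) / num_frames for k in range(num_classes)]


def distillation_loss(student_logits, target, temperature=1.0):
    """KL(target || student): zero when the student reproduces the soft target."""
    student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(target, student) if t > 0)


# Hypothetical example: a 2-frame clip, 3 image classes.
target = teacher_clip_target([[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]])
loss = distillation_loss([0.0, 0.0, 0.0], target)
```

In this sketch no video label is used at any point: the only supervision is the soft target produced by the still-image teacher, which matches the setting the abstract describes.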

research
08/25/2016

Sympathy for the Details: Dense Trajectories and Hybrid Classification Architectures for Action Recognition

Action recognition in videos is a challenging task due to the complexity...
research
07/12/2019

AVD: Adversarial Video Distillation

In this paper, we present a simple yet efficient approach for video repr...
research
11/28/2018

Self-supervised Spatiotemporal Feature Learning by Video Geometric Transformations

To alleviate the expensive cost of data collection and annotation, many ...
research
08/03/2017

Attention Transfer from Web Images for Video Recognition

Training deep learning based video classifiers for action recognition re...
research
09/15/2020

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition

Recent years have witnessed the significant progress of action recogniti...
research
08/20/2020

Accuracy and Performance Comparison of Video Action Recognition Approaches

Over the past few years, there has been significant interest in video ac...
research
12/27/2022

From Single-Visit to Multi-Visit Image-Based Models: Single-Visit Models are Enough to Predict Obstructive Hydronephrosis

Previous work has shown the potential of deep learning to predict renal ...
