AVD: Adversarial Video Distillation

07/12/2019
by Mohammad Tavakolian, et al.

In this paper, we present a simple yet efficient approach for video representation, called Adversarial Video Distillation (AVD). The key idea is to represent a video by compressing it into a single realistic image, which can then be used in a variety of video-based scene analysis applications. Representing a video as a single image lets us address video analysis with image analysis techniques. To this end, we exploit a 3D convolutional encoder-decoder network that encodes the input video as an image by minimizing the reconstruction error. Furthermore, weak supervision is imposed on the output of the encoder through an adversarial training procedure, so that the generated images are semantically realistic. The encoder learns to extract semantically meaningful representations from a given input video by mapping the 3D input into a 2D latent representation. The obtained representation can be used directly as the input to deep models pre-trained on images for video classification. We evaluated the effectiveness of our proposed method for video-based activity recognition on three standard and challenging benchmark datasets, namely UCF101, HMDB51, and Kinetics. The experimental results demonstrate that AVD outperforms the state-of-the-art methods for video classification.
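To make the "3D input into a 2D latent representation" step concrete, the following minimal sketch traces how a stack of strided 3D convolutions could collapse the temporal axis of a clip to a single frame while keeping the spatial resolution. The layer configuration (`HYPOTHETICAL_ENCODER`), the 16-frame 112x112 input, and the helper names are illustrative assumptions, not the architecture reported by the authors. Only the convolution output-size formula is standard.

```python
from typing import List, Tuple

Triple = Tuple[int, int, int]  # values along the (time, height, width) axes


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard convolution output size along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical encoder (an assumption for illustration only):
# each layer is (kernel, stride, padding), given per axis as (T, H, W).
# Stride 2 in time halves the number of frames; stride 1 in space
# with padding 1 keeps height and width unchanged.
HYPOTHETICAL_ENCODER: List[Tuple[Triple, Triple, Triple]] = [
    ((3, 3, 3), (2, 1, 1), (1, 1, 1)),
    ((3, 3, 3), (2, 1, 1), (1, 1, 1)),
    ((3, 3, 3), (2, 1, 1), (1, 1, 1)),
    ((3, 3, 3), (2, 1, 1), (1, 1, 1)),
]


def encoder_output_shape(
    shape: Triple, layers: List[Tuple[Triple, Triple, Triple]]
) -> Triple:
    """Apply the output-size formula layer by layer to a (T, H, W) shape."""
    t, h, w = shape
    for kernel, stride, padding in layers:
        t = conv_out(t, kernel[0], stride[0], padding[0])
        h = conv_out(h, kernel[1], stride[1], padding[1])
        w = conv_out(w, kernel[2], stride[2], padding[2])
    return (t, h, w)


if __name__ == "__main__":
    clip = (16, 112, 112)  # assumed clip size: 16 frames of 112x112 pixels
    print(encoder_output_shape(clip, HYPOTHETICAL_ENCODER))
```

With these assumed settings the temporal length goes 16, 8, 4, 2, 1, so the encoder output has a single time step and can be treated as an image. In the approach described above, that image would additionally be constrained by the reconstruction error and by the adversarial training procedure, neither of which is modeled in this sketch.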

