DynamoNet: Dynamic Action and Motion Network

04/25/2019
by Ali Diba, et al.

In this paper, we study self-supervised learning of motion cues in videos using dynamic motion filters, with the goal of obtaining a better motion representation and ultimately improving human action recognition. Thus far, the vision community has focused on spatio-temporal approaches that use standard filters; we instead propose dynamic filters that adaptively learn a video-specific internal motion representation by predicting short-term future frames. We call this new representation the dynamic motion representation (DMR). It is embedded inside a 3D convolutional network as a new layer, which captures the visual appearance and motion dynamics throughout the entire video clip via end-to-end network learning. Simultaneously, we use this motion representation to enrich video classification, with frame prediction designed as an auxiliary task that strengthens the classification problem. To this end, we introduce a novel unified spatio-temporal 3D-CNN architecture (DynamoNet) that jointly optimizes video classification and motion representation learning, by predicting future frames, as a multi-task learning problem. We conduct experiments on challenging human action datasets: Kinetics 400, UCF101, and HMDB51. The experiments with the proposed DynamoNet show promising results on all of these datasets.
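
The multi-task idea in the abstract, a classification loss combined with an auxiliary future-frame prediction loss, can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not specify the filter form or the loss weighting, so the per-pixel 3x3 filtering, the zero padding, the mean-squared-error frame loss, and the weight `lam` are all assumptions made for illustration, and every function name here is hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one clip; logits is a list of class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def apply_dynamic_filters(frame, filters, k=3):
    """Predict the next frame by filtering each pixel's k x k neighborhood
    with that pixel's own kernel (assumed form of a 'dynamic filter').
    frame: H x W list of floats; filters[y][x]: flat list of k*k weights.
    Pixels outside the frame are treated as zero."""
    h, w, r = len(frame), len(frame[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        weight = filters[y][x][(dy + r) * k + (dx + r)]
                        acc += weight * frame[yy][xx]
            out[y][x] = acc
    return out

def mse(a, b):
    """Mean squared error between two H x W frames."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n

def joint_loss(logits, label, frame, filters, next_frame, lam=0.5):
    """Classification loss plus a weighted frame-prediction loss.
    lam is a hypothetical weighting, not a value from the paper."""
    predicted = apply_dynamic_filters(frame, filters)
    return cross_entropy(logits, label) + lam * mse(predicted, next_frame)
```

In a real network the per-pixel kernels would be produced by a learned sub-network from the input clip, which is what makes them video-specific; here they are passed in directly to keep the sketch self-contained.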


research
06/17/2019

Spatio-Temporal Fusion Networks for Action Recognition

The video based CNN works have focused on effective ways to fuse appeara...
research
02/10/2015

Video Primal Sketch: A Unified Middle-Level Representation for Video

This paper presents a middle-level video representation named Video Prim...
research
11/21/2016

Deep Temporal Linear Encoding Networks

The CNN-encoding of features from entire videos for the representation o...
research
12/03/2018

SUSiNet: See, Understand and Summarize it

In this work we propose a multi-task spatio-temporal network, called SUS...
research
05/24/2016

Spatio-Temporal Image Boundary Extrapolation

Boundary prediction in images as well as video has been a very active to...
research
08/15/2019

DeepHuMS: Deep Human Motion Signature for 3D Skeletal Sequences

3D Human Motion Indexing and Retrieval is an interesting problem due to ...
research
05/04/2017

Am I Done? Predicting Action Progress in Videos

In this paper we introduce the problem of predicting action progress in ...
