Video Action Understanding: A Tutorial

by Matthew Hutchinson, et al.

Many believe that the successes of deep learning on image understanding problems can be replicated in the realm of video understanding. However, the span of video action problems and the set of proposed deep learning solutions are arguably wider and more diverse than those of their 2D image siblings. Finding, identifying, and predicting actions are a few of the most salient tasks in video action understanding. This tutorial clarifies a taxonomy of video action problems, highlights datasets and metrics used to baseline each problem, describes common data preparation methods, and presents the building blocks of state-of-the-art deep learning model architectures.




Automatic Understanding of Image and Video Advertisements

There is more to images than their objective physical content: for examp...

Improved Soccer Action Spotting using both Audio and Video Streams

In this paper, we propose a study on multi-modal (audio and video) actio...

The Kinetics Human Action Video Dataset

We describe the DeepMind Kinetics human action video dataset. The datase...

Gradient Frequency Modulation for Visually Explaining Video Understanding Models

In many applications, it is essential to understand why a machine learni...

ECO: Efficient Convolutional Network for Online Video Understanding

The state of the art in video understanding suffers from two problems: (...

Long Activity Video Understanding using Functional Object-Oriented Network

Video understanding is one of the most challenging topics in computer vi...

Tiny Video Networks

Video understanding is a challenging problem with great impact on the ab...