Generating Human Action Videos by Coupling 3D Game Engines and Probabilistic Graphical Models

10/12/2019
by César Roberto de Souza, et al.

Deep video action recognition models have been highly successful in recent years but require large quantities of manually annotated data, which are expensive and laborious to obtain. In this work, we investigate the generation of synthetic training data for video action recognition, as synthetic data have been successfully used to supervise models for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation, physics models and other components of modern game engines. With this model we generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for "Procedural Human Action Videos". PHAV contains a total of 39,982 videos, with more than 1,000 examples for each of 35 action categories. Our video generation approach is not limited to existing motion capture sequences: 14 of these 35 categories are procedurally defined synthetic actions. In addition, each video is represented with 6 different data modalities, including RGB, optical flow and pixel-level semantic labels. These modalities are generated almost simultaneously using the Multiple Render Targets feature of modern GPUs. In order to leverage PHAV, we introduce a deep multi-task (i.e. that considers action classes from multiple datasets) representation learning architecture that is able to simultaneously learn from synthetic and real video datasets, even when their action categories differ. Our experiments on the UCF-101 and HMDB-51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance. Our approach also significantly outperforms video representations produced by fine-tuning state-of-the-art unsupervised generative models of videos.
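The multi-task representation learning idea described above, a shared video encoder feeding one classification head per dataset so that synthetic and real label spaces can differ, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the trunk here is a single tanh layer standing in for a deep video network, and all dimensions and names are assumptions.

```python
import numpy as np

# Hypothetical sketch of multi-task heads over a shared video representation.
# All names and dimensions are illustrative assumptions, not the paper's code.

rng = np.random.default_rng(0)

FEAT_DIM = 128        # shared representation size (assumed)
N_SYNTH_CLASSES = 35  # PHAV action categories
N_REAL_CLASSES = 101  # e.g. UCF-101

# Shared "trunk": a stand-in for a deep video encoder.
W_shared = rng.standard_normal((2048, FEAT_DIM)) * 0.01

# One classification head per dataset, since the label spaces differ.
W_synth = rng.standard_normal((FEAT_DIM, N_SYNTH_CLASSES)) * 0.01
W_real = rng.standard_normal((FEAT_DIM, N_REAL_CLASSES)) * 0.01

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, head):
    """Route a batch through the shared trunk, then the dataset's own head."""
    feats = np.tanh(x @ W_shared)          # shared features for both datasets
    W_head = W_synth if head == "synthetic" else W_real
    return softmax(feats @ W_head)

# A mixed training step scores each sub-batch with its matching head,
# so gradients from synthetic and real videos both update the trunk.
x_synth = rng.standard_normal((4, 2048))   # fake synthetic-video features
x_real = rng.standard_normal((4, 2048))    # fake real-video features
p_synth = forward(x_synth, "synthetic")
p_real = forward(x_real, "real")
print(p_synth.shape, p_real.shape)  # (4, 35) (4, 101)
```

The key design point this illustrates is that only the heads are dataset-specific: the shared trunk receives supervision from both sources, which is how a large synthetic set like PHAV can boost performance on a small real-world benchmark.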


Related research:

Procedural Generation of Videos to Train Deep Action Recognition Networks (12/02/2016)
The benefits of synthetic data for action categorization (01/20/2020)
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data (12/08/2022)
A Two-Stream Variational Adversarial Network for Video Generation (12/03/2018)
ElderSim: A Synthetic Data Generation Platform for Human Action Recognition in Eldercare Applications (10/28/2020)
Let's Dance: Learning From Online Dance Videos (01/23/2018)
SPIN: A High Speed, High Resolution Vision Dataset for Tracking and Action Recognition in Ping Pong (12/13/2019)
