Learning Temporally Invariant and Localizable Features via Data Augmentation for Video Recognition

08/13/2020
by Taeoh Kim, et al.

Deep-learning-based video recognition has improved steadily alongside the development of large-scale datasets and spatiotemporal network architectures. In image recognition, learning spatially invariant features is a key factor in improving recognition performance and robustness. Data augmentation based on visual inductive priors, such as cropping, flipping, rotating, or photometric jittering, is a representative approach to learning such features. Recent state-of-the-art recognition solutions rely on modern data augmentation strategies that combine several augmentation operations. In this study, we extend these strategies to the temporal dimension of videos so that networks learn temporally invariant or temporally localizable features, which cover temporal perturbations and complex actions in videos. With our novel temporal data augmentation algorithms, video recognition performance improves over spatial-only data augmentation algorithms when only a limited amount of training data is available, including on the 1st Visual Inductive Priors (VIPriors) challenge for data-efficient action recognition. Furthermore, the learned features are temporally localizable, which cannot be achieved using spatial augmentation algorithms. Our source code is available at https://github.com/taeoh-kim/temporal_data_augmentation.
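
To make the idea of extending a mixing-style augmentation along the temporal axis concrete, the following is a minimal sketch and not the authors' implementation. It assumes a CutMix-style operation applied to whole frames: a contiguous span of frames in one clip is replaced by the corresponding frames of another clip, and the label is mixed in proportion to the number of frames taken from each. The function name `frame_cutmix`, its parameters, and the representation of a video as a plain Python list of frames are all hypothetical choices made for illustration.

```python
import random


def frame_cutmix(video_a, video_b, label_a, label_b, num_classes, rng=None):
    """Replace a random contiguous span of frames in video_a with video_b's frames.

    Hypothetical illustration: videos are lists of frames with equal length,
    labels are integer class indices.
    """
    rng = rng or random.Random()
    t = len(video_a)
    if t == 0 or len(video_b) != t:
        raise ValueError("videos must be non-empty and have the same number of frames")

    span = rng.randint(1, t)          # number of frames taken from video_b
    start = rng.randint(0, t - span)  # index of the first replaced frame
    mixed = video_a[:start] + video_b[start:start + span] + video_a[start + span:]

    # Mix the labels by the fraction of frames contributed by each clip.
    lam = 1.0 - span / t              # fraction of frames still from video_a
    soft_label = [0.0] * num_classes
    soft_label[label_a] += lam
    soft_label[label_b] += 1.0 - lam
    return mixed, soft_label, (start, span)


# Usage: two 8-frame clips represented by placeholder strings.
clip_a = ["a"] * 8
clip_b = ["b"] * 8
mixed, soft_label, (start, span) = frame_cutmix(
    clip_a, clip_b, label_a=0, label_b=1, num_classes=3, rng=random.Random(0)
)
```

Because the replaced span has a known start and length, a model trained on such samples is pushed to attribute evidence to specific frames, which is one plausible reading of what "temporally localizable" features require. A spatial-only variant would mix the same region in every frame and carry no such temporal signal.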


