Retro-Actions: Learning 'Close' by Time-Reversing 'Open' Videos

09/20/2019
by Will Price, et al.

We investigate video transforms that result in class-homogeneous label transforms: video transforms that consistently maintain or modify the labels of all videos in each class. We propose a general approach to discover invariant classes, whose transformed examples maintain their label; pairs of equivariant classes, whose transformed examples exchange their labels; and novel-generating classes, whose transformed examples belong to a new class outside the dataset. Label transforms offer additional supervision, previously unexplored in video recognition, that benefits data augmentation and enables zero-shot learning by training a class from transformed videos of its counterpart. Among such video transforms, we study horizontal flipping, time reversal, and their composition. We highlight errors that arise from naively using horizontal flipping as a form of data augmentation in video. Next, we validate the realism of time-reversed videos through a human perception study in which participants exhibit equal preference for forward and time-reversed videos. Finally, we test our approach on two datasets, Jester and Something-Something, evaluating the three video transforms for zero-shot learning and data augmentation. Our results show that gestures such as zooming in can be learnt from zooming out in a zero-shot setting, as can more complex actions with state transitions, such as digging something out of something from burying something in something.
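The equivariant-class idea above can be sketched as a simple data-augmentation routine: time-reverse a clip and apply the corresponding label transform. This is a minimal illustration, not the paper's method; the class names and the `EQUIVARIANT_PAIRS` mapping are assumed for the example (the paper proposes a general approach to *discover* such class pairs), and frames are represented abstractly as list elements.

```python
# Hypothetical equivariant class pairs: time-reversing a clip of the first
# class is assumed to yield a plausible example of the second, and vice versa.
EQUIVARIANT_PAIRS = {
    "Zooming In": "Zooming Out",
    "Zooming Out": "Zooming In",
}


def time_reverse(frames):
    """Reverse a clip (a list of frames) along its time axis."""
    return list(reversed(frames))


def augment(frames, label):
    """Time-reversal augmentation with a class-homogeneous label transform.

    Equivariant classes swap labels; any class not listed in the mapping is
    treated as invariant (label unchanged) for this sketch.
    """
    reversed_frames = time_reverse(frames)
    new_label = EQUIVARIANT_PAIRS.get(label, label)
    return reversed_frames, new_label


# Example: reversing a "Zooming In" clip produces a "Zooming Out" example,
# giving zero-shot supervision for the counterpart class.
clip = [f"frame_{i}" for i in range(8)]
rev_clip, rev_label = augment(clip, "Zooming In")
```

In this toy form, `rev_label` becomes "Zooming Out" while an unlisted (invariant) class would keep its original label, mirroring the invariant/equivariant distinction drawn in the abstract.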


