Zero-shot Recognition of Complex Action Sequences

12/08/2019
by Jonathan D. Jones, et al.

Zero-shot video classification for fine-grained activity recognition has largely been explored using methods similar to its image-based counterpart, namely by defining image-derived attributes that serve to discriminate among classes. However, such methods do not capture the fundamental dynamics of activities and are thus limited to cases where static image content alone suffices to classify an activity. For example, reversible actions such as entering and exiting a car are often indistinguishable under such methods. In this work, we present a framework for straightforward modeling of activities as a state machine of dynamic attributes. We show that encoding the temporal structure of attributes greatly increases our modeling power, allowing us, for example, to capture action direction. We further extend this framework to activity detection using dynamic programming, providing, to our knowledge, the first example of zero-shot joint segmentation and classification of complex action sequences in a larger video. We evaluate our method on the Olympic Sports dataset, where our model establishes a new state of the art for standard zero-shot learning (ZSL) evaluation and outperforms all other models in the inductive category for generalized zero-shot learning (GZSL) evaluation. Additionally, we are the first to demonstrate zero-shot decoding of complex action sequences on a widely used surgical dataset. Lastly, we show that we can even eliminate the need to train attribute detectors by using off-the-shelf object detectors to recognize activities in challenging surveillance videos.
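The abstract does not specify the model in detail, so the following is only a minimal sketch under our own assumptions, not the authors' implementation. It illustrates why an ordered state machine of attributes can separate reversible actions: each activity is defined (without training examples) as a hypothetical left-to-right sequence of attribute indices, per-frame attribute log-probabilities are assumed to come from some detector, and a Viterbi-style dynamic program finds the best monotone alignment of frames to states. The functions `align_score` and `classify` and the attributes "outside car" / "inside car" are hypothetical names chosen for illustration.

```python
import numpy as np


def align_score(frame_log_probs, states):
    """Best log-score of aligning frames to a left-to-right attribute state machine.

    frame_log_probs: (T, A) array of per-frame attribute log-probabilities
        (assumed to come from some attribute detector; hypothetical input).
    states: ordered list of attribute indices the activity must pass through.
    Each frame is assigned to exactly one state; states are visited in order,
    each for at least one frame. Returns -inf if there are fewer frames than states.
    """
    T = frame_log_probs.shape[0]
    S = len(states)
    dp = np.full((T, S), -np.inf)
    dp[0, 0] = frame_log_probs[0, states[0]]
    for t in range(1, T):
        for s in range(S):
            stay = dp[t - 1, s]  # remain in the current state
            advance = dp[t - 1, s - 1] if s > 0 else -np.inf  # move to the next state
            dp[t, s] = max(stay, advance) + frame_log_probs[t, states[s]]
    return dp[T - 1, S - 1]  # must end in the final state


def classify(frame_log_probs, activities):
    """Zero-shot classification: pick the activity definition with the best alignment."""
    scores = {name: align_score(frame_log_probs, seq) for name, seq in activities.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical attributes: 0 = "person outside car", 1 = "person inside car".
    probs = np.array([[0.9, 0.1]] * 3 + [[0.2, 0.8]] * 3)
    activities = {"enter_car": [0, 1], "exit_car": [1, 0]}
    label, scores = classify(np.log(probs), activities)
    print(label)  # → enter_car
```

Both activities use the same two attributes, so a model that only pooled attribute presence over the clip would score them identically; the ordering constraint in the dynamic program is what distinguishes entering from exiting.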
