Zero-Shot Action Recognition from Diverse Object-Scene Compositions

10/26/2021
by Carlo Bretti, et al.

This paper investigates zero-shot action recognition in the setting where no training videos with seen actions are available. For this challenging scenario, the current leading approach transfers knowledge from the image domain: objects in videos are recognized with pre-trained networks and then semantically matched to actions. Whereas objects provide a local view of the content in videos, in this work we also seek to include a global view of the scene in which actions occur. We find that scenes on their own can also recognize unseen actions, albeit less effectively than objects, and that a direct combination of object-based and scene-based scores degrades action recognition performance. To get the best out of objects and scenes, we propose to model them jointly as the Cartesian product of all possible object-scene compositions. We outline how to determine the likelihood of object-scene compositions in videos, as well as a semantic matching from object-scene compositions to actions that enforces diversity among the most relevant compositions for each action. While simple, our composition-based approach outperforms object-based approaches and even state-of-the-art zero-shot approaches that rely on large-scale video datasets with hundreds of seen actions for training and knowledge transfer.
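The abstract does not spell out the formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the likelihood of an object-scene composition is the product of the object and scene scores, that a composition is embedded by averaging the two word vectors, and that diversity is enforced by a hypothetical greedy rule limiting how often one object or scene may be reused. All function names, the toy embeddings, and the `max_reuse` parameter are illustrative assumptions.

```python
from itertools import product
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def composition_likelihoods(p_obj, p_scene):
    """Likelihood of every (object, scene) pair in one video.

    Assumption: object and scene scores are treated as independent,
    so each pair's likelihood is the product of the two scores.
    """
    return {(o, s): po * ps
            for (o, po), (s, ps) in product(p_obj.items(), p_scene.items())}


def select_diverse(similarities, top_k, max_reuse=1):
    """Greedily pick up to top_k of the most similar compositions for an action.

    Hypothetical diversity rule: an object or a scene may appear in at most
    max_reuse of the selected compositions.
    """
    used, selected = {}, []
    for obj, scene in sorted(similarities, key=similarities.get, reverse=True):
        if used.get(("o", obj), 0) >= max_reuse or used.get(("s", scene), 0) >= max_reuse:
            continue
        selected.append((obj, scene))
        used[("o", obj)] = used.get(("o", obj), 0) + 1
        used[("s", scene)] = used.get(("s", scene), 0) + 1
        if len(selected) == top_k:
            break
    return selected


def score_actions(p_obj, p_scene, emb, actions, top_k=2):
    """Score each unseen action from its selected object-scene compositions."""
    likelihoods = composition_likelihoods(p_obj, p_scene)
    scores = {}
    for action in actions:
        # Assumption: a composition is embedded as the mean of its two word vectors.
        sims = {
            (o, s): cosine([(a + b) / 2 for a, b in zip(emb[o], emb[s])], emb[action])
            for (o, s) in likelihoods
        }
        chosen = select_diverse(sims, top_k)
        scores[action] = sum(likelihoods[c] * sims[c] for c in chosen)
    return scores


# Toy 3-D "word embeddings" and per-video scores, invented for illustration only.
EMB = {
    "horse": [1.0, 0.0, 0.0], "swimsuit": [0.0, 1.0, 0.0],
    "field": [1.0, 0.0, 1.0], "pool": [0.0, 1.0, 1.0],
    "horse riding": [1.0, 0.0, 0.5], "swimming": [0.0, 1.0, 0.5],
}
P_OBJ = {"horse": 0.9, "swimsuit": 0.1}
P_SCENE = {"field": 0.8, "pool": 0.2}
ACTIONS = ["horse riding", "swimming"]

scores = score_actions(P_OBJ, P_SCENE, EMB, ACTIONS)
print(max(scores, key=scores.get))
```

In this toy setup the Cartesian product yields four compositions, and the greedy rule keeps two per action that share neither an object nor a scene. Real systems would replace the toy vectors with pre-trained word embeddings and the hand-set scores with classifier outputs.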


research
03/08/2022

Universal Prototype Transport for Zero-Shot Action Recognition and Localization

This work addresses the problem of recognizing action categories in vide...
research
08/28/2020

All About Knowledge Graphs for Actions

Current action recognition systems require large amounts of training dat...
research
09/02/2020

Zero-Shot Human-Object Interaction Recognition via Affordance Graphs

We propose a new approach for Zero-Shot Human-Object Interaction Recogni...
research
06/17/2022

Learning Using Privileged Information for Zero-Shot Action Recognition

Zero-Shot Action Recognition (ZSAR) aims to recognize video actions that...
research
04/27/2016

Zero-shot object prediction using semantic scene knowledge

This work focuses on the semantic relations between scenes and objects f...
research
12/05/2019

Zero-Shot Generation of Human-Object Interaction Videos

Generation of videos of complex scenes is an important open problem in c...
research
11/22/2022

Knowledge Prompting for Few-shot Action Recognition

Few-shot action recognition in videos is challenging for its lack of sup...
