From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

04/02/2023
by Yong-Lu Li, et al.

Action understanding matters and attracts growing attention. It can be formulated as a mapping from the physical space of actions to a semantic space. Typically, researchers have built action datasets by defining classes according to idiosyncratic choices, each pushing the envelope on its own benchmark. As a result, datasets are incompatible with each other, like "Isolated Islands", because of semantic gaps and differing class granularities; e.g., do housework in dataset A versus wash plate in dataset B. We argue that a more principled semantic space is urgently needed to concentrate community efforts and to let us use all datasets together in pursuit of generalizable action learning. To this end, we design a Poincaré action semantic space that follows a verb taxonomy hierarchy and covers a massive number of actions. By aligning the classes of previous datasets to this semantic space, we gather image, video, skeleton, and MoCap datasets into a unified database under a unified label system, i.e., we bridge the "isolated islands" into a "Pangea". Accordingly, we propose a bidirectional mapping model between the physical and semantic spaces to make full use of Pangea. In extensive experiments, our system shows a significant advantage, especially in transfer learning. Code and data will be made publicly available.
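
To make the two ideas above concrete (a hierarchical Poincaré semantic space and aligning dataset-specific classes to it), here is a minimal sketch, not the authors' implementation. It uses the standard geodesic distance in the unit Poincaré ball, in which general concepts tend to sit near the origin and specific ones near the boundary. The taxonomy (`PARENT`), the 2-D coordinates (`EMBEDDING`), the class mapping (`ALIGNMENT`), and the helpers `ancestors` and `unify` are all hypothetical illustrations, not data or APIs from the paper.

```python
import math

def poincare_distance(u, v):
    """Standard geodesic distance between two points strictly inside the unit Poincaré ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_dist / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)

# Hypothetical verb taxonomy: child -> parent (None marks the root).
PARENT = {
    "act": None,
    "do housework": "act",
    "wash plate": "do housework",
    "sweep floor": "do housework",
}

# Hypothetical 2-D embeddings: general verbs near the origin, specific verbs near the boundary.
EMBEDDING = {
    "act": (0.0, 0.0),
    "do housework": (0.5, 0.0),
    "wash plate": (0.85, 0.1),
    "sweep floor": (0.85, -0.1),
}

# Hypothetical alignment of dataset-local class names to nodes in the shared taxonomy.
ALIGNMENT = {
    ("A", "do_housework"): "do housework",
    ("B", "washing plates"): "wash plate",
}

def ancestors(node):
    """Return the node followed by all of its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def unify(dataset, label):
    """Map a dataset-local label to its node in the shared semantic space."""
    return ALIGNMENT[(dataset, label)]

a = unify("A", "do_housework")
b = unify("B", "washing plates")
print(a in ancestors(b))  # True: the coarse label from A subsumes the fine label from B
print(poincare_distance(EMBEDDING[a], EMBEDDING[b]))
```

Once both labels live in one label system, a coarse class from one dataset and a fine class from another become comparable through the hierarchy rather than being treated as unrelated islands. The Poincaré distance from the origin grows without bound toward the boundary, which is one common reason hyperbolic spaces are chosen for tree-like taxonomies.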


