Disentangled Action Recognition with Knowledge Bases

07/04/2022
by Zhekun Luo, et al.

Actions in video usually involve humans interacting with objects. Action labels are typically composed of various combinations of verbs and nouns, but training data may not exist for every possible combination. In this paper, we aim to improve the generalization of compositional action recognition models to novel verbs or novel nouns that are unseen during training by leveraging knowledge graphs. Previous work uses verb-noun compositional action nodes in the knowledge graph, which scales poorly because the number of compositional action nodes grows quadratically with the number of verbs and nouns. To address this issue, we propose Disentangled Action Recognition with Knowledge-bases (DARK), which leverages the inherent compositionality of actions. DARK trains a factorized model by first extracting disentangled feature representations for verbs and nouns, and then predicting classification weights using relations in external knowledge graphs. The type constraint between verb and noun is extracted from external knowledge bases and applied when composing actions. DARK scales better in the number of objects and verbs and achieves state-of-the-art performance on the Charades dataset. We further propose a new benchmark split based on the Epic-kitchen dataset, which is an order of magnitude larger in the number of classes and samples, and evaluate various models on this benchmark.
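The factorized composition described in the abstract can be illustrated with a small sketch. It is not the DARK implementation: the function name `compose_action_scores`, the toy scores, the `allowed_pairs` set (a hypothetical stand-in for a type constraint taken from a knowledge base), and the use of a simple product to combine scores are all assumptions made for illustration. The node counts at the end show why per-action nodes grow with the product of the vocabulary sizes, while separate verb and noun nodes grow with their sum.

```python
from itertools import product


def compose_action_scores(verb_scores, noun_scores, allowed_pairs):
    """Combine independent verb and noun scores into action scores.

    verb_scores, noun_scores: dicts mapping a label to a probability-like score.
    allowed_pairs: set of (verb, noun) pairs considered plausible. This is a
    hypothetical stand-in for a type constraint from a knowledge base.
    """
    scores = {}
    for verb, noun in product(verb_scores, noun_scores):
        # Skip compositions that the type constraint rules out.
        if (verb, noun) in allowed_pairs:
            # Assumption: combine the two factors with a simple product.
            scores[(verb, noun)] = verb_scores[verb] * noun_scores[noun]
    return scores


# Toy scores, invented for illustration only.
verb_scores = {"open": 0.5, "drink": 0.3, "hold": 0.2}
noun_scores = {"door": 0.5, "water": 0.3, "cup": 0.2}
allowed_pairs = {("open", "door"), ("drink", "water"), ("hold", "cup")}

action_scores = compose_action_scores(verb_scores, noun_scores, allowed_pairs)
best_action = max(action_scores, key=action_scores.get)

# Graph size: one node per verb-noun pair versus one node per verb or noun.
num_verbs, num_nouns = len(verb_scores), len(noun_scores)
compositional_nodes = num_verbs * num_nouns
factorized_nodes = num_verbs + num_nouns
```

With these toy inputs, implausible pairs such as ("drink", "door") never receive a score, and the gap between the two node counts widens as the verb and noun vocabularies grow.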


research
12/20/2019

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

Human action is naturally compositional: humans can easily recognize and...
research
12/03/2020

SAFCAR: Structured Attention Fusion for Compositional Action Recognition

We present a general framework for compositional action recognition – i....
research
07/13/2023

Free-Form Composition Networks for Egocentric Action Recognition

Egocentric action recognition is gaining significant attention in the fi...
research
08/28/2020

All About Knowledge Graphs for Actions

Current action recognition systems require large amounts of training dat...
research
09/02/2020

Zero-Shot Human-Object Interaction Recognition via Affordance Graphs

We propose a new approach for Zero-Shot Human-Object Interaction Recogni...
research
11/22/2022

Knowledge Prompting for Few-shot Action Recognition

Few-shot action recognition in videos is challenging for its lack of sup...
