Knowledge Prompting for Few-shot Action Recognition

11/22/2022
by Yuheng Shi, et al.

Few-shot action recognition in videos is challenging because of the lack of supervision and the difficulty of generalizing to unseen actions. To address this task, we propose a simple yet effective method, called knowledge prompting, which leverages commonsense knowledge of actions from external resources to prompt a powerful pre-trained vision-language model for few-shot classification. We first collect large-scale language descriptions of actions, defined as text proposals, to build an action knowledge base. The text proposals are collected either by filling handcrafted sentence templates with an external action-related corpus or by extracting action-related phrases from the captions of Web instructional videos. Then we feed these text proposals into the pre-trained vision-language model along with video frames to generate a matching score between each proposal and each frame; these scores can be treated as action semantics with strong generalization ability. Finally, we design a lightweight temporal modeling network to capture the temporal evolution of the action semantics for classification. Extensive experiments on six benchmark datasets demonstrate that our method generally achieves state-of-the-art performance while reducing the training overhead to 0.1% of that of existing methods.
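The three stages described above (building text proposals, scoring them against frames, and classifying the score sequence over time) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation: the template, the toy embeddings, the softmax normalization of the matching scores, the temperature value, and the temporal network (a depthwise temporal convolution, mean pooling over time, and a linear classifier) are all assumptions. In a real system the frame and text embeddings would come from the pre-trained vision-language model; here they are hand-written stand-in vectors.

```python
import math
from itertools import product
from typing import List, Sequence

Vector = List[float]


def fill_templates(templates: Sequence[str], verbs: Sequence[str],
                   objects: Sequence[str]) -> List[str]:
    """Build text proposals by filling every template with every (verb, object) pair.

    The template syntax with {verb} and {obj} slots is a hypothetical example.
    """
    return [t.format(verb=v, obj=o) for t, v, o in product(templates, verbs, objects)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matching_scores(frames: Sequence[Vector], proposals: Sequence[Vector],
                    temperature: float = 0.01) -> List[Vector]:
    """Return a T x P matrix of frame-to-proposal matching scores.

    Assumption: scores are a temperature-scaled softmax over proposals of the
    cosine similarities, in the style of CLIP zero-shot classification.
    """
    scores = []
    for f in frames:
        logits = [cosine(f, p) / temperature for p in proposals]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        scores.append([e / total for e in exps])
    return scores


def temporal_conv(scores: List[Vector], kernel: Sequence[float]) -> List[Vector]:
    """Depthwise 1D convolution over time (cross-correlation, as in deep learning).

    Each proposal channel is filtered independently; zero padding keeps the
    output length equal to T for odd kernel sizes.
    """
    t_len, p_len, half = len(scores), len(scores[0]), len(kernel) // 2
    out = []
    for t in range(t_len):
        row = []
        for p in range(p_len):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = t + k - half
                if 0 <= src < t_len:
                    acc += w * scores[src][p]
            row.append(acc)
        out.append(row)
    return out


def classify(scores: List[Vector], kernel: Sequence[float],
             weights: Sequence[Vector], bias: Sequence[float]) -> List[float]:
    """Hypothetical lightweight head: temporal conv -> mean pool over time -> linear layer."""
    feats = temporal_conv(scores, kernel)
    pooled = [sum(col) / len(feats) for col in zip(*feats)]
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, bias)]


# Toy usage with 2-D stand-in embeddings; real model features are much larger.
proposals = fill_templates(["a person {verb} a {obj}"], ["opening", "closing"], ["door"])
# proposals == ["a person opening a door", "a person closing a door"]
frame_emb = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9]]
text_emb = [[1.0, 0.0], [0.0, 1.0]]
scores = matching_scores(frame_emb, text_emb, temperature=1.0)
print(classify(scores, [0.25, 0.5, 0.25], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In this sketch only the small temporal head (the kernel, weights, and bias) would be trained, while the vision-language encoders stay frozen; that split is an assumption made for illustration, and the paper's actual network and training setup may differ.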

Related research

Zero-Shot Action Recognition from Diverse Object-Scene Compositions (10/26/2021)
This paper investigates the problem of zero-shot action recognition, in ...

Multimodal Adaptation of CLIP for Few-Shot Action Recognition (08/03/2023)
Applying large-scale pre-trained visual models like CLIP to few-shot act...

Prompting Visual-Language Models for Efficient Video Understanding (12/08/2021)
Visual-language pre-training has shown great success for learning joint ...

Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding (10/31/2021)
Comprehensive understanding of key players and actions in multiplayer sp...

Elaborative Rehearsal for Zero-shot Action Recognition (08/05/2021)
The growing number of action classes has posed a new challenge for video...

A Temporal Sequence Learning for Action Recognition and Prediction (06/17/2019)
In this work [This work was supported in part by the National Science Fou...

Disentangled Action Recognition with Knowledge Bases (07/04/2022)
Action in video usually involves the interaction of human with objects. ...
