Prompt Learning for Action Recognition

05/21/2023
by Xijun Wang et al.

We present Prompt Learning for Action Recognition (PLAR), a new general learning approach that leverages the strengths of prompt learning to guide the learning process. Our approach predicts the action label by helping the model focus on the descriptions or instructions associated with actions in the input videos. Our formulation uses various prompts, including optical flow, large vision models, and learnable prompts, to improve recognition performance. Moreover, we propose a learnable prompt method that dynamically generates prompts from a pool of prompt experts conditioned on the input. Under a shared objective, PLAR optimizes prompts that guide the model's predictions while explicitly learning input-invariant (prompt-expert pool) and input-specific (data-dependent) prompt knowledge. We evaluate our approach on datasets consisting of both ground-camera and aerial videos, and on scenes with single-agent and multi-agent actions. In practice, we observe a 3.17-10.2% accuracy improvement on the aerial multi-agent dataset Okutama and a 0.8-2.6% improvement on the ground-camera single-agent dataset Something-Something V2. We plan to release our code publicly.
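To make the learnable prompt method concrete, here is a minimal PyTorch sketch of one plausible reading of the abstract: a shared pool of learnable prompt experts (input-invariant) is mixed by a data-dependent gate (input-specific) into prompt tokens that are prepended to the video features. The module name `PromptExpertPool`, the mean-pooled gating query, and all dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a prompt-expert pool; names and shapes are assumptions.
import torch
import torch.nn as nn


class PromptExpertPool(nn.Module):
    def __init__(self, num_experts: int = 8, prompt_len: int = 4, dim: int = 768):
        super().__init__()
        # Input-invariant knowledge: a shared pool of learnable prompt experts.
        self.experts = nn.Parameter(torch.randn(num_experts, prompt_len, dim) * 0.02)
        # Input-specific knowledge: a gate that scores experts per input.
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, tokens, dim) video features from a backbone.
        query = feats.mean(dim=1)                   # (batch, dim) summary query
        weights = self.gate(query).softmax(dim=-1)  # (batch, num_experts)
        # Mix experts into per-input prompts: (batch, prompt_len, dim).
        prompts = torch.einsum("be,eld->bld", weights, self.experts)
        # Prepend the prompts so the recognizer attends to them with the video.
        return torch.cat([prompts, feats], dim=1)


# Usage: augment backbone features before the action classification head.
feats = torch.randn(2, 196, 768)       # e.g., ViT patch tokens for two clips
prompted = PromptExpertPool()(feats)   # shape: (2, 4 + 196, 768)
```

In this reading, the softmax gate realizes the abstract's split directly: the expert pool is shared across all inputs, while the gate's per-input mixture makes the assembled prompt data-dependent.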

