Search-Map-Search: A Frame Selection Paradigm for Action Recognition

04/20/2023
by Mingjun Zhao, et al.

Despite the success of deep learning in video understanding tasks, processing every frame in a video is computationally expensive and often unnecessary in real-time applications. Frame selection aims to extract the most informative and representative frames to help a model better understand video content. Existing frame selection methods either sample frames individually based on per-frame importance prediction, ignoring interactions among frames, or adopt reinforcement learning agents that pick representative frames in succession, which are costly to train and may be unstable. To overcome these limitations, we propose a Search-Map-Search learning paradigm that combines the advantages of heuristic search and supervised learning to select the best combination of frames from a video as one entity. By combining search with learning, the proposed method can better capture frame interactions while incurring a low inference overhead. Specifically, we first propose a hierarchical search method that is run on each training video to find the combination of frames with the lowest error on the downstream task. A feature mapping function is then learned to map the frames of a video to the representation of its target optimal frame combination. During inference, another search is performed on an unseen video to select a combination of frames whose feature representation is close to the projected feature representation. Extensive experiments on several action recognition benchmarks demonstrate that our frame selection method effectively improves the performance of action recognition models and significantly outperforms a number of competitive baselines.
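The three stages described above can be illustrated with a toy sketch. This is not the authors' implementation; it only shows the control flow under several assumptions. An exhaustive search over combinations stands in for the hierarchical search, mean pooling of per-frame features is assumed as the representation of a frame combination, and the learned mapping function is replaced by a placeholder that returns the stage-one target directly. All function names (`mean_pool`, `search`, `search_target`, `select_frames`) and the toy loss are hypothetical.

```python
from itertools import combinations
from math import dist


def mean_pool(features, combo):
    """Average the feature vectors of the selected frames.
    Assumption: a frame combination is represented by its mean feature."""
    dim = len(features[0])
    return [sum(features[i][d] for i in combo) / len(combo) for d in range(dim)]


def search(num_frames, k, cost):
    """Return the k-frame combination with the lowest cost.
    Exhaustive stand-in for the hierarchical search; only viable for tiny inputs."""
    return min(combinations(range(num_frames), k), key=cost)


def search_target(features, k, task_loss):
    """Stage 1 (search): on a training video, find the combination with the
    lowest downstream loss and return it with its pooled representation."""
    best = search(len(features), k, task_loss)
    return best, mean_pool(features, best)


def select_frames(features, k, mapper):
    """Stage 3 (search): on an unseen video, pick the combination whose pooled
    feature is closest to the feature projected by the mapping function."""
    projected = mapper(features)
    return search(len(features), k, lambda c: dist(mean_pool(features, c), projected))


# Toy video: five frames with 2-D features (made-up numbers).
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0], [5.0, 2.0]]


def toy_loss(combo):
    """Hypothetical downstream loss that pretends frames 1 and 3 are informative."""
    return -len(set(combo) & {1, 3})


best, target = search_target(features, 2, toy_loss)


def placeholder_mapper(feats):
    """Stage 2 (map) placeholder: a trained model would predict this target
    from the video's frames; here the stage-1 target is returned as-is."""
    return target


selected = select_frames(features, 2, placeholder_mapper)
print(best, target, selected)
```

Because the placeholder mapper reproduces the target exactly, the inference search recovers the same combination found in stage one. With a real learned mapping the projected feature would only approximate the target, and the second search would return the nearest available combination.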


