Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

12/20/2022
by Vivek Rathod, et al.

Detecting actions in untrimmed videos should not be limited to a small, closed set of classes. We present a simple yet effective strategy for open-vocabulary temporal action detection that uses pretrained image-text co-embeddings. Although these co-embeddings are trained on static images rather than videos, we show that they enable open-vocabulary performance competitive with fully supervised models. We further show that performance improves when the image-text features are ensembled with features that encode local motion, such as optical-flow-based features, or with other modalities, such as audio. In addition, we propose a more reasonable open-vocabulary evaluation setting for the ActivityNet dataset, in which the category splits are based on similarity rather than random assignment.
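To make the core idea concrete, the following is a minimal sketch of how an image-text co-embedding could be used to label video frames with arbitrary class names. It is not the authors' method, which the abstract does not spell out. It assumes that per-frame image embeddings and per-class text embeddings live in a shared space (as a pretrained image-text model would provide), scores them by cosine similarity, and merges consecutive frames with the same label into segments. The function names, the threshold value, and the toy two-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_segments(
    frame_embeddings: List[Sequence[float]],
    text_embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[str, int, int]]:
    """Label each frame with its best-matching class, then merge runs.

    Returns (class_name, start_frame, end_frame) with end_frame inclusive.
    Frames whose best similarity is below `threshold` count as background.
    The class vocabulary is just the keys of `text_embeddings`, so new
    classes can be added without retraining (hypothetical illustration).
    """
    labels: List[Optional[str]] = []
    for frame in frame_embeddings:
        best_name, best_score = None, threshold
        for name, text in text_embeddings.items():
            score = cosine(frame, text)
            if score >= best_score:
                best_name, best_score = name, score
        labels.append(best_name)

    # Merge consecutive frames that share a label into one segment.
    segments: List[Tuple[str, int, int]] = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None:
                segments.append((labels[start], start, i - 1))
            start = i
    return segments


# Toy embeddings standing in for real image and text encoder outputs.
classes = {"jump": [1.0, 0.0], "swim": [0.0, 1.0]}
frames = [[1.0, 1.0], [1.0, 0.1], [0.9, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(detect_segments(frames, classes, threshold=0.8))
# [('jump', 1, 2), ('swim', 4, 4)]
```

Because the vocabulary is supplied at inference time as text, swapping in a different set of class names only requires new text embeddings. Features from other modalities, such as optical flow or audio, would need an additional fusion step that this sketch leaves out.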
