Localizing Unseen Activities in Video via Image Query

06/28/2019
by   Zhu Zhang, et al.
1

Action localization in untrimmed videos is an important topic in the field of video understanding. However, existing action localization methods are restricted to a pre-defined set of actions and cannot localize unseen activities. Thus, we consider a new task to localize unseen activities in videos via image queries, named Image-Based Activity Localization. This task faces three inherent challenges: (1) how to eliminate the influence of semantically inessential contents in image queries; (2) how to deal with the fuzzy localization of inaccurate image queries; (3) how to determine the precise boundaries of target segments. We then propose a novel self-attention interaction localizer to retrieve unseen activities in an end-to-end fashion. Specifically, we first devise a region self-attention method with relative position encoding to learn fine-grained image region representations. Then, we employ a local transformer encoder to build multi-step fusion and reasoning of image and video contents. We next adopt an order-sensitive localizer to directly retrieve the target segment. Furthermore, we construct a new dataset ActivityIBAL by reorganizing the ActivityNet dataset. The extensive experiments show the effectiveness of our method.

READ FULL TEXT

page 1

page 3

research
11/17/2022

ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022

In this report, we present the ReLER@ZJU1 submission to the Ego4D Moment...
research
04/22/2019

Tripping through time: Efficient Localization of Activities in Videos

Localizing moments in untrimmed videos via language queries is a new and...
research
05/05/2017

TALL: Temporal Activity Localization via Language Query

This paper focuses on temporal localization of actions in untrimmed vide...
research
05/12/2022

Entity-aware and Motion-aware Transformers for Language-driven Action Localization in Videos

Language-driven action localization in videos is a challenging task that...
research
10/15/2022

Semantic Video Moments Retrieval at Scale: A New Task and a Baseline

Motivated by the increasing need of saving search effort by obtaining re...
research
04/06/2023

Boundary-Denoising for Video Activity Localization

Video activity localization aims at understanding the semantic content i...
research
04/06/2021

Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning

Detection and localization of actions in videos is an important problem ...

Please sign up or login with your details

Forgot password? Click here to reset