Learning Latent Super-Events to Detect Multiple Activities in Videos

12/05/2017
by AJ Piergiovanni, et al.

In this paper, we introduce the concept of learning latent super-events from activity videos, and present how it benefits activity detection in continuous videos. We define a super-event as a set of multiple events occurring together in a video with a particular temporal organization; it is the counterpart of the sub-event concept. Real-world videos contain multiple activities and are rarely segmented (e.g., surveillance videos), and learning latent super-events allows the model to capture how events are temporally related in a video. We design temporal structure filters that enable the model to focus on particular sub-intervals of the video, and use them together with a soft attention mechanism to learn representations of latent super-events. Super-event representations are combined with per-frame or per-segment CNN features to provide frame-level annotations. Our approach is fully differentiable, enabling end-to-end learning of latent super-event representations jointly with the activity detector that uses them. Our experiments on multiple public video datasets confirm that the proposed latent super-event learning significantly benefits activity detection, advancing the state of the art.
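As a rough illustration of the mechanism described above, below is a minimal PyTorch-style sketch (not the authors' implementation) of temporal structure filters built from Cauchy-shaped kernels with learnable centers and widths. The kernels pool per-frame CNN features into a super-event representation, which is then concatenated with each frame's features for frame-level classification. All module and parameter names (TemporalStructureFilter, SuperEventDetector, num_filters, etc.) are illustrative assumptions, and the paper's per-class soft attention over multiple filter banks is simplified here to a single shared bank.

```python
import math

import torch
import torch.nn as nn


class TemporalStructureFilter(nn.Module):
    """Bank of Cauchy-shaped temporal kernels with learnable centers/widths.

    Each kernel softly focuses on a sub-interval of a length-T video; kernels
    are normalized over time, so pooling with them is a weighted temporal
    average. Hypothetical sketch, not the reference implementation.
    """

    def __init__(self, num_filters=3):
        super().__init__()
        # Unconstrained parameters; mapped to valid centers/widths in forward().
        self.centers = nn.Parameter(torch.randn(num_filters))
        self.widths = nn.Parameter(torch.randn(num_filters))

    def forward(self, T):
        # Centers constrained to [0, T-1], widths strictly positive.
        centers = (torch.tanh(self.centers) + 1.0) * 0.5 * (T - 1)
        widths = torch.exp(self.widths) * 0.1 * T
        t = torch.arange(T, dtype=torch.float32, device=centers.device)
        # Cauchy responses, one row per kernel: shape (num_filters, T).
        resp = 1.0 / (math.pi * widths.unsqueeze(1)
                      * (1.0 + ((t - centers.unsqueeze(1)) / widths.unsqueeze(1)) ** 2))
        return resp / resp.sum(dim=1, keepdim=True)


class SuperEventDetector(nn.Module):
    """Combines a learned super-event representation with per-frame features
    to produce frame-level multi-label activity scores."""

    def __init__(self, feat_dim=1024, num_classes=65, num_filters=3):
        super().__init__()
        self.tsf = TemporalStructureFilter(num_filters)
        self.super_event_proj = nn.Linear(num_filters * feat_dim, feat_dim)
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, frame_feats):
        # frame_feats: (B, T, D) per-frame or per-segment CNN features.
        B, T, D = frame_feats.shape
        filters = self.tsf(T)                                       # (M, T)
        # Temporal pooling of the features with each kernel -> (B, M, D).
        pooled = torch.einsum('mt,btd->bmd', filters, frame_feats)
        super_event = self.super_event_proj(pooled.reshape(B, -1))  # (B, D)
        # Broadcast the video-level super-event to every frame and classify.
        combined = torch.cat(
            [frame_feats, super_event.unsqueeze(1).expand(-1, T, -1)], dim=-1)
        return self.classifier(combined)                            # (B, T, num_classes)
```

Because the kernel centers and widths are ordinary differentiable parameters, the whole pipeline can be trained end to end with a standard per-frame multi-label loss (e.g., binary cross-entropy on the (B, T, num_classes) logits), matching the fully differentiable design described in the abstract.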


Related research

05/26/2016 - Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters
In this paper, we newly introduce the concept of temporal attention filt...

03/16/2018 - Activity Detection with Latent Sub-event Hierarchy Learning
In this paper, we introduce a new convolutional layer named the Temporal...

01/14/2020 - Recognizing Video Events with Varying Rhythms
Recognizing Video events in long, complex videos with multiple sub-activ...

06/23/2022 - Anticipating the cost of drought events in France by super learning
Drought events are the second most expensive type of natural disaster wi...

06/20/2014 - Early Recognition of Human Activities from First-Person Videos Using Onset Representations
In this paper, we propose a methodology for early recognition of human a...

11/09/2015 - Detecting events and key actors in multi-person videos
Multi-person event recognition is a challenging task, often with many pe...

05/24/2016 - EventNet Version 1.1 Technical Report
EventNet is a large-scale video corpus and event ontology consisting of ...
