Asynchronous Temporal Fields for Action Recognition

12/19/2016
by Gunnar A. Sigurdsson, et al.

Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and necessitates reasoning about the sequence of activities, as well as higher-level constructs such as intentions. But how do we model and reason about these? We propose a fully-connected temporal CRF model for reasoning over various aspects of activities, including objects, actions, and intentions, where the potentials are predicted by a deep network. End-to-end training of such structured models is a challenging endeavor: for inference and learning we need to construct mini-batches consisting of whole videos, leading to mini-batches with only a few videos. This causes high correlation between data points, leading to a breakdown of the backprop algorithm. To address this challenge, we present an asynchronous variational inference method that allows efficient end-to-end training. Our method achieves a classification mAP of 22.4%, outperforming the state of the art (17.2%), and offers similar gains on temporal localization.
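To make the idea of a fully-connected temporal CRF concrete, here is a minimal sketch of plain synchronous mean-field inference over the frames of one video. It is an illustration under stated assumptions, not the paper's asynchronous method: the function name `mean_field`, the Gaussian temporal kernel, the label-compatibility matrix `compat`, and the random unary scores (standing in for the output of a deep network) are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field(unary, compat, sigma=2.0, n_iters=10):
    """Approximate per-frame label marginals in a fully-connected temporal CRF.

    unary:  (T, K) per-frame label scores (assumed to come from a network).
    compat: (K, K) label compatibility scores (hypothetical pairwise potential).
    sigma:  width of an assumed Gaussian kernel over frame distance.
    """
    T, _ = unary.shape
    t = np.arange(T)
    # Every frame is connected to every other frame; nearby frames weigh more.
    W = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # a frame sends no message to itself

    Q = softmax(unary)  # initialise beliefs from the unary scores alone
    for _ in range(n_iters):
        # Message to frame t, label k:
        #   sum over t' of W[t, t'] * sum over k' of compat[k, k'] * Q[t', k']
        messages = W @ Q @ compat.T
        Q = softmax(unary + messages)
    return Q

rng = np.random.default_rng(0)
unary = rng.normal(size=(8, 3))   # 8 frames, 3 labels (toy stand-in data)
compat = np.eye(3)                # reward other frames agreeing on a label
Q = mean_field(unary, compat)
```

Each row of `Q` is a distribution over labels for one frame. Note that this update needs the beliefs of every frame in the video at once, which is why batch construction from whole videos becomes a constraint during training.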


