Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition

11/28/2016
by   Timur Bagautdinov, et al.
0

We present a unified framework for understanding human social behaviors in raw image sequences. Our model jointly detects multiple individuals, infers their social actions, and estimates the collective actions with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end to generate dense proposal maps that are refined via a novel inference scheme. The temporal consistency is handled via a person-level matching Recurrent Neural Network. The complete model takes as input a sequence of frames and outputs detections along with the estimates of individual actions and collective activities. We demonstrate state-of-the-art performance of our algorithm on multiple publicly available benchmarks.

READ FULL TEXT

page 1

page 4

page 8

research
04/10/2019

H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions

We present a unified framework for understanding 3D hand and object inte...
research
04/10/2017

CERN: Confidence-Energy Recurrent Network for Group Activity Recognition

This work is about recognizing human activities occurring in videos at d...
research
07/12/2022

Hunting Group Clues with Transformers for Social Group Activity Recognition

This paper presents a novel framework for social group activity recognit...
research
12/26/2018

A Multi-Stream Convolutional Neural Network Framework for Group Activity Recognition

In this work, we present a framework based on multi-stream convolutional...
research
01/21/2021

Hierarchical Graph-RNNs for Action Detection of Multiple Activities

In this paper, we propose an approach that spatially localizes the activ...
research
03/21/2022

LocATe: End-to-end Localization of Actions in 3D with Transformers

Understanding a person's behavior from their 3D motion is a fundamental ...
research
06/12/2013

Recurrent Convolutional Neural Networks for Scene Parsing

Scene parsing is a technique that consist on giving a label to all pixel...

Please sign up or login with your details

Forgot password? Click here to reset