Online Localization and Prediction of Actions and Interactions

12/04/2016
by Khurram Soomro, et al.

This paper proposes a person-centric, online approach to the challenging problem of localizing and predicting actions and interactions in videos. Typically, localization or recognition is performed offline, with all frames of the video processed together. This prevents timely localization and prediction of actions and interactions, an important consideration for many tasks including surveillance and human-machine interaction. In our approach, we estimate human poses in each frame and train discriminative appearance models using the superpixels inside the pose bounding boxes. Since per-frame pose estimation is inherently noisy, the conditional probability of pose hypotheses at the current time-step (frame) is computed using pose estimates in the current frame and their consistency with poses in the previous frames. Next, both the superpixel-based and pose-based foreground likelihoods are used to infer the location of actors at each time-step through a Conditional Random Field. Visual drift is handled by updating the appearance models and refining poses using motion smoothness on joint locations, both in an online manner. For online prediction of action (interaction) confidences, we propose an approach based on Structural SVM that operates on short video segments and is trained with the objective that the confidence of an action or interaction increases as time progresses. Lastly, we quantify the performance of detection and prediction together, and analyze how prediction accuracy varies as a function of how much of the action (interaction) has been observed, at different levels of detection performance. Our experiments on several datasets suggest that, despite using only a few frames to localize actions (interactions) at each time instant, we obtain results competitive with state-of-the-art offline methods.
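
The training objective mentioned above, that the confidence of an action should increase as more of the video is observed, can be illustrated with a small sketch. This is not the paper's Structural SVM formulation. It only assumes that each short video segment receives a scalar confidence and that a hinge-style penalty is charged whenever a later segment fails to score higher than the previous one by some margin. The function name `monotonicity_penalty` and the `margin` parameter are hypothetical and chosen for illustration.

```python
from typing import Sequence


def monotonicity_penalty(scores: Sequence[float], margin: float = 0.0) -> float:
    """Hinge-style penalty for confidences that fail to increase over time.

    `scores` holds one confidence per consecutive video segment (an
    assumption for this sketch). For each pair of neighbouring segments,
    a penalty of max(0, margin + previous - current) is added, so the
    total is zero only when every score exceeds its predecessor by at
    least `margin`.
    """
    total = 0.0
    for prev, curr in zip(scores, scores[1:]):
        total += max(0.0, margin + prev - curr)
    return total


# Confidences that rise steadily incur no penalty.
rising = monotonicity_penalty([0.0, 0.25, 0.75])

# A dip from 0.5 to 0.25 is penalised by the size of the dip.
dipping = monotonicity_penalty([0.5, 0.25, 0.75])
```

In a full learner, a term of this kind would be one component of the loss minimised over the model weights. Here it only shows which score sequences the stated objective favours.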


