VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living

05/17/2021
by Srijan Das, et al.

Many attempts have been made to combine RGB and 3D poses for the recognition of Activities of Daily Living (ADL). ADL may look very similar, and distinguishing them often requires modeling fine-grained details. Because recent 3D ConvNets are too rigid to capture the subtle visual patterns across an action, this research direction is dominated by methods combining RGB and 3D poses. However, computing 3D poses from an RGB stream is costly in the absence of appropriate sensors, which limits the use of these approaches in real-world applications requiring low latency. How, then, can 3D poses best be exploited for recognizing ADL? To this end, we propose an extension of a pose-driven attention mechanism, the Video-Pose Network (VPN), exploring two distinct directions. The first transfers pose knowledge into RGB through feature-level distillation; the second mimics pose-driven attention through attention-level distillation. Finally, these two approaches are integrated into a single model, which we call VPN++. We show that VPN++ is not only effective but also provides a high speed-up and high resilience to noisy poses. VPN++, with or without 3D poses, outperforms the representative baselines on 4 public datasets. Code is available at https://github.com/srijandas07/vpnplusplus.
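As a rough illustration of how a feature-level and an attention-level distillation term might be combined into one training objective, the sketch below adds both terms to a classification loss. It is not taken from the paper or its repository: the function names, the use of mean squared error for both terms, the softmax normalization of attention scores, and the weights `alpha` and `beta` are all assumptions made for illustration.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combined_distillation_loss(cls_loss, rgb_feat, pose_feat,
                               rgb_attn_scores, pose_attn_scores,
                               alpha=1.0, beta=1.0):
    """Hypothetical objective: classification loss plus two distillation terms.

    feature term   : pulls the RGB (student) features toward pose (teacher) features
    attention term : pulls RGB-derived attention weights toward pose-driven ones
    alpha, beta    : assumed weighting hyperparameters, not from the source
    """
    feature_term = mse(rgb_feat, pose_feat)
    attention_term = mse(softmax(rgb_attn_scores), softmax(pose_attn_scores))
    return cls_loss + alpha * feature_term + beta * attention_term


if __name__ == "__main__":
    loss = combined_distillation_loss(
        cls_loss=0.5,
        rgb_feat=[0.2, 0.4, 0.1],
        pose_feat=[0.3, 0.4, 0.0],
        rgb_attn_scores=[1.0, 0.5, 0.2],
        pose_attn_scores=[0.8, 0.9, 0.1],
    )
    print(f"combined loss: {loss:.4f}")
```

In a sketch like this, the pose branch is only needed to produce the teacher targets during training, which is one way a model could run on RGB alone at inference time.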


research
07/06/2020

VPN: Learning Video-Pose Embedding for Activities of Daily Living

In this paper, we focus on the spatio-temporal aspect of recognizing Act...
research
09/03/2021

Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition

Human pose is a useful feature for fine-grained sports action understand...
research
06/15/2023

Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers

Human perception of surroundings is often guided by the various poses pr...
research
08/03/2022

Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living

Domain shifts, such as appearance changes, are a key challenge in real-w...
research
02/06/2023

Fine-Grained Action Detection with RGB and Pose Information using Two Stream Convolutional Networks

As participants of the MediaEval 2022 Sport Task, we propose a two-strea...
research
07/12/2021

Let's Play for Action: Recognizing Activities of Daily Living by Learning from Life Simulation Video Games

Recognizing Activities of Daily Living (ADL) is a vital process for inte...
research
09/06/2023

PDiscoNet: Semantically consistent part discovery for fine-grained recognition

Fine-grained classification often requires recognizing specific object p...
