Modality Distillation with Multiple Stream Networks for Action Recognition

06/19/2018
by   Nuno Garcia, et al.

Diverse input data modalities can provide complementary cues for several tasks, usually leading to more robust algorithms and better performance. However, while a (training) dataset can be carefully designed to include a variety of sensory inputs, it is often the case that not all modalities are available in real-life (testing) scenarios where the model is deployed. This raises the challenge of learning robust representations from multimodal data at training time while accounting for limitations at test time, such as noisy or missing modalities. This paper presents a new approach for multimodal video action recognition, developed within the unified framework of distillation and privileged information known as generalized distillation. In particular, we consider the case of learning representations from depth and RGB videos while relying on RGB data only at test time. We propose a new approach to train a hallucination network that learns to distill depth features through multiplicative connections of spatiotemporal representations, leveraging both soft and hard labels as well as the distance between feature maps. We report state-of-the-art results on video action classification on the largest multimodal dataset available for this task, NTU RGB+D.
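The training objective sketched in the abstract combines three signals: a cross-entropy term on hard (ground-truth) labels, a term on the teacher's softened (soft) labels, and a distance between feature maps (here, between the RGB stream's features and the hallucinated depth features). The snippet below is a minimal, framework-free illustration of such a generalized-distillation loss; the function name, the temperature `T`, and the weights `alpha` and `beta` are illustrative assumptions, not the paper's exact formulation.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generalized_distillation_loss(student_logits, teacher_logits, hard_label,
                                  student_feat, halluc_feat,
                                  T=2.0, alpha=0.5, beta=0.1):
    """Illustrative generalized-distillation objective with three terms:
      1) cross-entropy against the hard (ground-truth) label,
      2) cross-entropy against the teacher's temperature-softened labels,
      3) squared L2 distance between feature maps
         (e.g. RGB-stream features vs. hallucinated depth features).
    alpha balances hard vs. soft supervision; beta weights the feature term.
    """
    p_student = softmax(student_logits)          # for the hard-label term
    p_student_T = softmax(student_logits, T)     # softened student output
    p_teacher_T = softmax(teacher_logits, T)     # softened teacher targets

    hard_loss = -math.log(p_student[hard_label])
    soft_loss = -sum(t * math.log(s)
                     for t, s in zip(p_teacher_T, p_student_T))
    feat_loss = sum((a - b) ** 2
                    for a, b in zip(student_feat, halluc_feat))

    return (1 - alpha) * hard_loss + alpha * soft_loss + beta * feat_loss
```

In practice each term would be computed batch-wise in a deep-learning framework, but the structure of the objective is the same: the feature-distance term is what pushes the hallucination stream to reproduce depth features from RGB input alone.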


