Learning with privileged information via adversarial discriminative modality distillation

10/19/2018
by Nuno C. Garcia, et al.

Heterogeneous data modalities can provide complementary cues for several tasks, usually leading to more robust algorithms and better performance. However, while training data can be carefully collected to include a variety of sensory modalities, it is often the case that not all of them are available in the real-life (testing) scenarios where a model has to be deployed. This raises the challenge of how to extract information from multimodal data during training, in a form that can be exploited at test time despite limitations such as noisy or missing modalities. This paper presents a new approach in this direction for RGB-D vision tasks, developed within the adversarial learning and privileged information frameworks. We consider the practical case of learning representations from depth and RGB videos while relying only on RGB data at test time. We propose a new way to train a hallucination network that learns to distill depth information via adversarial learning, resulting in a clean approach that avoids balancing several losses or tuning additional hyperparameters. We report state-of-the-art results on object classification on the NYUD dataset and on video action recognition on the largest multimodal dataset available for this task, NTU RGB+D, as well as on the Northwestern-UCLA dataset.

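To make the idea concrete, below is a minimal sketch (not the authors' code) of adversarial discriminative modality distillation as described in the abstract: a hallucination network maps RGB inputs toward depth-like features, and a discriminator tries to tell hallucinated features from real depth features, so the only training signal for the hallucination stream is a single adversarial loss. The use of small MLPs, the feature dimensions, the frozen depth teacher, and the random placeholder tensors are illustrative assumptions, not details from the paper.

```python
# Hedged sketch of adversarial modality distillation (assumptions noted inline).
import torch
import torch.nn as nn

FEAT_DIM = 128  # assumed feature dimensionality


class FeatureNet(nn.Module):
    """Toy encoder standing in for an RGB or depth stream (real models would be CNNs)."""
    def __init__(self, in_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, FEAT_DIM))

    def forward(self, x):
        return self.net(x)


class Discriminator(nn.Module):
    """Scores whether a feature came from the depth stream (1) or the hallucination stream (0)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, f):
        return self.net(f)


depth_net = FeatureNet()    # teacher: assumed pre-trained on depth and frozen here
halluc_net = FeatureNet()   # student: sees only RGB, learns to mimic depth features
disc = Discriminator()

bce = nn.BCEWithLogitsLoss()
opt_h = torch.optim.Adam(halluc_net.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

for step in range(100):
    rgb = torch.randn(32, 512)    # placeholder RGB features
    depth = torch.randn(32, 512)  # placeholder paired depth features (training only)

    with torch.no_grad():
        f_depth = depth_net(depth)   # "real" privileged features from the depth teacher
    f_hall = halluc_net(rgb)         # hallucinated features produced from RGB alone

    # Discriminator update: separate real depth features from hallucinated ones.
    d_loss = bce(disc(f_depth), torch.ones(32, 1)) + \
             bce(disc(f_hall.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Hallucination update: fool the discriminator (single adversarial objective).
    g_loss = bce(disc(f_hall), torch.ones(32, 1))
    opt_h.zero_grad(); g_loss.backward(); opt_h.step()

# At test time only RGB is available: halluc_net(rgb) supplies depth-like features.
```

The appeal of this setup, as the abstract suggests, is that the adversarial objective replaces a hand-weighted combination of regression or distillation losses, so there are no extra loss-balancing hyperparameters to tune.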
