Feature-Supervised Action Modality Transfer

08/06/2021
by Fida Mohammad Thoker, et al.

This paper strives for action recognition and detection in video modalities like RGB, depth maps, or 3D-skeleton sequences when only limited modality-specific labeled examples are available. For the RGB modality, and the optical flow derived from it, many large-scale labeled datasets exist, and they have become the de facto pre-training choice when recognizing or detecting new actions in RGB datasets with limited labeled examples. Unfortunately, no large-scale labeled action datasets are available for pre-training in other modalities. In this paper, our goal is to recognize actions from limited examples in non-RGB video modalities by learning from large-scale labeled RGB data. To this end, we propose a two-step training process: (i) we extract action representation knowledge from an RGB-trained teacher network and adapt it to a non-RGB student network; (ii) we then fine-tune the transferred model with the available labeled examples of the target modality. For the knowledge transfer we introduce feature-supervision strategies, which rely on unlabeled pairs of two modalities (RGB and the target modality) to transfer feature-level representations from the teacher to the student network. Ablation and generalization experiments with two RGB source datasets and two non-RGB target datasets demonstrate that an optical-flow teacher provides better action transfer features than an RGB teacher for both depth maps and 3D-skeletons, even when evaluated on a different target domain or for a different task. Compared to alternative cross-modal action transfer methods, we show a clear improvement in performance, especially when the labeled non-RGB examples to learn from are scarce.
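The two-step recipe described in the abstract can be illustrated in code. Below is a minimal sketch, assuming PyTorch; the toy backbones, the 256-dimensional feature space, the MSE feature loss, and the random tensors standing in for paired flow/depth clips are illustrative assumptions, not the paper's actual architecture, loss, or data pipeline.

```python
# Hedged sketch of the two-step transfer process, assuming PyTorch.
# Backbones, shapes, and data are placeholders for the paper's video
# networks and paired-modality recordings.
import torch
import torch.nn as nn

def make_backbone(in_channels: int, feat_dim: int = 256) -> nn.Module:
    """Toy stand-in for a video backbone that outputs a feature vector."""
    return nn.Sequential(
        nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool3d(1),
        nn.Flatten(),
        nn.Linear(32, feat_dim),
    )

teacher = make_backbone(in_channels=2)  # e.g. optical-flow teacher (2 channels)
student = make_backbone(in_channels=1)  # e.g. depth-map student (1 channel)
teacher.eval()                          # teacher stays frozen during transfer

feat_loss = nn.MSELoss()                # feature-level supervision (assumed)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

# Step (i): feature-supervised transfer on unlabeled paired clips.
# Random tensors stand in for time-synchronized (flow, depth) pairs;
# shape is (batch, channels, frames, height, width).
flow_clip = torch.randn(4, 2, 8, 56, 56)
depth_clip = torch.randn(4, 1, 8, 56, 56)

with torch.no_grad():
    target_feat = teacher(flow_clip)    # teacher representation of the pair
loss = feat_loss(student(depth_clip), target_feat)
opt.zero_grad()
loss.backward()
opt.step()                              # one illustrative transfer step

# Step (ii): fine-tune the transferred student with the few labeled
# target-modality examples, using a fresh classification head.
num_classes = 10
classifier = nn.Linear(256, num_classes)
ft_opt = torch.optim.Adam(
    list(student.parameters()) + list(classifier.parameters()), lr=1e-5
)
labels = torch.randint(0, num_classes, (4,))
logits = classifier(student(depth_clip))
ce = nn.CrossEntropyLoss()(logits, labels)
ft_opt.zero_grad()
ce.backward()
ft_opt.step()                           # one illustrative fine-tuning step
```

In this sketch the teacher is frozen and only the student receives gradients, matching the abstract's description of transferring feature-level representations from teacher to student before fine-tuning on the scarce labeled target data.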


