Cross-modal knowledge distillation for action recognition

10/10/2019
by Fida Mohammad Thoker, et al.

In this work, we address the problem of how a network for action recognition that has been trained on one modality, such as RGB videos, can be adapted to recognize actions in another modality, such as sequences of 3D human poses. To this end, we extract the knowledge of the trained teacher network for the source modality and transfer it to a small ensemble of student networks for the target modality. The cross-modal knowledge distillation does not require any annotated data. Instead, we use pairs of sequences of both modalities as supervision, which are straightforward to acquire. In contrast to previous work on knowledge distillation that uses a KL loss, we show that the cross-entropy loss together with mutual learning of a small ensemble of student networks performs better. In fact, the proposed approach for cross-modal knowledge distillation nearly achieves the accuracy of a student network trained with full supervision.
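The training signal described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the sketch assumes that the teacher's most likely class on the source-modality sequence serves as a hard pseudo-label for the paired target-modality sequence, and that mutual learning adds a cross-entropy term against each peer student's most likely class. The function names and the `mutual_weight` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against a hard class index."""
    return -math.log(softmax(logits)[target])


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def distillation_losses(teacher_logits, student_logits_list, mutual_weight=1.0):
    """Per-student loss for one paired sample (assumed formulation).

    teacher_logits: teacher scores on the source-modality sequence.
    student_logits_list: one list of scores per student, computed on the
        paired target-modality sequence.
    """
    # Assumption: the teacher's top class acts as the pseudo-label,
    # so no human annotation is needed.
    pseudo_label = argmax(teacher_logits)
    # Assumption: each student also learns from its peers' top classes.
    peer_labels = [argmax(s) for s in student_logits_list]

    losses = []
    for i, logits in enumerate(student_logits_list):
        loss = cross_entropy(logits, pseudo_label)
        peers = [lab for j, lab in enumerate(peer_labels) if j != i]
        if peers:
            mutual = sum(cross_entropy(logits, lab) for lab in peers) / len(peers)
            loss += mutual_weight * mutual
        losses.append(loss)
    return losses


# Usage: one teacher prediction and an ensemble of two students, three classes.
teacher = [2.0, 0.5, -1.0]
students = [[1.5, 0.2, -0.3], [0.1, 1.2, 0.0]]
losses = distillation_losses(teacher, students)
```

In an actual training loop, each student's loss would be backpropagated through that student only, with the teacher kept frozen; the sketch covers only the loss computation.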

research
08/26/2022

CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation

In 3D action recognition, there exists rich complementary information be...
research
12/12/2020

Periocular in the Wild Embedding Learning with Cross-Modal Consistent Knowledge Distillation

Periocular biometric, or peripheral area of ocular, is a collaborative a...
research
11/24/2021

EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation

Event cameras sense per-pixel intensity changes and produce asynchronous...
research
01/18/2022

Cross-modal Contrastive Distillation for Instructional Activity Anticipation

In this study, we aim to predict the plausible future action steps given...
research
02/16/2023

Cross Modal Distillation for Flood Extent Mapping

The increasing intensity and frequency of floods is one of the many cons...
research
10/12/2022

Distilling Knowledge from Language Models for Video-based Action Anticipation

Anticipating future actions in a video is useful for many autonomous and...
research
09/01/2020

Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition

Existing vision-based action recognition is susceptible to occlusion and...
