Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection

08/08/2021
by   Rui Dai, et al.

In video understanding, most cross-modal knowledge distillation (KD) methods are tailored for classification, focusing on discriminative representations of trimmed videos. Action detection, however, requires not only categorizing actions but also localizing them in untrimmed videos, so transferring knowledge about temporal relations is critical for this task, yet missing from previous cross-modal KD frameworks. To this end, we aim to learn an augmented RGB representation for action detection, taking advantage of additional modalities at training time through KD. We propose a KD framework consisting of two levels of distillation. On one hand, atomic-level distillation encourages the RGB student to learn the sub-representation of the actions from the teacher in a contrastive manner. On the other hand, sequence-level distillation encourages the student to learn temporal knowledge from the teacher, which consists of transferring the Global Contextual Relations and the Action Boundary Saliency. The result is an Augmented-RGB stream that achieves performance competitive with a two-stream network while using only RGB at inference time. Extensive experimental analysis shows that our proposed distillation framework is generic and outperforms other popular cross-modal distillation methods on the action detection task.
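To make the contrastive idea behind atomic-level distillation concrete, here is a minimal sketch of an InfoNCE-style cross-modal distillation loss. It assumes per-action embeddings from an RGB student and from a teacher trained on an additional modality (e.g. optical flow); the function names and the temperature value are illustrative assumptions, not details from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    # Normalize so similarity is cosine similarity; guard zero vectors.
    n = math.sqrt(dot(v, v)) or 1.0
    return [a / n for a in v]

def contrastive_distillation_loss(student, teacher, temperature=0.1):
    """Average InfoNCE loss over aligned embedding pairs.

    Each student embedding is pulled toward the teacher embedding of the
    same action (the positive) and pushed away from the teacher
    embeddings of all other actions (the negatives).
    """
    s = [l2_normalize(v) for v in student]
    t = [l2_normalize(v) for v in teacher]
    loss = 0.0
    for i, si in enumerate(s):
        logits = [dot(si, tj) / temperature for tj in t]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]  # -log softmax at the positive pair
    return loss / len(s)
```

With this formulation, well-aligned student/teacher pairs yield a loss near zero, while misaligned pairs are penalized heavily, which is what drives the student to mimic the teacher's sub-representations.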


