CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation

08/26/2022
by Yunyao Mao, et al.

In 3D action recognition, rich complementary information exists between skeleton modalities. Nevertheless, how to model and exploit this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. Unlike classic distillation solutions, which transfer the knowledge of a fixed, pre-trained teacher to a student, here the knowledge is continuously updated and bidirectionally distilled between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. On the one hand, the neighboring similarity distribution is introduced to model the knowledge learned in each modality; this relational information is naturally suited to contrastive frameworks. On the other hand, asymmetric configurations are used for the teacher and the student to stabilize the distillation process and to transfer high-confidence information between modalities. By derivation, we show that the cross-modal positive mining in previous works can be regarded as a degenerate version of our CMD. We perform extensive experiments on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II datasets. Our approach outperforms existing self-supervised methods and sets a series of new records. The code is available at: https://github.com/maoyunyao/CMD
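To make the two designs above concrete, here is a minimal PyTorch sketch of a bidirectional distillation loss of this kind. It is an illustration rather than the authors' implementation: the function names and temperature values are hypothetical, the memory banks are assumed to be MoCo-style queues aligned row-by-row across modalities (the k-th entry in each bank embeds the same sample, so the two similarity distributions are comparable entry-wise), and detach() stands in for the stop-gradient that a momentum-updated key encoder would provide.

```python
import torch
import torch.nn.functional as F

def neighbor_logits(z, bank, tau):
    """Cosine similarities between query embeddings z (B, D) and a
    bank of neighbor embeddings (K, D), scaled by temperature tau.
    Returns (B, K) logits whose softmax is the neighboring
    similarity distribution."""
    z = F.normalize(z, dim=1)
    bank = F.normalize(bank, dim=1)
    return z @ bank.t() / tau

def cmd_loss(z_a, z_b, bank_a, bank_b, tau_teacher=0.05, tau_student=0.1):
    """Bidirectional cross-modal mutual distillation (sketch).

    z_a, z_b:       embeddings of the same batch in modalities a and b
    bank_a, bank_b: per-modality neighbor banks, aligned row-by-row

    The teacher side is detached (no gradient flows into it) and uses
    the lower temperature, giving a sharper, higher-confidence target
    distribution: one way to realize the asymmetric teacher/student
    configuration described above.
    """
    # modality a teaches modality b
    p_a = F.softmax(neighbor_logits(z_a.detach(), bank_a, tau_teacher), dim=1)
    log_q_b = F.log_softmax(neighbor_logits(z_b, bank_b, tau_student), dim=1)
    loss_ab = F.kl_div(log_q_b, p_a, reduction="batchmean")

    # modality b teaches modality a
    p_b = F.softmax(neighbor_logits(z_b.detach(), bank_b, tau_teacher), dim=1)
    log_q_a = F.log_softmax(neighbor_logits(z_a, bank_a, tau_student), dim=1)
    loss_ba = F.kl_div(log_q_a, p_b, reduction="batchmean")

    return loss_ab + loss_ba
```

In a full training loop, a loss like this would be added to each modality's own contrastive objective (e.g., InfoNCE), and the neighbor banks can simply reuse the negative queues such a contrastive framework already maintains.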

Related research

Cross-modal knowledge distillation for action recognition (10/10/2019)
In this work, we address the problem of how a network for action recognitio...

XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning (11/25/2022)
We present XKD, a novel self-supervised framework to learn meaningful re...

Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection (08/08/2021)
In video understanding, most cross-modal knowledge distillation (KD) met...

Robust Cross-Modal Knowledge Distillation for Unconstrained Videos (04/16/2023)
Cross-modal distillation has been widely used to transfer knowledge acro...

Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision (09/21/2023)
Self-supervised representation learning for human action recognition has...

Farewell to Mutual Information: Variational Distillation for Cross-Modal Person Re-Identification (04/07/2021)
The Information Bottleneck (IB) provides an information theoretic princi...

Cross Modal Distillation for Supervision Transfer (07/02/2015)
In this work we propose a technique that transfers supervision between i...
