Cross-modal Contrastive Distillation for Instructional Activity Anticipation

01/18/2022
by Zhengyuan Yang, et al.

In this study, we address the task of instructional activity anticipation: given an observation of the past, predict plausible future action steps. Unlike previous anticipation tasks that aim at action label prediction, our work targets generating natural language outputs that provide interpretable and accurate descriptions of future action steps. The task is challenging because the semantic information that can be extracted from instructional videos alone is limited. To overcome this challenge, we propose a novel knowledge distillation framework that exploits related external textual knowledge to assist the visual anticipation task. However, previous knowledge distillation techniques generally transfer information within the same modality. To bridge the gap between the visual and text modalities during distillation, we devise a novel cross-modal contrastive distillation (CCD) scheme, which facilitates knowledge distillation between a teacher and a student in heterogeneous modalities via the proposed cross-modal distillation loss. We evaluate our method on the Tasty Videos dataset. CCD improves the anticipation performance of the visual-alone student model by a large margin of 40.2 BLEU4, and our approach also outperforms state-of-the-art approaches.
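The abstract does not specify the exact form of the cross-modal distillation loss, but contrastive distillation between heterogeneous modalities is commonly instantiated as an InfoNCE-style objective over paired teacher/student features. The sketch below is an illustrative assumption under that reading, not the paper's released implementation; the function name `ccd_loss`, the feature shapes, and the temperature value are all hypothetical.

```python
# Minimal sketch of an InfoNCE-style cross-modal contrastive
# distillation loss. All names here (ccd_loss, feature shapes,
# temperature value) are illustrative assumptions, not the
# paper's implementation.
import torch
import torch.nn.functional as F

def ccd_loss(student_feats, teacher_feats, temperature=0.07):
    """Contrastive distillation between heterogeneous modalities.

    student_feats: (B, D) visual features from the student model.
    teacher_feats: (B, D) text features from the (frozen) teacher.
    The i-th student/teacher pair is the positive; every other
    pairing in the batch serves as a negative.
    """
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    logits = s @ t.T / temperature                  # (B, B) cosine similarities
    targets = torch.arange(s.size(0), device=s.device)
    # Symmetric objective: align student-to-teacher and teacher-to-student.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Usage sketch with random stand-in features.
B, D = 32, 512
visual = torch.randn(B, D, requires_grad=True)      # student (visual) branch
text = torch.randn(B, D)                            # teacher (text) branch
loss = ccd_loss(visual, text.detach())              # teacher gradients blocked
loss.backward()
```

Detaching the teacher features reflects the usual distillation setup, in which gradients update only the student; the symmetric cross-entropy over the batch similarity matrix pulls matched visual and text features together while pushing unmatched pairs apart.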


Related research:

Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection (08/08/2021)
In video understanding, most cross-modal knowledge distillation (KD) met...

Cross-modal knowledge distillation for action recognition (10/10/2019)
In this work, we address the problem how a network for action recognitio...

Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing (04/01/2020)
In recent years, cross-modal hashing (CMH) has attracted increasing atte...

3D-Augmented Contrastive Knowledge Distillation for Image-based Object Pose Estimation (06/02/2022)
Image-based object pose estimation sounds amazing because in real applic...

A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning (06/28/2023)
Due to limitations in data quality, some essential visual tasks are diff...

Knowledge Distillation for Action Anticipation via Label Smoothing (04/16/2020)
Human capability to anticipate near future from visual observations and ...

Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation (09/20/2023)
Sound can convey significant information for spatial reasoning in our da...
