Adversarial Pyramid Network for Video Domain Generalization

12/08/2019
by   Zhiyu Yao, et al.

This paper introduces a new research problem of video domain generalization (video DG), in which most state-of-the-art action recognition networks degrade due to the lack of exposure to target domains with divergent distributions. While recent advances in video understanding focus on capturing the temporal relations of the long-term video context, we observe that global temporal features are less generalizable in the video DG setting. The reason is that videos from unseen domains may exhibit unexpected absence, misalignment, or scale transformation of the temporal relations, known as the temporal domain shift. Video DG, itself under-explored, is therefore even more challenging than image DG because the spatial and temporal domain shifts are entangled. This finding leads us to view the key to video DG as how to effectively learn local-relation features at different time scales, which are more generalizable, and how to exploit them along with the global-relation features to maintain discriminability. This paper presents the Adversarial Pyramid Network (APN), which captures the local-relation, global-relation, and multilayer cross-relation features progressively. This pyramid network not only improves feature transferability from the view of representation learning, but also enhances the diversity and quality of the new data points that can bridge different domains when integrated with an improved version of the adversarial data augmentation method from image DG. We construct four video DG benchmarks: UCF-HMDB, Something-Something, PKU-MMD, and NTU, in which the source and target domains are divided according to different datasets, different consequences of actions, or different camera views. The APN consistently outperforms previous action recognition models over all benchmarks.
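The adversarial data augmentation idea that APN builds on can be illustrated with a minimal sketch: perturb training inputs by gradient ascent on the classifier's loss, producing "hard" fictitious-domain samples that the model is then trained on. The sketch below uses a toy linear softmax classifier in NumPy; the model, data, and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ce_loss(W, X, y):
    # mean cross-entropy of a linear softmax classifier
    p = softmax(X @ W)
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

def input_grad(W, X, y):
    # gradient of the mean cross-entropy w.r.t. the inputs:
    # dL/dz = (p - onehot)/n, and z = X @ W, so dL/dX = dL/dz @ W.T
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0
    return (p @ W.T) / len(y)

def adversarial_augment(W, X, y, steps=5, lr=0.5):
    """Gradient-ascend the inputs to maximize the loss, yielding
    harder, fictitious-domain samples (an FGSM-like inner loop)."""
    X_adv = X.copy()
    for _ in range(steps):
        X_adv += lr * input_grad(W, X_adv, y)
    return X_adv

# toy 2-class data in a 4-D feature space (hypothetical)
X = rng.normal(size=(64, 4))
y = (X[:, 0] > 0).astype(int)
W = rng.normal(scale=0.1, size=(4, 2))
W[0, 1] += 2.0  # make the class-1 logit track feature 0

X_adv = adversarial_augment(W, X, y)
# the augmented samples should incur a higher loss than the originals
print(ce_loss(W, X, y), ce_loss(W, X_adv, y))
```

In a full training loop, the model would alternate between generating such adversarial samples and minimizing the loss on them; APN extends this picture by applying it jointly with its pyramid of local-, global-, and cross-relation temporal features.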

