Cooperative Cross-Stream Network for Discriminative Action Representation

08/27/2019
by Jingran Zhang, et al.

Spatial- and temporal-stream models have achieved great success in video action recognition. Most existing works focus on designing effective feature fusion methods and train the two streams separately. However, such separate training makes it hard to ensure discriminability or to exploit the complementary information between the streams. In this work, we propose a novel cooperative cross-stream network that investigates the conjoint information in multiple modalities. Feature extraction for the spatial and temporal streams is performed jointly in an end-to-end manner. Complementary information across modalities is extracted by a connection block that explores correlations between the different stream features. Furthermore, unlike a conventional ConvNet that learns deep separable features with only a cross-entropy loss, our model enhances the discriminative power of the deeply learned features and reduces undesired modality discrepancy by jointly optimizing a modality ranking constraint and a cross-entropy loss over both homogeneous and heterogeneous modalities. The modality ranking constraint comprises an intra-modality discriminative embedding and an inter-modality triplet constraint, and it reduces both intra-modality and cross-modality feature variations. Experiments on three benchmark datasets demonstrate that, by cooperatively extracting appearance and motion features, our method achieves state-of-the-art or competitive performance compared with existing results.
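The abstract describes a joint objective combining a cross-entropy loss on each stream with a cross-modality triplet ranking term. A minimal sketch of that composition is below; the function names, the Euclidean distance metric, the margin value, and the weighting factor `alpha` are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def cross_entropy(logits, label):
    # Softmax cross-entropy for a single sample (numerically stabilized).
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def triplet_ranking(anchor, positive, negative, margin=0.3):
    # Hinge-style inter-modality triplet constraint: push the anchor's
    # distance to the cross-modality positive below its distance to the
    # negative by at least `margin` (margin value is an assumption).
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def joint_loss(rgb_logits, flow_logits, label,
               rgb_feat, flow_feat_pos, flow_feat_neg, alpha=1.0):
    # Cross-entropy on both streams plus a weighted cross-modality
    # ranking term; `alpha` is a hypothetical balancing weight.
    ce = cross_entropy(rgb_logits, label) + cross_entropy(flow_logits, label)
    rank = triplet_ranking(rgb_feat, flow_feat_pos, flow_feat_neg)
    return ce + alpha * rank
```

In practice both terms would be averaged over a mini-batch and backpropagated through shared stream backbones; this sketch only shows how the two losses combine for one sample.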


