Improve Video Representation with Temporal Adversarial Augmentation

04/28/2023
by Jinhao Duan, et al.

Recent works reveal that adversarial augmentation benefits the generalization of neural networks (NNs) when used appropriately. In this paper, we introduce Temporal Adversarial Augmentation (TA), a novel video augmentation technique that utilizes temporal attention. Unlike conventional adversarial augmentation, TA is specifically designed to shift the attention distributions of neural networks over video clips by maximizing a temporal-related loss function. We demonstrate that TA yields diverse temporal views that significantly alter the focus of neural networks. Training with these examples remedies imbalanced temporal information perception and strengthens the ability to withstand temporal shifts, ultimately leading to better generalization. To leverage TA, we propose the Temporal Video Adversarial Fine-tuning (TAF) framework for improving video representations. TAF is a model-agnostic, generic, and interpretability-friendly training strategy. We evaluate TAF with four powerful models (TSM, GST, TAM, and TPN) on three challenging temporal-related benchmarks (Something-Something V1 and V2, and Diving48). Experimental results demonstrate that TAF effectively improves the test accuracy of these models by notable margins without introducing additional parameters or computational costs. As a byproduct, TAF also improves robustness under out-of-distribution (OOD) settings. Code is available at https://github.com/jinhaoduan/TAF.
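The abstract describes TA as perturbing a clip to maximize a temporal-related loss, thereby shifting where the network attends across frames, and TAF as fine-tuning on these perturbed views alongside the clean ones. Below is a minimal PyTorch-style sketch of that idea, assuming a PGD-style inner maximization; it is not the authors' implementation (see the linked repository), and `temporal_attention_loss` and `model.frame_importance` are hypothetical stand-ins for the paper's temporal loss and per-frame importance scores.

```python
# Illustrative sketch only, assuming a PGD-style inner maximization.
# `model.frame_importance` and `temporal_attention_loss` are hypothetical
# stand-ins for the paper's temporal attention scores and temporal loss;
# the actual method is at https://github.com/jinhaoduan/TAF.
import torch
import torch.nn.functional as F


def temporal_attention_loss(frame_scores):
    # Proxy loss: entropy of the per-frame importance distribution (B, T).
    # Ascending its gradient pushes the model toward a different temporal focus.
    w = F.softmax(frame_scores, dim=1)
    return -(w * torch.log(w + 1e-8)).sum(dim=1).mean()


def temporal_adversarial_augment(model, clip, steps=3, alpha=2 / 255, eps=8 / 255):
    """Perturb a video clip (B, T, C, H, W) by gradient ascent on the temporal loss."""
    delta = torch.zeros_like(clip)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = temporal_attention_loss(model.frame_importance(clip + delta))
        grad = torch.autograd.grad(loss, delta)[0]
        # Ascend (not descend) to shift the temporal attention distribution,
        # keeping the perturbation inside an L-infinity ball of radius eps.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return (clip + delta).detach()


def taf_step(model, optimizer, clip, label):
    """One fine-tuning step on the clean clip and its temporal-adversarial view."""
    adv_clip = temporal_adversarial_augment(model, clip)
    loss = F.cross_entropy(model(clip), label) + F.cross_entropy(model(adv_clip), label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the perturbation only needs to change the temporal focus rather than fool the classifier, no new parameters are introduced and inference cost is unchanged, which matches the efficiency claim in the abstract.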
