Audio-Visual Class-Incremental Learning

08/21/2023
by Weiguo Pian, et al.

In this paper, we introduce audio-visual class-incremental learning, a class-incremental learning scenario for audio-visual video recognition. We demonstrate that joint audio-visual modeling can improve class-incremental learning, but that current methods fail to preserve the semantic similarity between audio and visual features as the number of incremental steps grows. Furthermore, we observe that audio-visual correlations learned in previous tasks can be forgotten as incremental steps progress, leading to poor performance. To overcome these challenges, we propose AV-CIL, which incorporates a Dual-Audio-Visual Similarity Constraint (D-AVSC) to maintain both instance-aware and class-aware semantic similarity between the audio and visual modalities, and Visual Attention Distillation (VAD) to retain the previously learned audio-guided visual attention ability. We create three audio-visual class-incremental datasets, AVE-Class-Incremental (AVE-CI), Kinetics-Sounds-Class-Incremental (K-S-CI), and VGGSound100-Class-Incremental (VS100-CI), based on the AVE, Kinetics-Sounds, and VGGSound datasets, respectively. Our experiments on AVE-CI, K-S-CI, and VS100-CI demonstrate that AV-CIL significantly outperforms existing class-incremental learning methods in audio-visual class-incremental learning. Code and data are available at: https://github.com/weiguoPian/AV-CIL_ICCV2023.
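
To make the class-incremental setting concrete, the following is a minimal sketch of how a labeled class list might be partitioned into disjoint incremental steps, with evaluation after each step covering every class seen so far. It is a generic illustration of class-incremental learning, not the authors' dataset construction. The function names (make_incremental_tasks, seen_classes), the placeholder class labels, the number of classes per step, and the seeded shuffle are all assumptions made for the example.

```python
import random
from typing import List, Sequence


def make_incremental_tasks(
    class_names: Sequence[str], classes_per_step: int, seed: int = 0
) -> List[List[str]]:
    """Shuffle the class list once and cut it into disjoint incremental steps.

    Hypothetical helper: the step size and the seeded shuffle are assumptions,
    not the protocol used to build AVE-CI, K-S-CI, or VS100-CI.
    """
    if classes_per_step <= 0:
        raise ValueError("classes_per_step must be positive")
    order = list(class_names)
    random.Random(seed).shuffle(order)  # fixed seed keeps the split reproducible
    return [
        order[i:i + classes_per_step]
        for i in range(0, len(order), classes_per_step)
    ]


def seen_classes(tasks: List[List[str]], step: int) -> List[str]:
    """Classes the model is evaluated on after finishing `step` (0-indexed)."""
    return [name for task in tasks[: step + 1] for name in task]


if __name__ == "__main__":
    # Placeholder labels; a real run would use the dataset's category names.
    classes = [f"class_{i}" for i in range(12)]
    tasks = make_incremental_tasks(classes, classes_per_step=4)
    for step, task in enumerate(tasks):
        # New classes arrive at each step; evaluation covers all classes so far.
        print(step, len(task), len(seen_classes(tasks, step)))
```

At each step the model trains only on the new classes (plus whatever memory a given method keeps) and is tested on the growing union returned by seen_classes, which is where forgetting of earlier classes shows up.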


