Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction

08/10/2022
by Yingzi Fan, et al.

Both visual and auditory information are valuable for determining the salient regions in videos. Deep convolutional neural networks (CNNs) show strong capacity for the audio-visual saliency prediction task. However, owing to factors such as shooting scenes and weather, there is often a moderate distribution discrepancy between the source training data and the target testing data, and this domain discrepancy degrades the performance of CNN models on the target testing data. This paper makes an early attempt to tackle the unsupervised domain adaptation problem for audio-visual saliency prediction. We propose a dual domain-adversarial learning algorithm to mitigate the domain discrepancy between source and target data. First, a dedicated domain discrimination branch is built to align the auditory feature distributions. Then, those auditory features are fused into the visual features through a cross-modal self-attention module. A second domain discrimination branch is devised to reduce the domain discrepancy of the visual features and of the audio-visual correlations implied by the fused audio-visual features. Experiments on public benchmarks demonstrate that our method can relieve the performance degradation caused by domain discrepancy.
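
The two-branch idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The abstract does not say how the adversarial signal is produced, so the sketch assumes gradient reversal, a common choice in domain-adversarial training. The linear discriminators, the scaled dot-product attention with residual fusion, and all function names (`grad_reverse_forward`, `grad_reverse_backward`, `domain_bce`, `cross_modal_attention`, `encoder_gradients`) are likewise hypothetical and chosen only for illustration.

```python
import math


def grad_reverse_forward(x):
    """Gradient reversal is the identity in the forward pass."""
    return list(x)


def grad_reverse_backward(grad_out, lam):
    """In the backward pass the incoming gradient is negated and scaled by lam."""
    return [-lam * g for g in grad_out]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_bce(features, weights, domain_label):
    """Binary cross-entropy of an (assumed) linear domain discriminator.

    domain_label is 1 for source and 0 for target. Returns the loss and
    its gradient with respect to the input features.
    """
    z = sum(w * f for w, f in zip(weights, features))
    p = min(max(sigmoid(z), 1e-12), 1.0 - 1e-12)  # clamp to keep log finite
    loss = -(domain_label * math.log(p) + (1 - domain_label) * math.log(1.0 - p))
    grad_features = [(p - domain_label) * w for w in weights]
    return loss, grad_features


def cross_modal_attention(visual_tokens, audio_tokens):
    """Fuse audio into visual features (assumed form: each visual token
    attends over all audio tokens, and the attended audio context is
    added back to the visual token)."""
    fused = []
    for v in visual_tokens:
        scale = math.sqrt(len(v))
        scores = [sum(a_d * v_d for a_d, v_d in zip(a, v)) / scale
                  for a in audio_tokens]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        attn = [e / total for e in exps]
        context = [sum(w * a[d] for w, a in zip(attn, audio_tokens))
                   for d in range(len(v))]
        fused.append([v_d + c_d for v_d, c_d in zip(v, context)])
    return fused


def encoder_gradients(audio_feat, fused_feat, w_audio, w_fused, domain_label, lam):
    """Combine both domain branches: one on the audio features and one on
    the fused audio-visual features. The returned gradients are the
    reversed ones that would reach the feature encoders."""
    loss_a, grad_a = domain_bce(grad_reverse_forward(audio_feat), w_audio, domain_label)
    loss_f, grad_f = domain_bce(grad_reverse_forward(fused_feat), w_fused, domain_label)
    return (loss_a + loss_f,
            grad_reverse_backward(grad_a, lam),
            grad_reverse_backward(grad_f, lam))
```

Under these assumptions, each discriminator is trained to tell source from target, while the reversed gradients push the encoders toward features the discriminators cannot separate. The first branch acts on audio alone, and the second acts on the output of `cross_modal_attention`, so it sees both the visual features and the audio-visual correlation they carry.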
