MPN: Multimodal Parallel Network for Audio-Visual Event Localization

04/07/2021
by   Jiashuo Yu, et al.

Audio-visual event localization aims to localize an event that is both audible and visible in the wild, a common audio-visual scene analysis task for unconstrained videos. To address this task, we propose a Multimodal Parallel Network (MPN), which perceives global semantics and unmixed local information in parallel. Specifically, our MPN framework consists of a classification subnetwork that predicts event categories and a localization subnetwork that predicts event boundaries. The classification subnetwork is built on the Multimodal Co-attention Module (MCM) and captures global context. The localization subnetwork is built on the Multimodal Bottleneck Attention Module (MBAM), which is designed to extract fine-grained segment-level content. Extensive experiments demonstrate that our framework achieves state-of-the-art performance in both fully supervised and weakly supervised settings on the Audio-Visual Event (AVE) dataset.
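
As a rough illustration of the two-branch idea in the abstract, the sketch below combines a video-level category prediction with per-segment event scores to produce segment labels and event boundaries. It is not the authors' implementation: the function names, the 0.5 threshold, the example labels, and the simple "argmax category, thresholded segments" fusion rule are all assumptions made for illustration, and the MCM and MBAM attention modules themselves are not modeled.

```python
from typing import List, Sequence, Tuple

BACKGROUND = "background"  # assumed label for segments without an event


def combine_predictions(
    category_scores: Sequence[float],
    segment_scores: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Assign the top video-level category to every segment whose
    event score reaches the threshold; all others become background."""
    best = max(range(len(category_scores)), key=lambda i: category_scores[i])
    event = labels[best]
    return [event if s >= threshold else BACKGROUND for s in segment_scores]


def event_boundaries(segment_labels: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each
    contiguous run of non-background segments."""
    runs: List[Tuple[int, int]] = []
    start = None
    # A trailing background sentinel closes a run that reaches the last segment.
    for i, label in enumerate(list(segment_labels) + [BACKGROUND]):
        if label != BACKGROUND and start is None:
            start = i
        elif label == BACKGROUND and start is not None:
            runs.append((start, i))
            start = None
    return runs


if __name__ == "__main__":
    labels = ["dog bark", "church bell", "frying food"]  # illustrative categories
    segs = combine_predictions([0.1, 0.7, 0.2], [0.2, 0.9, 0.8, 0.1, 0.6], labels)
    print(segs)
    print(event_boundaries(segs))
```

Keeping the category decision and the per-segment decision separate mirrors the split between the classification and localization subnetworks, although the real model learns both from audio and visual features rather than from fixed scores.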

research
03/23/2018

Audio-Visual Event Localization in Unconstrained Videos

In this paper, we introduce a novel problem of audio-visual event locali...
research
04/05/2021

Can audio-visual integration strengthen robustness under multimodal attacks?

In this paper, we propose to make a systematic study on machines multise...
research
11/24/2021

MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing

Recognizing and localizing events in videos is a fundamental task for vi...
research
02/20/2019

Dual-modality seq2seq network for audio-visual event localization

Audio-visual event localization requires one to identify the event which ...
research
10/02/2020

AVECL-UMONS database for audio-visual event classification and localization

We introduce the AVECL-UMons dataset for audio-visual event classificati...
research
05/08/2022

Past and Future Motion Guided Network for Audio Visual Event Localization

In recent years, audio-visual event localization has attracted much atte...
research
08/14/2020

Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention

The major challenge in audio-visual event localization task lies in how ...
