Learning in Audio-visual Context: A Review, Analysis, and New Perspective

08/20/2022
by   Yake Wei, et al.
2

Sight and hearing are two senses that play a vital role in human communication and scene understanding. To mimic human perception ability, audio-visual learning, aimed at developing computational approaches to learn from both audio and visual modalities, has been a flourishing field in recent years. A comprehensive survey that can systematically organize and analyze studies of the audio-visual field is expected. Starting from the analysis of audio-visual cognition foundations, we introduce several key findings that have inspired our computational studies. Then, we systematically review the recent audio-visual learning studies and divide them into three categories: audio-visual boosting, cross-modal perception and audio-visual collaboration. Through our analysis, we discover that, the consistency of audio-visual data across semantic, spatial and temporal support the above studies. To revisit the current development of the audio-visual learning field from a more macro view, we further propose a new perspective on audio-visual scene understanding, then discuss and analyze the feasible future direction of the audio-visual learning area. Overall, this survey reviews and outlooks the current audio-visual learning field from different aspects. We hope it can provide researchers with a better understanding of this area. A website including constantly-updated survey is released: <https://gewu-lab.github.io/audio-visual-learning/>.

READ FULL TEXT

page 2

page 4

page 7

page 11

research
01/14/2020

Deep Audio-Visual Learning: A Survey

Audio-visual learning, aimed at exploiting the relationship between audi...
research
04/26/2017

Deep Cross-Modal Audio-Visual Generation

Cross-modal audio-visual perception has been a long-lasting topic in psy...
research
08/01/2021

A Survey on Audio Synthesis and Audio-Visual Multimodal Processing

With the development of deep learning and artificial intelligence, audio...
research
05/11/2022

A Comprehensive Survey of Automated Audio Captioning

Automated audio captioning, a task that mimics human perception as well ...
research
02/28/2022

Recent Advances and Challenges in Deep Audio-Visual Correlation Learning

Audio-visual correlation learning aims to capture essential corresponden...
research
09/05/2023

A Survey on Interpretable Cross-modal Reasoning

In recent years, cross-modal reasoning (CMR), the process of understandi...
research
11/22/2021

Neural Fields in Visual Computing and Beyond

Recent advances in machine learning have created increasing interest in ...

Please sign up or login with your details

Forgot password? Click here to reset