AV-Gaze: A Study on the Effectiveness of Audio Guided Visual Attention Estimation for Non-Profilic Faces

07/07/2022
by Shreya Ghosh, et al.

In challenging real-life conditions such as extreme head poses, occlusions, and low-resolution images, where visual information alone fails to estimate visual attention or gaze direction, audio signals can provide important complementary information. In this paper, we explore whether an audio-guided coarse head pose can further improve visual attention estimation for non-prolific faces. Since it is difficult to annotate audio signals with the head pose of the speaker, we use off-the-shelf state-of-the-art models to provide cross-modal weak supervision. During training, the framework learns complementary information from the synchronized audio and visual modalities. At inference, the model can use any of the available modalities, i.e., audio, visual, or audio-visual, for task-specific prediction. Notably, when AV-Gaze is tested on benchmark datasets with each of these modality settings, it achieves competitive results on multiple datasets while remaining highly adaptive to challenging scenarios.
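The abstract does not specify the architecture, so what follows is only an illustrative sketch under stated assumptions, not the AV-Gaze implementation. It shows the two ideas in plain Python. First, cross-modal weak supervision: a hypothetical off-the-shelf visual head-pose teacher supplies a yaw angle, which is binned into three coarse classes (left, frontal, right; the ±30° thresholds are assumptions) and used as a pseudo-label for the audio branch. Second, modality-flexible inference: the class probabilities of whichever modalities are present are averaged. This simple late fusion is swapped in here for illustration and is not necessarily what the paper uses. The names `yaw_to_bin`, `weak_label_loss`, and `fuse` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical coarse head-pose classes; the paper's actual binning is not given in the abstract.
POSE_BINS = ("left", "frontal", "right")


def yaw_to_bin(yaw_deg: float) -> int:
    """Map a yaw angle (degrees) from a hypothetical visual head-pose teacher to a coarse bin.

    The +/-30 degree thresholds are an assumption made for illustration.
    """
    if yaw_deg < -30.0:
        return 0  # left
    if yaw_deg > 30.0:
        return 2  # right
    return 1  # frontal


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weak_label_loss(audio_logits: Sequence[float], teacher_yaw: float) -> float:
    """Cross-entropy of audio-branch logits against the teacher's coarse pseudo-label.

    No human annotation of the audio is needed: the label comes from the
    (hypothetical) visual teacher run on the synchronized video frame.
    """
    probs = softmax(audio_logits)
    return -math.log(probs[yaw_to_bin(teacher_yaw)])


def fuse(audio_logits: Optional[Sequence[float]] = None,
         visual_logits: Optional[Sequence[float]] = None) -> List[float]:
    """Average class probabilities over whichever modalities are available.

    Works with audio only, visual only, or both (simple late fusion).
    """
    available = [softmax(l) for l in (audio_logits, visual_logits) if l is not None]
    if not available:
        raise ValueError("at least one modality is required")
    n = len(available)
    return [sum(p[k] for p in available) / n for k in range(len(POSE_BINS))]


if __name__ == "__main__":
    # Audio-only inference: pick the most probable coarse pose.
    probs = fuse(audio_logits=[0.1, 2.0, 0.3])
    print(POSE_BINS[probs.index(max(probs))])
```

In a real model the logits would come from trained audio and visual encoders. Averaging probabilities is just the simplest rule that lets the same predictor run when one modality is missing, which is the property the abstract highlights.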

research
07/03/2019

Cascade Attention Guided Residue Learning GAN for Cross-Modal Translation

Since we were babies, we intuitively develop the ability to correlate th...
research
10/23/2021

MTGLS: Multi-Task Gaze Estimation with Limited Supervision

Robust gaze estimation is a challenging task, even for deep CNNs, due to...
research
11/22/2017

CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation

Visual and audio modalities are two symbiotic modalities underlying vide...
research
09/03/2022

Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement

Over the last few decades, many aspects of human life have been enhanced...
research
04/03/2020

Comparison of a Head-Mounted Display and a Curved Screen in a Multi-Talker Audiovisual Listening Task

Virtual audiovisual technology has matured and its use in research is wi...
research
04/30/2020

APB2Face: Audio-guided face reenactment with auxiliary pose and blink signals

Audio-guided face reenactment aims at generating photorealistic faces us...
research
05/06/2023

Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation

Egocentric gaze anticipation serves as a key building block for the emer...