Efficient Face Detection with Audio-Based Region Proposals

09/14/2023
by William Aris, et al.

Robot vision often carries a heavy computational load because large images must be processed in a short amount of time. Existing solutions often reduce image quality, which can degrade downstream processing. Another approach is to generate regions of interest, but this typically relies on expensive vision algorithms. In this paper, we evaluate how audio can be used to generate regions of interest in optical images. To this end, we propose a unique attention mechanism that localizes speech sources, and we evaluate its impact on a face detection algorithm. Our results show that the attention mechanism reduces the computational load. The proposed pipeline is flexible and can be easily adapted to human-robot interaction, robot surveillance, video conferencing, or smart glasses.
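The general idea of an audio-based region proposal can be illustrated with a minimal sketch. This is not the authors' attention mechanism, which the abstract does not detail. It assumes that a sound source localizer already provides a speech azimuth in degrees (0 on the optical axis, positive to the right), that the microphone array and camera share the same origin, and that the camera follows an ideal pinhole model. The function names `azimuth_to_column` and `propose_region` and all parameter names are hypothetical.

```python
import math


def azimuth_to_column(azimuth_deg, image_width, horizontal_fov_deg):
    """Map a speech azimuth to a pixel column (assumed ideal pinhole camera)."""
    half_fov = math.radians(horizontal_fov_deg) / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = (image_width / 2.0) / math.tan(half_fov)
    return image_width / 2.0 + focal_px * math.tan(math.radians(azimuth_deg))


def propose_region(azimuth_deg, image_width, image_height,
                   horizontal_fov_deg, region_width):
    """Return a (left, top, right, bottom) crop around the speech direction.

    Returns None when the source lies outside the camera's field of view.
    The crop spans the full image height because only azimuth is assumed
    to be known in this sketch.
    """
    if abs(azimuth_deg) >= horizontal_fov_deg / 2.0:
        return None
    region_width = min(region_width, image_width)
    center = azimuth_to_column(azimuth_deg, image_width, horizontal_fov_deg)
    left = int(round(center - region_width / 2.0))
    # Clamp the crop so that it stays inside the image.
    left = max(0, min(left, image_width - region_width))
    return (left, 0, left + region_width, image_height)


def pixel_fraction(region, image_width, image_height):
    """Fraction of the full image that a downstream detector would process."""
    left, top, right, bottom = region
    return ((right - left) * (bottom - top)) / (image_width * image_height)
```

Under these assumptions, a face detector would be run only on the returned crop rather than on the full frame. With a 480-pixel-wide region in a 1920-pixel-wide image, the detector sees a quarter of the pixels, which gives a rough sense of where the computational savings could come from.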


research
07/08/2023

FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction

DeepFake based digital facial forgery is threatening public media securi...
research
03/08/2022

Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild

Talking face generation with great practical significance has attracted ...
research
10/03/2019

On the Detection of Digital Face Manipulation

Detecting manipulated facial images and videos is an increasingly import...
research
01/05/2019

AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

Active speaker detection is an important component in video analysis alg...
research
11/29/2019

Attentive Modality Hopping Mechanism for Speech Emotion Recognition

In this work, we explore the impact of visual modality in addition to sp...
research
12/20/2022

Visual Transformers for Primates Classification and Covid Detection

We apply the vision transformer, a deep machine learning model build aro...
