Kernel-based Sensor Fusion with Application to Audio-Visual Voice Activity Detection

04/11/2016
by David Dov, et al.

In this paper, we address the problem of multi-view data fusion in the presence of noise and interferences. Recent studies have approached this problem using kernel methods, relying in particular on a product of kernels constructed separately for each view. We analyze this fusion approach in a discrete setting from a graph-theoretic point of view. More specifically, based on a statistical model for the connectivity between data points, we propose an algorithm for selecting the kernel bandwidth, a parameter that, as we show, has important implications for the robustness of this fusion approach to interferences. Then, we consider the fusion of audio-visual speech signals measured by a single microphone and by a video camera pointed at the face of the speaker. Specifically, we address the task of voice activity detection, i.e., the detection of speech and non-speech segments, in the presence of structured interferences such as keyboard taps and office noise. We propose an algorithm for voice activity detection based on the audio-visual signal. Simulation results show that the proposed algorithm outperforms competing fusion and voice activity detection approaches. In addition, we demonstrate that a proper selection of the kernel bandwidth indeed leads to improved performance.
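The fusion step described above, a product of kernels built separately for each view, can be illustrated with a minimal NumPy sketch. It assumes Gaussian kernels on Euclidean feature vectors, one row per time frame. The paper's own bandwidth-selection algorithm, derived from a statistical model of graph connectivity, is not reproduced here. The median heuristic below is a common stand-in, swapped in for illustration only. The function names (`product_kernel`, `median_bandwidth`) and the feature dimensions are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(x):
    """Squared Euclidean distances between all rows of x (n_samples, n_features)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.maximum(d2, 0.0)  # clip tiny negative values caused by round-off

def gaussian_kernel(x, eps):
    """Gaussian affinity K[i, j] = exp(-||x_i - x_j||^2 / eps) for one view."""
    return np.exp(-pairwise_sq_dists(x) / eps)

def median_bandwidth(x):
    """Median heuristic for eps (an assumed stand-in, NOT the paper's selection rule)."""
    d2 = pairwise_sq_dists(x)
    off_diag = d2[~np.eye(len(x), dtype=bool)]
    return float(np.median(off_diag))

def product_kernel(views, bandwidths=None):
    """Fuse views by the element-wise product of their per-view Gaussian kernels."""
    if bandwidths is None:
        bandwidths = [median_bandwidth(v) for v in views]
    n = len(views[0])
    k = np.ones((n, n))
    for v, eps in zip(views, bandwidths):
        k *= gaussian_kernel(v, eps)
    return k

# Hypothetical usage: 50 synchronized frames from each modality.
rng = np.random.default_rng(0)
audio = rng.normal(size=(50, 13))  # e.g. 13 audio features per frame (illustrative)
video = rng.normal(size=(50, 20))  # e.g. 20 mouth-region features per frame (illustrative)
K = product_kernel([audio, video])

# Row-normalize into a Markov (random-walk) matrix on the fused graph.
P = K / K.sum(axis=1, keepdims=True)
```

Because a product of Gaussians equals exp(-d_a²/ε_a - d_v²/ε_v), two frames receive a large fused affinity only if they are close in both views. An interference that appears in only one view, such as keyboard taps in the audio, therefore tends to weaken that connection. How strongly it does so depends on the bandwidths ε_a and ε_v, which is consistent with the abstract's point that the bandwidth choice affects robustness.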

research
03/09/2020

Crossmodal learning for audio-visual speech event localization

An objective understanding of media depictions, such as about inclusive ...
research
09/21/2020

End-to-End Speaker-Dependent Voice Activity Detection

Voice activity detection (VAD) is an essential pre-processing step for t...
research
10/14/2022

Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization

This report describes our approach for the Audio-Visual Diarization (AVD...
research
08/21/2020

RespVAD: Voice Activity Detection via Video-Extracted Respiration Patterns

Voice Activity Detection (VAD) refers to the task of identification of r...
research
09/05/2023

In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms

The recent ubiquitous adoption of remote conferencing has been accompani...
research
03/07/2019

Voice Activity Detection: Merging Source and Filter-based Information

Voice Activity Detection (VAD) refers to the problem of distinguishing s...
research
10/27/2020

Rule-embedded network for audio-visual voice activity detection in live musical video streams

Detecting anchor's voice in live musical streams is an important preproc...