AV Speech Enhancement Challenge using a Real Noisy Corpus

09/30/2019
by Mandar Gogate, et al.

This paper presents a first-of-its-kind audio-visual (AV) speech enhancement challenge in real noisy settings. It outlines the AV challenge in detail, along with a novel real noisy AV corpus (ASPIRE), a benchmark speech enhancement task, and baseline performance results. The baseline results are obtained by training a deep neural architecture on a synthetic mixture of the Grid corpus and CHiME3 noises (bus, pedestrian, cafe, and street noises) and testing on the ASPIRE corpus. Subjective evaluations of five speech enhancement algorithms (SEGAN, spectral subtraction (SS), log-minimum mean-square error (LMMSE), audio-only CochleaNet, and AV CochleaNet) are presented as baseline results. The aim of the multi-modal challenge is to provide a timely opportunity for comprehensive evaluation of novel AV speech enhancement algorithms using our new real noisy AV benchmark corpus and specified performance metrics. This will promote AV speech processing research globally, stimulate new ground-breaking multi-modal approaches, and attract interest from companies, academics and researchers working in AV speech technologies and applications. We encourage participants from both the speech and hearing research communities to sign up through the challenge website, so that the challenge benefits from their complementary approaches to AV speech-in-noise processing.

