Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation

10/24/2022
by Marvin Lavechin, et al.

Most automatic speech processing systems are sensitive to the acoustic environment, with degraded performance when applied to noisy or reverberant speech. But how can one tell whether speech is noisy or reverberant? We propose Brouhaha, a pipeline to simulate audio segments recorded in noisy and reverberant conditions. We then use the simulated audio to jointly train the Brouhaha model for voice activity detection, signal-to-noise ratio estimation, and C50 room acoustics prediction. We show how the predicted SNR and C50 values can be used to investigate and help diagnose errors made by automatic speech processing tools (such as pyannote.audio for speaker diarization or OpenAI's Whisper for automatic speech recognition). Both our pipeline and a pretrained model are open source and shared with the speech community.
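
The two quantities the model predicts alongside voice activity can be illustrated with their textbook definitions. The sketch below is not the authors' pipeline: the function names, the choice to locate the direct path at the largest peak of the impulse response, and the noise-scaling helper are all assumptions made for illustration. It computes a signal-to-noise ratio in dB from separate speech and noise samples, scales a noise signal to reach a target SNR (one plausible way to simulate noisy audio), and computes C50 as the ratio in dB of the energy in the first 50 ms of a room impulse response to the energy that arrives later.

```python
import math


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech, noise):
    """SNR in dB, given the speech and noise components separately."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def scale_noise_to_snr(speech, noise, target_snr_db):
    """Return a copy of `noise` rescaled so that snr_db(speech, result)
    equals `target_snr_db`. Hypothetical helper, for illustration only."""
    wanted_noise_power = mean_power(speech) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(wanted_noise_power / mean_power(noise))
    return [gain * x for x in noise]


def c50_db(rir, sample_rate):
    """C50 clarity index in dB from a room impulse response.

    Assumption: the direct path is the sample with the largest magnitude,
    and everything up to 50 ms after it counts as early energy.
    """
    onset = max(range(len(rir)), key=lambda i: abs(rir[i]))
    split = onset + int(round(0.050 * sample_rate))
    early = sum(x * x for x in rir[:split])
    late = sum(x * x for x in rir[split:])
    if late == 0.0:
        return math.inf  # no energy after 50 ms: perfectly "clear" response
    return 10.0 * math.log10(early / late)


if __name__ == "__main__":
    # Toy signals: a square-wave "speech" and a quieter square-wave "noise".
    speech = [1.0, -1.0] * 8
    noise = [0.1, -0.1] * 8
    print("SNR (dB):", snr_db(speech, noise))

    # Rescale the noise so that the mixture would sit at 5 dB SNR.
    quieter = scale_noise_to_snr(speech, noise, 5.0)
    print("SNR after scaling (dB):", snr_db(speech, quieter))

    # Synthetic exponentially decaying impulse response at 16 kHz.
    sr = 16000
    rir = [math.exp(-i / (0.03 * sr)) for i in range(sr // 2)]
    print("C50 (dB):", c50_db(rir, sr))
```

Higher C50 values indicate that most of the energy arrives early, which corresponds to less reverberant speech; lower SNR values indicate noisier speech.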

research
06/22/2016

A Curriculum Learning Method for Improved Noise Robustness in Automatic Speech Recognition

The performance of automatic speech recognition systems under noisy envi...
research
12/09/2021

X-Vector based voice activity detection for multi-genre broadcast speech-to-text

Voice Activity Detection (VAD) is a fundamental preprocessing step in au...
research
06/09/2019

rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method

This paper presents an unsupervised segment-based method for robust voic...
research
01/29/2020

Environment-aware Reconfigurable Noise Suppression

The paper proposes an efficient, robust, and reconfigurable technique to...
research
09/29/2021

A Universal Deep Room Acoustics Estimator

Speech audio quality is subject to degradation caused by an acoustic env...
research
06/21/2021

EML Online Speech Activity Detection for the Fearless Steps Challenge Phase-III

Speech Activity Detection (SAD), locating speech segments within an audi...
research
03/09/2015

Modeling State-Conditional Observation Distribution using Weighted Stereo Samples for Factorial Speech Processing Models

This paper investigates the effectiveness of factorial speech processing...
