Dilated U-net based approach for multichannel speech enhancement from First-Order Ambisonics recordings

06/02/2020
by Amélie Bosca, et al.

We present a CNN architecture for speech enhancement from multichannel first-order Ambisonics (FOA) mixtures. Data-dependent spatial filters, derived with a mask-based approach, help an automatic speech recognition (ASR) engine cope with adverse conditions of reverberation and competing speakers. The masks are predicted by a neural network that is fed rough estimates of the speech and noise amplitude spectra, under the assumption that the directions of arrival are known. This study evaluates replacing the previously investigated recurrent LSTM network with a convolutional U-net, under more challenging conditions that include an additional, second competing speaker. We show that, owing to more accurate short-term mask prediction, the U-net architecture yields some improvement in word error rate. The results further indicate that dilated convolutional layers are beneficial in difficult situations with two interfering speakers and/or where the target and the interferences are close to each other in angular distance. These gains also come with a two-fold reduction in the number of parameters.
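The abstract does not state which spatial filter is built from the predicted masks. A common choice, assumed here purely for illustration, is a mask-based MVDR beamformer in the Souden formulation. Under that formulation, mask-weighted speech and noise spatial covariance matrices are estimated per frequency bin, and the filter is w(f) = Phi_n^{-1} Phi_s u / trace(Phi_n^{-1} Phi_s). The sketch below assumes a 4-channel FOA STFT of shape (channels, freqs, frames), with channel 0 (the omnidirectional W component) as the reference. All function names are hypothetical and are not the authors' code.

```python
import numpy as np


def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array (C, F, T), e.g. C = 4 FOA channels (assumed W, X, Y, Z).
    mask: real array (F, T) in [0, 1], e.g. a speech or noise mask from the network.
    Returns an array (F, C, C): Phi[f] = sum_t m(f,t) x(f,t) x(f,t)^H / sum_t m(f,t).
    """
    weighted = stft * mask[None, :, :]
    phi = np.einsum("cft,dft->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-8)[:, None, None]
    return phi / norm


def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-6):
    """Souden-style MVDR filter, one weight vector per frequency bin: shape (F, C)."""
    n_ch = phi_s.shape[-1]
    phi_n = phi_n + eps * np.eye(n_ch)[None]  # diagonal loading for a stable inverse
    num = np.linalg.solve(phi_n, phi_s)  # Phi_n^{-1} Phi_s, shape (F, C, C)
    trace = np.trace(num, axis1=1, axis2=2)[:, None]  # shape (F, 1)
    return num[:, :, ref] / (trace + eps)  # select the reference-channel column


def apply_beamformer(w, stft):
    """Filter output y(f, t) = w(f)^H x(f, t), shape (F, T)."""
    return np.einsum("fc,cft->ft", w.conj(), stft)
```

In use, the network's speech mask and its complement (or a separate noise mask) would feed `masked_covariance` twice, once for Phi_s and once for Phi_n. The single-channel output of `apply_beamformer` would then go to the ASR front end.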
