Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge

02/15/2023
by   Samuele Cornell, et al.

This paper describes our submission to the Second Clarity Enhancement Challenge (CEC2), which consists of target speech enhancement for hearing-aid (HA) devices in noisy-reverberant environments with multiple interferers such as music and competing speakers. Our approach builds upon the powerful iterative neural/beamforming enhancement (iNeuBe) framework introduced in our recent work and extends it to target speaker extraction. We therefore name the proposed approach iNeuBe-X, where the X stands for extraction. To address the challenges of the CEC2 setting, we introduce four major novelties: (1) we extend the state-of-the-art TF-GridNet model, originally designed for monaural speaker separation, to multi-channel, causal speech enhancement, and observe large improvements when it replaces the TCNDenseNet used in iNeuBe; (2) we leverage a recent dual-window-size approach with future-frame prediction to ensure that iNeuBe-X satisfies the 5 ms constraint on algorithmic latency required by CEC2; (3) we introduce a novel speaker-conditioning branch for TF-GridNet to achieve target speaker extraction; (4) we propose a fine-tuning step in which we compute an additional loss with respect to the target speaker signal compensated with the listener's audiogram. Without using external data, our best model reaches a hearing-aid speech perception index (HASPI) score of 0.942 and a scale-invariant signal-to-distortion ratio improvement (SI-SDRi) of 18.8 dB on the official development set. These results are promising given that the CEC2 data is extremely challenging (e.g., the mixture SI-SDR on the development set is -12.3 dB). A demo of our submitted system is available at WAVLab CEC2 demo.
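
For context on the metrics quoted above: SI-SDR measures how well an enhanced signal matches the target reference up to an overall scaling, and SI-SDRi is the gain over the unprocessed mixture (so an SI-SDRi of 18.8 dB from a -12.3 dB mixture implies an enhanced SI-SDR of roughly 6.5 dB). Below is a minimal NumPy sketch of the standard single-channel SI-SDR computation; the function and variable names are illustrative and do not come from the authors' code.

    import numpy as np

    def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
        """Scale-invariant SDR in dB between an estimated and a reference signal."""
        # Zero-mean both signals, as in the standard SI-SDR definition.
        estimate = estimate - estimate.mean()
        reference = reference - reference.mean()
        # Optimal scaling of the reference that best explains the estimate.
        alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
        target = alpha * reference   # scaled-target component
        noise = estimate - target    # residual distortion and interference
        return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))

    # SI-SDR improvement: enhanced output vs. the unprocessed reference-channel mixture.
    # si_sdri = si_sdr(enhanced, target_speech) - si_sdr(mixture, target_speech)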


research
02/10/2022

Royalflush Speaker Diarization System for ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge

This paper describes the Royalflush speaker diarization system submitted...
research
05/15/2020

Speaker Re-identification with Speaker Dependent Speech Enhancement

While the use of deep neural networks has significantly boosted speaker ...
research
11/22/2022

TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

We propose TF-GridNet for speech separation. The model is a novel multi-...
research
08/08/2023

Target Speech Extraction with Conditional Diffusion Model

Diffusion model-based speech enhancement has received increased attentio...
research
05/10/2020

Cognitive-driven convolutional beamforming using EEG-based auditory attention decoding

The performance of speech enhancement algorithms in a multi-speaker scen...
research
12/10/2021

Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features

Teleconferencing is becoming essential during the COVID-19 pandemic. How...
research
10/18/2021

Similarity-and-Independence-Aware Beamformer with Iterative Casting and Boost Start for Target Source Extraction Using Reference

Target source extraction is significant for improving human speech intel...
