Is the Ideal Ratio Mask Really the Best? – Exploring the Best Extraction Performance and Optimal Mask of Mask-based Beamformers

09/21/2023
by Atsuo Hiroe, et al.

This study investigates mask-based beamformers (BFs), which use time-frequency masks to estimate the filters that extract target speech. Although several BF methods have been proposed, the following questions have not yet been comprehensively investigated. 1) Which BF provides the best extraction performance, measured by how close the BF output is to the target speech? 2) Is the optimal mask that yields this best performance the same for all BFs? 3) Is the ideal ratio mask (IRM) identical to the optimal mask? We examine these questions for four mask-based BFs: the maximum signal-to-noise ratio BF, two variants of it, and the multichannel Wiener filter (MWF) BF. To obtain the optimal mask corresponding to the peak performance of each BF, we minimize the mean square error between the BF output and the target speech for each utterance. Through experiments on the CHiME-3 dataset, we verify that the four BFs have the same peak performance, equal to the upper bound provided by the ideal MWF BF, whereas the optimal mask depends on the adopted BF and differs from the IRM. These observations contradict the conventional idea that the optimal mask is common to all BFs and that peak performance differs for each BF. Hence, this study contributes to the design of mask-based BFs.
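To make the terms in the abstract concrete, the following is a minimal sketch of a mask-based MWF for a single frequency bin and two microphones, written in plain Python. It is not the authors' implementation: the IRM definition (power ratio with exponent `beta`), the filter form w = Phi_x^{-1} Phi_s e_ref with a mask-weighted estimate of Phi_s, the synthetic data, and all function names are assumptions chosen for illustration. The abstract does not state the exact formulations used in the paper.

```python
import cmath
import random

def irm(s, n, beta=0.5):
    """Ideal ratio mask for one time-frequency bin (one common definition, assumed here)."""
    ps, pn = abs(s) ** 2, abs(n) ** 2
    return (ps / (ps + pn)) ** beta if ps + pn > 0 else 0.0

def weighted_cov(frames, weights):
    """Sum over frames of weight * x x^H for 2-channel observation vectors x."""
    cov = [[0j, 0j], [0j, 0j]]
    for x, w in zip(frames, weights):
        for i in range(2):
            for j in range(2):
                cov[i][j] += w * x[i] * x[j].conjugate()
    return cov

def mask_based_mwf(frames, mask, ref=0):
    """Filter w = Phi_x^{-1} Phi_s e_ref, with Phi_s estimated by mask weighting."""
    phi_x = weighted_cov(frames, [1.0] * len(frames))
    phi_s = weighted_cov(frames, mask)
    (a, b), (c, d) = phi_x
    det = a * d - b * c  # closed-form inverse of the 2x2 matrix Phi_x
    inv = [[d / det, -b / det], [-c / det, a / det]]
    col = [phi_s[0][ref], phi_s[1][ref]]  # column `ref` of Phi_s
    return [inv[0][0] * col[0] + inv[0][1] * col[1],
            inv[1][0] * col[0] + inv[1][1] * col[1]]

def apply_filter(w, frames):
    """Beamformer output y_t = w^H x_t for each frame."""
    return [w[0].conjugate() * x[0] + w[1].conjugate() * x[1] for x in frames]

def mse(y, target):
    """Mean square error between two complex sequences of equal length."""
    return sum(abs(a - b) ** 2 for a, b in zip(y, target)) / len(y)

# Synthetic single-bin data (hypothetical): x_t = h * s_t + n_t.
rng = random.Random(0)
h = [1 + 0j, cmath.rect(0.8, 0.5)]  # assumed steering vector
target, frames, mask = [], [], []
for _ in range(200):
    s = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    n = [complex(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(2)]
    target.append(h[0] * s)                       # target at the reference mic
    frames.append([h[0] * s + n[0], h[1] * s + n[1]])
    mask.append(irm(h[0] * s, n[0]))              # IRM at the reference mic

w = mask_based_mwf(frames, mask)
print("MSE, unprocessed reference mic:", mse([x[0] for x in frames], target))
print("MSE, IRM-driven MWF output:   ", mse(apply_filter(w, frames), target))
```

The `mse` value is the quantity the study minimizes per utterance when searching for the optimal mask. Replacing the fixed IRM in `mask` with a mask tuned to reduce `mse` would correspond, in spirit, to that search, although the optimization procedure itself is not described in the abstract.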

research
09/04/2017

Using Optimal Ratio Mask as Training Target for Supervised Speech Separation

Supervised speech separation uses supervised learning algorithms to lear...
research
05/28/2021

Phoneme-Based Ratio Mask Estimation for Reverberant Speech Enhancement in Cochlear Implant Processors

Cochlear implant (CI) users have considerable difficulty in understandin...
research
08/20/2020

Blind Mask to Improve Intelligibility of Non-Stationary Noisy Speech

This letter proposes a novel blind acoustic mask (BAM) designed to adapt...
research
12/09/2012

Self Authentication of image through Daubechies Transform technique (SADT)

In this paper a 4 x 4 Daubechies transform based authentication techniqu...
research
04/17/2019

Deep Filtering: Signal Extraction Using Complex Time-Frequency Filters

Signal extraction from a single-channel mixture with additional undesire...
research
02/02/2019

Is CQT more suitable for monaural speech separation than STFT? an empirical study

Short-time Fourier transform (STFT) is used as the front end of many pop...
research
05/20/2022

Estimation of binary time-frequency masks from ambient noise

We investigate the retrieval of a binary time-frequency mask from a few ...
