Modeling State-Conditional Observation Distribution using Weighted Stereo Samples for Factorial Speech Processing Models

03/09/2015
by Mahdi Khademian, et al.

This paper investigates the effectiveness of factorial speech processing models in noise-robust automatic speech recognition tasks. For this purpose, the paper proposes an idealistic approach for modeling the state-conditional observation distribution of factorial models based on weighted stereo samples. The approach extends the earlier single-pass retraining technique for ideal model compensation so that it supports multiple audio sources. Non-stationary noise can be treated as one of these audio sources, with multiple states of its own. Experiments on set A of the Aurora 2 dataset show that recognition performance can be improved by this consideration. The improvement is significant in low signal-to-noise conditions, up to 4%. Besides the power of the proposed method in accurately representing the state-conditional observation distribution, it has an important advantage over previous methods: it allows the feature spaces for the source features and the corrupted features to be selected independently. This opens a new window for seeking feature spaces better suited to noisy speech, independent of the clean speech features.
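The idea of estimating a state-conditional observation distribution from weighted stereo samples can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes diagonal Gaussians, one speech source and one noise source, and that per-frame state posteriors for each source have already been obtained from the separate (stereo) channels. The function name `joint_state_gaussians` and all parameter names are hypothetical, and the product of the two posteriors is used as the frame weight on the assumption that the sources are independent.

```python
import numpy as np

def joint_state_gaussians(corrupted, speech_post, noise_post, eps=1e-8):
    """Weighted Gaussian statistics of corrupted features per joint state.

    corrupted   : (T, D) features of the mixed (noisy) signal
    speech_post : (T, S) per-frame speech-state posteriors (assumed given)
    noise_post  : (T, N) per-frame noise-state posteriors (assumed given)
    Returns means (S, N, D), diagonal variances (S, N, D), counts (S, N).
    """
    # Weight of frame t for the joint state (s, n): product of the two
    # source posteriors (assumption: the sources are independent).
    w = speech_post[:, :, None] * noise_post[:, None, :]      # (T, S, N)
    counts = w.sum(axis=0)                                     # (S, N)
    denom = np.maximum(counts, eps)[..., None]                 # avoid 0/0

    # Weighted first and second moments of the corrupted features.
    means = np.einsum("tsn,td->snd", w, corrupted) / denom
    second = np.einsum("tsn,td->snd", w, corrupted ** 2) / denom

    # Diagonal variances, floored for numerical safety.
    variances = np.maximum(second - means ** 2, eps)
    return means, variances, counts

# Toy usage: 4 frames, 1-D features, 2 speech states, 1 noise state.
corrupted = np.array([[1.0], [3.0], [10.0], [14.0]])
speech_post = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
noise_post = np.ones((4, 1))
means, variances, counts = joint_state_gaussians(corrupted, speech_post, noise_post)
```

Because the weights come from the source channels while the statistics are accumulated over the corrupted channel, the corrupted features in this sketch could live in a different feature space from the source features, which mirrors the independence of feature spaces mentioned in the abstract.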


research
02/03/2022

Joint Speech Recognition and Audio Captioning

Speech samples recorded in both indoor and outdoor environments are ofte...
research
10/24/2022

Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation

Most automatic speech processing systems are sensitive to the acoustic e...
research
07/14/2015

Feature Normalisation for Robust Speech Recognition

Speech recognition system performance degrades in noisy environments. If...
research
07/15/2013

Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition

In this paper, a modification to the training process of the popular SPL...
research
08/20/2020

Blind Mask to Improve Intelligibility of Non-Stationary Noisy Speech

This letter proposes a novel blind acoustic mask (BAM) designed to adapt...
research
06/13/2023

Statistical Beamformer Exploiting Non-stationarity and Sparsity with Spatially Constrained ICA for Robust Speech Recognition

In this paper, we present a statistical beamforming algorithm as a pre-p...
research
07/10/2017

Feature Joint-State Posterior Estimation in Factorial Speech Processing Models using Deep Neural Networks

This paper proposes a new method for calculating joint-state posteriors ...
