Monaural Multi-Talker Speech Recognition using Factorial Speech Processing Models

10/05/2016
by   Mahdi Khademian, et al.

A Pascal challenge entitled monaural multi-talker speech recognition was developed to target the problem of robust automatic speech recognition against speech-like noise, which significantly degrades the performance of automatic speech recognition systems. In this challenge, two competing speakers say a simple command simultaneously, and the objective is to recognize the speech of the target speaker. Surprisingly, during the challenge a team from IBM Research achieved performance better than that of human listeners on this task. The IBM team's method consists of an intermediate speech separation step followed by single-talker speech recognition. This paper reconsiders the task of this challenge based on gain-adapted factorial speech processing models. It develops a joint-token passing algorithm for direct utterance decoding of both the target and masker speakers simultaneously. In contrast to the challenge winner, it uses maximum uncertainty during decoding, which cannot be used in the earlier two-phase method. It provides a detailed derivation of inference on these models based on general inference procedures for probabilistic graphical models. As another improvement, it uses deep neural networks for joint speaker identification and gain estimation, which makes these two steps easier than before while producing competitive results for them. The proposed method outperforms the past super-human results and even the results recently achieved by Microsoft Research using deep neural networks. It achieved an improvement of 5.5 over the super-human system and 2.7 over its recent competitor.

research
09/17/2019

Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models

This paper investigates the use of target-speaker automatic speech recog...
research
07/10/2017

Feature Joint-State Posterior Estimation in Factorial Speech Processing Models using Deep Neural Networks

This paper proposes a new method for calculating joint-state posteriors ...
research
01/09/2020

Open Challenge for Correcting Errors of Speech Recognition Systems

The paper announces the new long-term challenge for improving the perfor...
research
05/25/2023

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

Multi-talker overlapped speech poses a significant challenge for speech ...
research
03/09/2020

Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data

Most state-of-the-art speech systems are using Deep Neural Networks (DNN...
research
10/31/2021

Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model

In typical multi-talker speech recognition systems, a neural network-bas...
research
07/22/2022

Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities

As for other forms of AI, speech recognition has recently been examined ...
