Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model

10/31/2021
by Martin Kocour, et al.

In typical multi-talker speech recognition systems, a neural network-based acoustic model predicts senone state posteriors for each speaker. These are later used by a single-talker decoder, which is applied to each speaker-specific output stream separately. In this work, we argue that such a scheme is sub-optimal and propose a principled solution that decodes all speakers jointly. We modify the acoustic model to predict joint state posteriors for all speakers, enabling the network to express uncertainty about the attribution of parts of the speech signal to the speakers. We employ a joint decoder that can make use of this uncertainty together with higher-level language information. For this, we revisit decoding algorithms used in factorial generative models in early multi-talker speech recognition systems. In contrast with these early works, we replace the GMM acoustic model with a DNN, which provides greater modeling power and simplifies part of the inference. We demonstrate the advantage of joint decoding in proof-of-concept experiments on a mixed-TIDIGITS dataset.
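The difference between separate and joint decoding can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes two speakers, a hypothetical two-state model per speaker, uniform transitions, and uses raw posteriors as emission scores (a hybrid system would typically divide by state priors first). The names `viterbi` and `joint_viterbi` are illustrative only. The joint decoder runs ordinary Viterbi over the product state space, in the spirit of factorial models.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Standard Viterbi. log_emit: (T, S), log_trans: (S, S), log_init: (S,)."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # rows: previous state, cols: current
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def joint_viterbi(log_joint, log_trans_a, log_trans_b, log_init_a, log_init_b):
    """Viterbi over state pairs (a, b). log_joint: (T, S, S) joint scores."""
    T, S, _ = log_joint.shape
    # Factorial transitions: the two speakers move independently,
    # so log p(a', b' | a, b) = log p(a' | a) + log p(b' | b).
    log_trans = (log_trans_a[:, None, :, None]
                 + log_trans_b[None, :, None, :]).reshape(S * S, S * S)
    log_init = (log_init_a[:, None] + log_init_b[None, :]).reshape(S * S)
    path = viterbi(log_joint.reshape(T, S * S), log_trans, log_init)
    return [divmod(k, S) for k in path]          # flat index k = a * S + b

# Toy joint posteriors (assumed values): the network is unsure which
# speaker is in which state, but rules out the pair (0, 0).
post = np.array([[[0.0, 0.45], [0.35, 0.20]]] * 2)   # shape (T=2, S=2, S=2)
uniform_trans = np.log(np.full((2, 2), 0.5))
uniform_init = np.log(np.full(2, 0.5))

# Separate decoding sees only per-speaker marginals.
separate_a = viterbi(np.log(post.sum(axis=2)), uniform_trans, uniform_init)
separate_b = viterbi(np.log(post.sum(axis=1)), uniform_trans, uniform_init)
separate = list(zip(separate_a, separate_b))

# Joint decoding keeps the correlation between the two speakers' states.
joint = joint_viterbi(np.log(post + 1e-12), uniform_trans, uniform_trans,
                      uniform_init, uniform_init)
print("separate:", separate, "joint:", joint)
```

In this toy case the per-speaker marginals each favor state 1, so separate decoding picks the pair (1, 1) in every frame even though its joint posterior is only 0.20, while joint decoding picks (0, 1), the pair with the highest joint posterior (0.45). The cost is that the search space grows from S to S² states for two speakers.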


