Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models

03/15/2020
by Natalia Tomashenko, et al.

In this paper we investigate GMM-derived (GMMD) features for the adaptation of deep neural network (DNN) acoustic models. A DNN trained on GMMD features is adapted through maximum a posteriori (MAP) adaptation of the auxiliary GMM used for GMMD feature extraction. We explore fusion of the adapted GMMD features with conventional features, such as bottleneck and MFCC features, in two neural network architectures: DNN and time-delay neural network (TDNN). We compare the proposed approach with other adaptation techniques, namely i-vectors and feature-space maximum likelihood linear regression (fMLLR), and explore their complementarity using several types of fusion (feature level, posterior level, lattice level and others) to find the best way to combine them. Experimental results on the TED-LIUM corpus show that the proposed adaptation technique can be effectively integrated into DNN and TDNN setups at different levels and provides additional gains in recognition performance: up to 6% relative word error rate reduction (WERR) over a strong fMLLR speaker-adapted DNN baseline, and up to 18% relative WERR over a speaker-independent (SI) DNN baseline trained on conventional features. For TDNN models the proposed approach achieves up to 26% relative WERR over an SI baseline, and up to 13% over a model adapted using i-vectors. An analysis of the adapted GMMD features from various points of view demonstrates their effectiveness at different levels.
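As a rough illustration of the idea described in the abstract, the sketch below computes GMMD-style features as per-state log-likelihoods under an auxiliary GMM and applies a textbook relevance-factor MAP update to the GMM means. It is not the authors' implementation. The function names, the relevance factor `tau`, the diagonal covariances, and the assumption that adaptation frames have already been assigned to a state are all choices made for this example.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def component_logs(x, gmm):
    """Weighted log-density of x under each (weight, mean, var) component."""
    return [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmmd_features(x, state_gmms):
    """One log-likelihood per auxiliary-model state (hypothetical layout)."""
    return [log_sum_exp(component_logs(x, gmm)) for gmm in state_gmms]


def map_adapt_means(gmm, frames, tau=10.0):
    """MAP-update the means of one state's GMM; weights and variances are kept.

    Assumes `frames` are the adaptation frames already aligned to this state.
    """
    # Component posteriors for every adaptation frame.
    posteriors = []
    for x in frames:
        logs = component_logs(x, gmm)
        norm = log_sum_exp(logs)
        posteriors.append([math.exp(l - norm) for l in logs])

    adapted = []
    for k, (weight, mean, var) in enumerate(gmm):
        occupancy = sum(p[k] for p in posteriors)
        new_mean = [
            (tau * m + sum(p[k] * x[d] for p, x in zip(posteriors, frames)))
            / (tau + occupancy)
            for d, m in enumerate(mean)
        ]
        adapted.append((weight, new_mean, var))
    return adapted


# Toy example: a single-component, one-dimensional state model.
si_gmm = [(1.0, [0.0], [1.0])]
speaker_frames = [[2.0]] * 10
adapted_gmm = map_adapt_means(si_gmm, speaker_frames, tau=10.0)
features = gmmd_features([2.0], [si_gmm, adapted_gmm])
```

In this toy case the adapted mean moves halfway from the speaker-independent mean toward the speaker data, because the occupancy count equals `tau`. The adapted model therefore assigns a higher log-likelihood to the speaker's frames, which is the shift that the DNN trained on GMMD features sees at its input.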


