Embedding-Based Speaker Adaptive Training of Deep Neural Networks

10/17/2017
by   Xiaodong Cui, et al.
0

An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker, are mapped through a control network to layer-dependent element-wise affine transformations to canonicalize the internal feature representations at the output of hidden layers of a main network. The control network for generating the speaker-dependent mappings is jointly estimated with the main network for the overall speaker adaptive acoustic modeling. Experiments on large vocabulary continuous speech recognition (LVCSR) tasks show that the proposed SAT scheme can yield superior performance over the widely-used speaker-aware training using i-vectors with speaker-adapted input features.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/20/2016

Speaker Cluster-Based Speaker Adaptive Training for Deep Neural Network Acoustic Modeling

A speaker cluster-based speaker adaptive training (SAT) method under dee...
research
09/30/2019

Embeddings for DNN speaker adaptive training

In this work, we investigate the use of embeddings for speaker-adaptive ...
research
06/02/2017

Prosodic Event Recognition using Convolutional Neural Networks with Context Information

This paper demonstrates the potential of convolutional neural networks (...
research
07/19/2017

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition

Layer normalization is a recently introduced technique for normalizing t...
research
10/23/2019

Speaker Adaptive Training using Model Agnostic Meta-Learning

Speaker adaptive training (SAT) of neural network acoustic models learns...
research
06/26/2022

Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Speaker adaptation is important to build robust automatic speech recogni...
research
06/25/2021

Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance

Generally speaking, the main objective when training a neural speech syn...

Please sign up or login with your details

Forgot password? Click here to reset