Embeddings for DNN speaker adaptive training

09/30/2019
by   Joanna Rownicka, et al.
0

In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT) focusing on a small amount of adaptation data per speaker. DNN-SAT can be viewed as learning a mapping from each embedding to transformation parameters that are applied to the shared parameters of the DNN. We investigate different approaches to applying these transformations, and find that with a good training strategy, a multi-layer adaptation network applied to all hidden layers is no more effective than a single linear layer acting on the embeddings to transform the input features. In the second part of our work, we evaluate different embeddings (i-vectors, x-vectors and deep CNN embeddings) in an additional speaker recognition task in order to gain insight into what should characterize an embedding for DNN-SAT. We find the performance for speaker recognition of a given representation is not correlated with its ASR performance; in fact, ability to capture more speech attributes than just speaker identity was the most important characteristic of the embeddings for efficient DNN-SAT ASR. Our best models achieved relative WER gains of 4 over DNN baselines using speaker-level cepstral mean normalisation (CMN), and a fully speaker-independent model, respectively.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/17/2017

Embedding-Based Speaker Adaptive Training of Deep Neural Networks

An embedding-based speaker adaptive training (SAT) approach is proposed ...
research
06/26/2022

Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Speaker adaptation is important to build robust automatic speech recogni...
research
04/06/2021

Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings

Many modern systems for speaker diarization, such as the recently-develo...
research
01/11/2023

Improving And Analyzing Neural Speaker Embeddings for ASR

Neural speaker embeddings encode the speaker's speech characteristics th...
research
02/07/2021

U-vectors: Generating clusterable speaker embedding from unlabeled data

Speaker recognition deals with recognizing speakers by their speech. Str...
research
07/31/2018

Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems

Most neural-network based speaker-adaptive acoustic models for speech sy...
research
06/16/2022

Nonwords Pronunciation Classification in Language Development Tests for Preschool Children

This work aims to automatically evaluate whether the language developmen...

Please sign up or login with your details

Forgot password? Click here to reset