Speaker-Invariant Training via Adversarial Learning

04/02/2018
by   Zhong Meng, et al.

We propose a novel adversarial multi-task learning scheme that actively curtails inter-talker feature variability while maximizing senone discriminability, so as to enhance the performance of a deep neural network (DNN) based ASR system. We call the scheme speaker-invariant training (SIT). In SIT, a DNN acoustic model and a speaker classifier network are jointly optimized to minimize the senone (tied triphone state) classification loss and simultaneously mini-maximize the speaker classification loss. A speaker-invariant and senone-discriminative deep feature is learned through this adversarial multi-task learning. With SIT, a canonical DNN acoustic model with significantly reduced variance in its output probabilities is learned with no explicit speaker-independent (SI) transformations or speaker-specific representations used in training or testing. Evaluated on the CHiME-3 dataset, SIT achieves a 4.99% relative word error rate (WER) improvement over the conventional SI acoustic model. With additional unsupervised speaker adaptation, the speaker-adapted (SA) SIT model achieves a 4.86% relative WER gain over the SA SI acoustic model.
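To make the mini-max idea concrete, here is a minimal sketch of one training step on a scalar toy model. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the mini-max is realized, so the sketch uses gradient reversal, a common choice for this kind of objective. The names `sit_step`, `w_f`, `w_y`, `w_s`, and the trade-off weight `lam` are hypothetical, and binary labels stand in for senone and speaker classes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_grad(logit, label):
    """Gradient of binary cross-entropy with respect to the logit."""
    return sigmoid(logit) - label


def sit_step(params, x, senone, speaker, lr=0.1, lam=1.0):
    """One SGD step on a scalar toy model (all names are hypothetical).

    w_f: shared feature extractor, w_y: senone head, w_s: speaker head.
    lam: assumed weight on the reversed speaker gradient.
    """
    w_f, w_y, w_s = params["w_f"], params["w_y"], params["w_s"]
    f = w_f * x                        # shared "deep feature"
    g_y = bce_grad(w_y * f, senone)    # dL_senone / d(senone logit)
    g_s = bce_grad(w_s * f, speaker)   # dL_speaker / d(speaker logit)

    # Each head descends its own loss.
    grad_w_y = g_y * f
    grad_w_s = g_s * f

    # The feature extractor descends the senone loss but ascends the
    # speaker loss: the speaker gradient is sign-flipped and scaled by lam.
    grad_f = g_y * w_y - lam * g_s * w_s
    grad_w_f = grad_f * x

    return {
        "w_f": w_f - lr * grad_w_f,
        "w_y": w_y - lr * grad_w_y,
        "w_s": w_s - lr * grad_w_s,
    }


if __name__ == "__main__":
    start = {"w_f": 1.0, "w_y": 0.0, "w_s": 1.0}
    print(sit_step(start, x=1.0, senone=1, speaker=1))
```

In this toy setup the speaker head moves to classify the speaker better, while the shared feature moves in the direction that makes the speaker harder to classify. Setting `lam=0.0` removes the adversarial term and reduces the step to plain multi-task-free senone training of the feature extractor.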


