Unsupervised Speech Domain Adaptation Based on Disentangled Representation Learning for Robust Speech Recognition

04/12/2019
by Jong-Hyeon Park, et al.

In general, the performance of automatic speech recognition (ASR) systems degrades significantly when the training and test environments are mismatched. Recently, a deep-learning-based image-to-image translation technique that translates an image from a source domain to a desired domain was presented, and the cycle-consistent adversarial network (CycleGAN) was applied to learn a mapping for speech-to-speech conversion from a source speaker to a target speaker. However, this method may not adequately remove corrupting noise components for robust ASR, because it was designed to convert the speech itself. In this paper, we propose a domain adaptation method based on generative adversarial nets (GANs) with disentangled representation learning to achieve robustness in ASR systems. In particular, two separate encoders, a context encoder and a domain encoder, are introduced to learn distinct latent variables. These latent variables allow us to convert the domain of speech according to its context and domain representations. By applying noisy-to-clean environment adaptation, we improved word accuracies by 6.55% to 15.70% on the CHiME4 challenge corpus. In addition, like the CycleGAN-based method, the proposed method can be used for gender adaptation in gender-mismatched recognition.
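
The abstract describes converting speech by recombining a context code and a domain code produced by two separate encoders. The toy sketch below illustrates only that data flow, under loudly stated assumptions: plain Python lists stand in for acoustic feature frames, untrained random linear maps stand in for the neural context encoder, domain encoder, and decoder, and every name (`DisentangledConverter`, `FEAT_DIM`, `self_reconstruction_l1`, and so on), every dimension, and the self-reconstruction L1 term are hypothetical choices. None of it is the paper's architecture or training objective, and no adversarial training is shown.

```python
import random

# Hypothetical sizes chosen for illustration only (not taken from the paper).
FEAT_DIM = 8      # length of one acoustic feature frame
CONTEXT_DIM = 4   # length of the context (linguistic content) code
DOMAIN_DIM = 2    # length of the domain (e.g. noise or gender condition) code


def random_matrix(rows, cols, rng):
    """Untrained random weights standing in for a learned network."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class DisentangledConverter:
    """Toy linear context encoder, domain encoder and decoder (hypothetical)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_context = random_matrix(CONTEXT_DIM, FEAT_DIM, rng)
        self.w_domain = random_matrix(DOMAIN_DIM, FEAT_DIM, rng)
        self.w_decoder = random_matrix(FEAT_DIM, CONTEXT_DIM + DOMAIN_DIM, rng)

    def encode_context(self, frame):
        return matvec(self.w_context, frame)

    def encode_domain(self, frame):
        return matvec(self.w_domain, frame)

    def decode(self, context_code, domain_code):
        # The decoder receives the concatenation of both latent codes.
        return matvec(self.w_decoder, context_code + domain_code)

    def convert(self, source_frame, target_domain_frame):
        """Keep the source frame's context code, borrow the target frame's domain code."""
        return self.decode(self.encode_context(source_frame),
                           self.encode_domain(target_domain_frame))

    def self_reconstruction_l1(self, frame):
        """Assumed auxiliary loss: decode a frame from its own two codes, compare to input."""
        recon = self.decode(self.encode_context(frame), self.encode_domain(frame))
        return sum(abs(a - b) for a, b in zip(recon, frame)) / len(frame)


if __name__ == "__main__":
    rng = random.Random(1)
    noisy_frame = [rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)]
    clean_frame = [rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)]
    model = DisentangledConverter()
    # Noisy-to-clean conversion in this sketch: noisy context + clean domain code.
    enhanced = model.convert(noisy_frame, clean_frame)
    print(len(enhanced))  # prints 8
    print(model.self_reconstruction_l1(noisy_frame))
```

In this sketch, the conversion step is nothing more than choosing which frame supplies the domain code. In a working system, the encoders and decoder would be trained networks, and the GAN objectives the abstract mentions would shape the codes so that the context code carries content and the domain code carries the environment or speaker condition.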

Related research
08/04/2021

Unsupervised Domain Adaptation in Speech Recognition using Phonetic Features

Automatic speech recognition is a difficult problem in pattern recogniti...
03/07/2018

Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition

The performance of automatic speech recognition (ASR) systems can be sig...
11/27/2016

Invariant Representations for Noisy Speech Recognition

Modern automatic speech recognition (ASR) systems need to be robust unde...
11/26/2020

Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training

The performance of automatic speech recognition (ASR) systems typically ...
12/18/2019

A Cycle-GAN Approach to Model Natural Perturbations in Speech for ASR Applications

Naturally introduced perturbations in audio signal, caused by emotional ...
10/24/2020

Unsupervised Learning of Disentangled Speech Content and Style Representation

We present an approach for unsupervised learning of speech representatio...
10/17/2018

Exploring Textual and Speech information in Dialogue Act Classification with Speaker Domain Adaptation

In spite of the recent success of Dialogue Act (DA) classification, the ...
