A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation

03/27/2018
by   Ehsan Hosseini-Asl, et al.

Domain adaptation plays an important role for speech recognition models, particularly for low-resource domains. We propose a novel generative model based on the cycle-consistent generative adversarial network (CycleGAN) for unsupervised non-parallel speech domain adaptation. The proposed model employs multiple independent discriminators on the power spectrogram, each in charge of a different frequency band. As a result, we obtain 1) better discriminators that focus on fine-grained details of the frequency features, and 2) a generator that is capable of generating more realistic domain-adapted spectrograms. We demonstrate the effectiveness of our method on speech recognition with gender adaptation, where the model only has access to supervised data from one gender during training but is evaluated on the other at test time. Our model achieves average relative improvements of 7.41% in phoneme error rate and 11.10% in word error rate over the baseline on the TIMIT and WSJ datasets, respectively. Qualitatively, our model also generates more realistic-sounding speech when conditioned on data from the other domain.
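The idea of assigning one discriminator to each frequency band can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the number of bands, the equal-width split, the linear `toy_discriminator` stand-in, and the summing of per-band losses are all assumptions made for illustration, and a real model would use learned convolutional discriminators inside the full CycleGAN objective.

```python
import numpy as np


def split_into_bands(spectrogram, num_bands):
    """Split a (freq_bins, frames) spectrogram into contiguous frequency bands."""
    return np.array_split(spectrogram, num_bands, axis=0)


def toy_discriminator(band, weights):
    """Hypothetical stand-in for a learned discriminator.

    Computes a linear score over the band and squashes it with a sigmoid,
    giving a probability that the band comes from real target-domain data.
    """
    score = float(np.sum(band * weights))
    return 1.0 / (1.0 + np.exp(-score))


def multi_discriminator_loss(spectrogram, band_weights, is_real):
    """Sum of binary cross-entropy losses, one per frequency-band discriminator.

    Summing the per-band losses is an assumption of this sketch.
    """
    bands = split_into_bands(spectrogram, len(band_weights))
    target = 1.0 if is_real else 0.0
    eps = 1e-7  # avoids log(0)
    total = 0.0
    for band, weights in zip(bands, band_weights):
        p = np.clip(toy_discriminator(band, weights), eps, 1.0 - eps)
        total += -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    return float(total)


# Example with a random "power spectrogram": 128 frequency bins, 50 frames.
rng = np.random.default_rng(0)
spec = rng.random((128, 50))
bands = split_into_bands(spec, 3)
band_weights = [rng.normal(scale=0.01, size=b.shape) for b in bands]
loss = multi_discriminator_loss(spec, band_weights, is_real=True)
```

Because each discriminator only sees its own slice of the frequency axis, it cannot rely on cues from other bands, which is one way to read the abstract's claim that the discriminators focus on fine-grained frequency detail.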

research
08/04/2021

Unsupervised Domain Adaptation in Speech Recognition using Phonetic Features

Automatic speech recognition is a difficult problem in pattern recogniti...
research
08/14/2020

Adaptation Algorithms for Speech Recognition: An Overview

We present a structured overview of adaptation algorithms for neural net...
research
03/30/2022

Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

Speech systems developed for a particular choice of acoustic domain and ...
research
04/08/2019

Completely Unsupervised Phoneme Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models

Producing a large annotated speech corpus for training ASR systems remai...
research
01/05/2021

Domain-aware Neural Language Models for Speech Recognition

As voice assistants become more ubiquitous, they are increasingly expect...
research
05/17/2020

Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild

We investigated an enhancement and a domain adaptation approach to make ...
research
11/05/2020

Multi-Accent Adaptation based on Gate Mechanism

When only a limited amount of accented speech data is available, to prom...
