Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition

05/19/2020
by Kohei Matsuura, et al.

It is important to transcribe and archive speech data of endangered languages to preserve their heritage of verbal culture, and automatic speech recognition (ASR) is a powerful tool for facilitating this process. However, since endangered languages generally lack large corpora with many speakers, ASR models trained on them usually perform poorly. Nevertheless, we are often left with many recordings of spontaneous speech that have to be transcribed. In this work, to mitigate this speaker sparsity problem, we propose converting the whole training speech data so that it sounds like the test speaker, in order to develop a highly accurate ASR system for that speaker. For this purpose, we use a CycleGAN-based non-parallel voice conversion technique to forge labeled training data that is close to the test speaker's speech. We evaluated this speaker adaptation approach on two low-resource corpora, namely Ainu and Mboshi. We obtained a 35-60% improvement in phone error rate on the Ainu corpus and a 40% improvement on the Mboshi corpus. This approach outperformed two conventional methods, namely unsupervised adaptation and multilingual training, on both corpora.
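
The adaptation idea and its evaluation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: utterances are assumed to be lists of acoustic feature frames paired with phone labels, and the converters `to_test` and `to_train` are hypothetical stand-ins for trained CycleGAN generators (here a toy constant shift). The sketch shows three pieces: converting every training utterance while keeping its transcription, the L1 cycle-consistency term that CycleGAN-style conversion relies on (the adversarial and identity-mapping terms are omitted), and phone error rate (PER) with the relative improvement arithmetic used to report results.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[float]  # one acoustic feature vector (assumed representation)
Converter = Callable[[Frame], Frame]


@dataclass
class Utterance:
    features: List[Frame]  # feature frames; format is an assumption
    phones: List[str]      # phone labels, left untouched by conversion


def adapt_training_set(data: Sequence[Utterance], convert: Converter) -> List[Utterance]:
    """Make every training utterance 'sound like' the test speaker, keeping labels."""
    return [Utterance([convert(f) for f in u.features], list(u.phones)) for u in data]


def cycle_consistency_l1(x: Sequence[Frame], g_ab: Converter, g_ba: Converter) -> float:
    """Mean |x - G_BA(G_AB(x))| over all feature values (the cycle term only)."""
    total, count = 0.0, 0
    for frame in x:
        recon = g_ba(g_ab(frame))
        for a, b in zip(frame, recon):
            total += abs(a - b)
            count += 1
    return total / count


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # match or substitute
        prev = cur
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """Total phone edits divided by total reference phones."""
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references contain no phones")
    return sum(edit_distance(r, h) for r, h in zip(refs, hyps)) / total


def relative_improvement(baseline_per: float, adapted_per: float) -> float:
    """Fractional PER reduction relative to the baseline."""
    return (baseline_per - adapted_per) / baseline_per


if __name__ == "__main__":
    # Hypothetical toy converters standing in for trained generators.
    to_test = lambda f: [v + 0.5 for v in f]
    to_train = lambda f: [v - 0.5 for v in f]
    train = [Utterance([[1.0, 2.0], [0.25, -1.0]], ["a", "i"])]
    print(adapt_training_set(train, to_test)[0])
    print(cycle_consistency_l1(train[0].features, to_test, to_train))
    print(phone_error_rate([list("abcd")], [list("axc")]))
```

In the toy run, the converted utterance keeps its labels `["a", "i"]`, the shift-and-unshift pair has zero cycle loss, and the hypothesis `a x c` against reference `a b c d` has one substitution and one deletion, giving a PER of 0.5. Real CycleGAN generators are neural networks trained so that this cycle loss stays small while a discriminator judges the converted speech as the target speaker's.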

