Exploiting Cross-Lingual Speaker and Phonetic Diversity for Unsupervised Subword Modeling

08/09/2019 · by Siyuan Feng, et al.

This research addresses acoustic modeling for low-resource languages for which no transcribed training data is available. The goal is to learn robust frame-level feature representations that can be used to identify and distinguish subword-level speech units. The proposed representations comprise various types of multilingual bottleneck features (BNFs) obtained via multi-task learning of deep neural networks (MTL-DNN). A key problem is how to acquire high-quality frame labels for untranscribed training data so that supervised DNN training becomes possible. It is shown that robust BNF representations can be learned by effectively leveraging transcribed speech data and well-trained automatic speech recognition (ASR) systems from one or more out-of-domain (resource-rich) languages. Out-of-domain ASR systems can be applied to perform speaker adaptation on the untranscribed training data of the target language, and to decode the training speech into frame-level labels for DNN training. It is also found that better frame labels can be generated by taking the temporal dependency of speech into account during frame clustering. The proposed feature learning methods are evaluated on the standard task of unsupervised subword modeling in Track 1 of the ZeroSpeech 2017 Challenge. Our best system achieves an across-speaker triphone minimal-pair ABX error rate of 9.7%, comparable to the best recently reported systems. Lastly, our investigation reveals that the closeness between the target and out-of-domain languages, as well as the amount of training data available for each target language, can have a significant impact on the quality of the learned features.
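The core of the described approach is an MTL-DNN whose shared hidden stack ends in a low-dimensional bottleneck layer, with one output head per out-of-domain language trained on frame labels from that language's ASR system. The sketch below is a minimal PyTorch illustration of that shape, not the authors' exact configuration; the input splicing (39-dim features with a +/-5 frame context), layer sizes, bottleneck dimension, and per-task target counts are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class MTLBottleneckDNN(nn.Module):
    """Shared bottleneck feature extractor with one softmax head per task."""

    def __init__(self, input_dim=39 * 11, hidden_dim=1024, bottleneck_dim=40,
                 targets_per_task=(3000, 3000)):
        super().__init__()
        # Shared hidden stack ending in a low-dimensional bottleneck layer;
        # activations at this layer are the BNFs used as frame-level features.
        self.shared = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, bottleneck_dim),
        )
        # One classification head per out-of-domain language / frame-label set.
        self.heads = nn.ModuleList(
            nn.Linear(bottleneck_dim, n) for n in targets_per_task
        )

    def forward(self, x, task):
        bnf = self.shared(x)              # frame-level bottleneck features
        return bnf, self.heads[task](bnf)
```

In this setup, each mini-batch carries frame labels from a single task; a cross-entropy loss on the corresponding head updates both that head and the shared stack, so the bottleneck layer is shaped by all tasks. At extraction time only the `bnf` output is used.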
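The abstract also notes that frame labels improve when frame clustering takes the temporal dependency of speech into account. One simple, illustrative way to do this (not necessarily the paper's actual method) is to cluster context-spliced frames and then majority-smooth the resulting label sequence; the helper below sketches the idea with scikit-learn's KMeans, where the cluster count, context width, and smoothing window are assumed values.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_with_context(feats, n_clusters=300, context=2, smooth=5):
    """Cluster frames (num_frames x dim) using temporal context.

    Each frame is spliced with +/-`context` neighbours before clustering,
    and the label sequence is then smoothed by majority vote over a
    sliding window of `smooth` frames.
    """
    T, _ = feats.shape
    # Splice each frame with its neighbours (edge frames are padded).
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    spliced = np.hstack([padded[i:i + T] for i in range(2 * context + 1)])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(spliced)
    # Majority-vote smoothing so isolated label flips are removed.
    half = smooth // 2
    padded_labels = np.pad(labels, half, mode="edge")
    smoothed = np.array([
        np.bincount(padded_labels[t:t + smooth]).argmax()
        for t in range(T)
    ])
    return smoothed
```

Both steps bias neighbouring frames toward the same cluster label, which matches the abstract's observation that temporally consistent frame labels make better DNN training targets.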

Related research

06/17/2019 · Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation
This study tackles unsupervised subword modeling in the zero-resource sc...

05/19/2020 · Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition
It is important to transcribe and archive speech data of endangered lang...

11/09/2018 · Multilingual and Unsupervised Subword Modeling for Zero-Resource Languages
Unsupervised subword modeling aims to learn low-level representations of...

06/17/2019 · Combining Adversarial Training and Disentangled Speech Representation for Robust Zero-Resource Subword Modeling
This study addresses the problem of unsupervised subword unit discovery ...

07/29/2020 · Exploiting Cross-Lingual Knowledge in Unsupervised Acoustic Modeling for Low-Resource Languages
(Short version of Abstract) This thesis describes an investigation on un...

01/10/2017 · Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic Speech Recognition
Multi-task learning (MTL) involves the simultaneous training of two or m...

07/25/2020 · Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling
This study addresses unsupervised subword modeling, i.e., learning featu...
