Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling

07/25/2020
by   Siyuan Feng, et al.
0

This study addresses unsupervised subword modeling, i.e., learning feature representations that can distinguish subword units of a language. The proposed approach adopts a two-stage bottleneck feature (BNF) learning framework, consisting of autoregressive predictive coding (APC) as a front-end and a DNN-BNF model as a back-end. APC pretrained features are set as input features to a DNN-BNF model. A language-mismatched ASR system is used to provide cross-lingual phone labels for DNN-BNF model training. Finally, BNFs are extracted as the subword-discriminative feature representation. A second aim of this work is to investigate the robustness of our approach's effectiveness to different amounts of training data. The results on Libri-light and the ZeroSpeech 2017 databases show that APC is effective in front-end feature pretraining. Our whole system outperforms the state of the art on both databases. Cross-lingual phone labels for English data by a Dutch ASR outperform those by a Mandarin ASR, possibly linked to the larger similarity of Dutch compared to Mandarin with English. Our system is less sensitive to training data amount when the training data is over 50 hours. APC pretraining leads to a reduction of needed training material from over 5,000 hours to around 200 hours with little performance degradation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/17/2020

The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks

This study addresses unsupervised subword modeling, i.e., learning acous...
research
04/17/2022

Language Contamination Explains the Cross-lingual Capabilities of English Pretrained Models

English pretrained language models, which make up the backbone of many m...
research
02/07/2020

Unsupervised pretraining transfers well across languages

Cross-lingual and multi-lingual training of Automatic Speech Recognition...
research
11/12/2021

Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

We present a method for cross-lingual training an ASR system using absol...
research
08/09/2019

Exploiting Cross-Lingual Speaker and Phonetic Diversity for Unsupervised Subword Modeling

This research addresses the problem of acoustic modeling of low-resource...
research
04/02/2021

Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation

This paper tackles automatically discovering phone-like acoustic units (...
research
11/01/2021

Exploring Non-Autoregressive End-To-End Neural Modeling For English Mispronunciation Detection And Diagnosis

End-to-end (E2E) neural modeling has emerged as one predominant school o...

Please sign up or login with your details

Forgot password? Click here to reset