In recent years there has been considerable research devoted to reducing the amount of human effort required to build an automatic speech recognition (ASR) system for a new language. Conventional ASR training requires large quantities of manually-transcribed training data, as well as a hand-crafted pronunciation dictionary. Recent grapheme-based hybrid-HMM approaches  have shown success at removing the need for explicit pronunciation knowledge, whilst more recent end-to-end systems 
have removed the need for a lexicon entirely by modelling output tokens at the character or word-piece level. However, transcribed training data is typically still required, with end-to-end systems being particularly data hungry.
The process of manual transcription can be extremely time-consuming and expensive. Consequently a body of research has focused on reducing the need for such data, for example through the use of approximately-matching “in-the-wild” speech and text data, known as lightly-supervised training , and through the use of unlabelled data transcribed with a seed model, known as semi-supervised training . However, in both cases, manual expertise is required to train the initial model.
In his 2012 position paper, Glass  described the road towards unsupervised speech processing through a set of scenarios that, he noted, “might seem increasingly outlandish and impractical”. He suggested a move from “expert-based” systems, with a dictionary and phoneme set provided, through “data-based” systems with parallel speech and text data, to what he called “decipher-based” systems, through which ASR training could be achieved using entirely untranscribed speech, together with unpaired text data. This scenario has the significant advantage that, at least for any language with a significant web presence, both resources are likely to be relatively abundant without any human effort.
Since Glass’s paper, significant effort has been devoted to this so-called “zero-resource” scenario. Approaches to this problem tend to fall into two categories: those attempting to learn phoneme- or word-like patterns from speech in a bottom-up manner, often motivated by child speech learning [6, 7]; and those using cross-lingual information to inform the target model. The latter category extends a long strand of research into cross-lingual ASR methods – which seek to improve supervised training on a target language through the use of out-of-language data – to the case where no transcribed data exists for the target language. There have been a variety of recent approaches to this problem, all of which in one way or another address the problem of matching the modelling units of the out-of-language model to meaningful units in the target language. The earliest approaches used IPA-based phone mapping schemes , whilst more recent related methods have used automatic multilingual pronunciation mining from the web  or cross-lingual transfer from languages with similar orthographies . Other work uses knowledge of compositional phonetics in the target language to remove the need for target-language resources, building on earlier supervised approaches , whilst a further line of work used a semi-supervised approach, extending an initial lexicon using unpaired phonemic transcripts and text data .
Separately, there has been significant work towards building language-universal systems, generally with shared phonetic knowledge . These approaches can be problematic due to differing phonotactics between languages , though language-specific embeddings may be used . Again, these methods require knowledge of pronunciations in the target language in order to produce word output. Purely graphemic multilingual systems have been developed  but require the target language to be in the training set; supervised transliteration approaches have been used in the context of end-to-end systems .
Pure bottom-up approaches to zero-resource ASR, whilst interesting, have not generally yielded state-of-the-art ASR performance when compared to cross-lingual methods. A notable exception uses Facebook’s wav2vec 2.0 architecture  to produce phone-like sequences in an entirely bottom-up manner, which are then mapped to phonemized text sequences using an adversarial objective . Importantly, however, the authors of this work relied on manually-obtained phone units and a system trained on a large hand-crafted pronunciation dictionary to achieve their results, noting that it is easier to learn a mapping between the speech audio and phone units. Further, the wav2vec 2.0 architecture is exceptionally computationally intensive and data hungry, making the method unavailable to those not wishing to use Facebook’s pre-trained models.
We believe that no prior research has yet achieved Glass’s vision of removing both the need for transcribed audio data and human phonetic knowledge of the target language, thus building a system with no expert input. In this paper, we propose to return to his original term of “decipher-based” systems. Inspired by this vision, and by similar work on unsupervised transliteration and machine translation [21, 22], we present a method for deciphering speech data using only mismatched text data from the language of interest. Our method starts with a cross-lingual approach, taking a universal phone recogniser trained on a variety of source languages. However, absolutely no phonetic knowledge of the target language is used, making this method applicable in principle to almost any language. Furthermore, we find the technique to be extremely data efficient, requiring just 20 minutes of speech data to achieve a successful decipherment on read speech data. We follow this with rounds of specially-designed semi-supervised training (SST) using 25 hours of untranscribed speech data to achieve results that can be competitive with a conventional supervised approach.
2 Zero-Resource Cross-Lingual Transfer
Our method for zero-resource cross-lingual transfer uses a three-stage approach. First, a universal phone recogniser transcribes audio into phones. Then, we decipher this phone sequence into graphemes of the target language – for this, only language models trained on target-language text data are required. Finally, a flat-start semi-supervised training procedure is used to train a new acoustic model using the deciphered pseudo-labels. The complete pipeline is illustrated in Figure 1. We describe the three steps in detail below.
2.1 Universal Phone Recognition
The aim of a universal phone recogniser is to phonetically transcribe speech from any language. To achieve good generalisation to unseen languages, it is necessary to train the model on a diverse set of languages, in order to cover as wide a set of phones as possible. One way to train such a system is to simply pool data and phonemic lexicons and train a multilingual model with a shared phoneme set. In this work we use such a shared phoneme set to train a conventional hybrid HMM-DNN system on six well-resourced languages. We are aware of work noting that pooling phoneme sets is sub-optimal, as phonemes might have different surface forms in different languages , and that the use of linguistically-derived allophone mappings  might be beneficial.
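As an illustration of the pooling step, once each language’s lexicon has been mapped to a common phone alphabet such as X-SAMPA, building the shared phoneme set amounts to a simple union. The sketch below uses made-up lexicon entries, not GlobalPhone data:

```python
# Sketch of pooling lexicons (already mapped to X-SAMPA) into a shared
# phoneme inventory for multilingual training. Entries are illustrative.
lexicons = {
    "german":  {"hallo": ["h", "a", "l", "o:"]},
    "spanish": {"hola":  ["o", "l", "a"]},
    "french":  {"salut": ["s", "a", "l", "y"]},
}

# The shared phoneme set is the union of all phonemes across languages.
shared_phones = sorted({p for lex in lexicons.values()
                          for pron in lex.values()
                          for p in pron})

# A pooled lexicon keyed by (language, word) keeps homographs apart.
pooled_lexicon = {(lang, word): pron
                  for lang, lex in lexicons.items()
                  for word, pron in lex.items()}

print(shared_phones)  # ['a', 'h', 'l', 'o', 'o:', 's', 'y']
```

In practice this is exactly where the sub-optimality noted above arises: the union treats, say, German /a/ and Spanish /a/ as one unit even when their surface realisations differ.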
2.2 Decipherment
The task of decipherment is to convert a cipher into plain natural language, a classic example being solving a letter-substitution cipher. The use of this technique to decipher the output of a multilingual phone recogniser is the most significant contribution of this paper. We start with the work of Knight , who showed that a noisy-channel framework can be used for decipherment. In this framework, the probability of deciphering a cipher X into an English text Y (in this section we follow the literature in taking English as the target language; in reality we decipher other languages) is modelled as
p(Y | X) ∝ p(X | Y) p(Y),
where we call p(X | Y) the lexical model and p(Y) the language model. The lexical model produces the probability that an English letter corresponds to a cipher letter, and the language model assigns probabilities to sequences of English letters. The language model can be trained on any text corpus, and the lexical model can be trained in an unsupervised fashion with the Baum-Welch algorithm . Once the lexical model is trained, the most probable English text corresponding to the cipher can be obtained with the Viterbi algorithm as
Ŷ = argmax_Y p(X | Y) p(Y).
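To make the noisy-channel formulation concrete, the toy sketch below deciphers a 1:1 letter-substitution cipher using a character bigram language model. For brevity it uses hard (Viterbi) EM rather than Baum-Welch soft counts, and it ignores insertions, deletions and word segmentation; all data is made up.

```python
import math
from collections import defaultdict

def train_bigram_lm(text, alphabet, k=0.1):
    """Character bigram language model p(y_i | y_{i-1}) with add-k smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    lm = {}
    for a in alphabet:
        total = sum(counts[a].values()) + k * len(alphabet)
        lm[a] = {b: math.log((counts[a][b] + k) / total) for b in alphabet}
    return lm

def viterbi(cipher, lm, emit, alphabet):
    """Most probable plaintext Y under p(X | Y) * p(Y)."""
    delta = {y: emit[y][cipher[0]] for y in alphabet}  # flat prior over starts
    back = []
    for x in cipher[1:]:
        prev, delta, ptr = delta, {}, {}
        for y in alphabet:
            best = max(alphabet, key=lambda yp: prev[yp] + lm[yp][y])
            delta[y] = prev[best] + lm[best][y] + emit[y][x]
            ptr[y] = best
        back.append(ptr)
    y = max(delta, key=delta.get)
    out = [y]
    for ptr in reversed(back):
        y = ptr[y]
        out.append(y)
    return "".join(reversed(out))

def decipher_em(cipher, lm, alphabet, cipher_alphabet, iters=10, k=0.1):
    """Hard-EM training of the lexical model p(x | y)."""
    emit = {y: {x: math.log(1.0 / len(cipher_alphabet))
                for x in cipher_alphabet} for y in alphabet}
    for _ in range(iters):
        plain = viterbi(cipher, lm, emit, alphabet)   # E-step (hard decision)
        counts = defaultdict(lambda: defaultdict(float))
        for y, x in zip(plain, cipher):
            counts[y][x] += 1
        for y in alphabet:                            # M-step with smoothing
            total = sum(counts[y].values()) + k * len(cipher_alphabet)
            emit[y] = {x: math.log((counts[y][x] + k) / total)
                       for x in cipher_alphabet}
    return emit, viterbi(cipher, lm, emit, alphabet)

# Toy example: encipher a plaintext, then try to recover it unsupervised.
plaintext = "banana bandana banana"
alphabet = sorted(set(plaintext))
key = {c: chr(65 + i) for i, c in enumerate(alphabet)}  # e.g. 'a' -> 'B'
cipher = "".join(key[c] for c in plaintext)
lm = train_bigram_lm(plaintext, alphabet)
emit, decoded = decipher_em(cipher, lm, alphabet, sorted(set(cipher)))
```

Whether the toy EM recovers the exact plaintext depends on initialisation and the amount of data, but the structure mirrors the noisy-channel formulation above: the language model supplies p(Y) and the trained lexical model supplies p(X | Y).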
Decipherment has previously been used in various NLP applications such as unsupervised machine transliteration , unsupervised machine translation [21, 26] and unsupervised Chinese pronunciation learning . However, phoneme-to-grapheme (P2G) conversion is much more difficult than solving a deterministic substitution cipher, for several reasons. First, a grapheme can map to many phonemes: for example, the English grapheme “a” can be pronounced as AH, AA, AE or EH. Second, one grapheme can correspond to multiple sequential phonemes – for example “x” is pronounced as “K S” – and, conversely, one phoneme can align with multiple sequential graphemes – for example “th” is often pronounced as “DH”. Finally, when applying phoneme-to-grapheme conversion at the utterance level, our model needs to be able to perform word segmentation. These challenges are further multiplied when we deal with noisy inputs from the universal phone recogniser.
To deal with the insertions and deletions inherent to P2G conversion, we use the following parameterisation proposed by Nuhn :
p(X | Y) = Σ_A p(X, A | Y).
In this parameterisation the decipherment model consists of three components: the lexical model, the alignment model and the language model. The random variable A represents a sequence of substitution, insertion and deletion operations. We can also express decipherment using WFST notation as
D = X ∘ L ∘ A ∘ G,
where X is an input phone acceptor, L is the lexical model transducer, A is the alignment model transducer and G is the language model acceptor.
The role of the lexical model is to model the probabilities of mapping phones to graphemes, as well as the probability of phone deletions. The lexical model is implemented as a simple single-state flower transducer. It is initialised to allow all possible substitutions, but during training the unseen substitutions are pruned from the model, which results in faster training and inference due to a smaller composition. However, since we train on small amounts of data in an unsupervised fashion, it is possible that some important arcs are pruned from the model, which can be detrimental. Therefore, following , we smooth the lexical model at various stages of training as
p̂(x | y) = β p(x | y) + (1 − β) / |P|,
where β is a smoothing parameter (we use 0.9) and |P| denotes the size of the input phone set. Finally, the lexical model always maps silence phones to silence or a word boundary in the output. This results in faster training and inference – because silence prunes the space of possible word segmentations – and more accurate decipherment.
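As a sketch, this smoothing step amounts to interpolating each grapheme’s distribution over phones with a uniform distribution; the variable names below are ours:

```python
def smooth_lexical(p_lex, phone_set, beta=0.9):
    """Interpolate each grapheme's distribution p(phone | grapheme) with a
    uniform distribution over the input phone set, restoring a little mass
    on substitutions that were pruned away during training."""
    uniform = (1.0 - beta) / len(phone_set)
    return {g: {ph: beta * dist.get(ph, 0.0) + uniform for ph in phone_set}
            for g, dist in p_lex.items()}

# After pruning, grapheme "a" may retain only its single most likely phone;
# smoothing re-opens the other substitutions with small probability.
pruned = {"a": {"AH": 1.0}}
smoothed = smooth_lexical(pruned, ["AH", "AA", "AE"])
```

Since every phone now receives at least (1 − β)/|P| mass, arcs pruned in an earlier stage can be re-learned in the next one.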
The language model is the component most important to decipherment, as it provides the information driving the training process. The language model predicts the probability of a sequence of graphemes in the target language, so character n-gram models can be used. It is important to keep in mind that an n-gram model with a large context results in a large composition when composed with an unpruned lexical model; it is therefore not feasible to use such models from the beginning. Hence we start with a simple bigram model and move up to 5-gram grapheme models as training progresses. Subsequently, we use a word trigram language model together with a grapheme lexicon for the final round of training. Since the composition of the lexical model, alignment model and word language model is slow, and is required after every training epoch, we reimplemented the standard composition with a three-way composition . We also use pruning to speed up training and inference with the word language models. Our decipherment pipeline is implemented in OpenFST  and its design is heavily inspired by the BaumWelch library from OpenGrm . The code will be open-sourced.
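The schedule of increasing n-gram order can be illustrated with a minimal add-k-smoothed character n-gram model (our own sketch, unrelated to the OpenFST implementation):

```python
import math
from collections import defaultdict

class CharNgramLM:
    """Add-k smoothed character n-gram model; order = context length + 1.
    '^' is used as a begin-of-sequence padding symbol."""

    def __init__(self, order, alphabet, k=0.1):
        self.order, self.alphabet, self.k = order, alphabet, k
        self.counts = defaultdict(lambda: defaultdict(float))

    def train(self, text):
        padded = "^" * (self.order - 1) + text
        for i in range(len(text)):
            ctx, ch = padded[i:i + self.order - 1], padded[i + self.order - 1]
            self.counts[ctx][ch] += 1

    def logprob(self, text):
        padded = "^" * (self.order - 1) + text
        lp = 0.0
        for i in range(len(text)):
            ctx, ch = padded[i:i + self.order - 1], padded[i + self.order - 1]
            num = self.counts[ctx][ch] + self.k
            den = sum(self.counts[ctx].values()) + self.k * len(self.alphabet)
            lp += math.log(num / den)
        return lp

# Train a bigram and a 5-gram on the same toy text; in decipherment training
# we would start with the bigram and switch to the 5-gram as it progresses.
text = "abababab"
alphabet = sorted(set(text))
bigram, fivegram = CharNgramLM(2, alphabet), CharNgramLM(5, alphabet)
bigram.train(text)
fivegram.train(text)
```

The higher-order model constrains candidate decipherments more sharply, which is precisely why it is only affordable once the lexical model has been pruned.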
2.3 Semi-Supervised Training
In conventional Semi-Supervised Training (SST), a seed model is used to create “pseudo-labels” for untranscribed speech data [4, 31]. In our previous work , we showed that SST can be successful even with seed models with WER over 80%, provided that lattices are used to model uncertainty in the hypotheses. In the previous section we described how decipherment can be used to convert the output of a universal phone recogniser into a sequence of target-language graphemes, to be used as pseudo-labels for untranscribed data. The pseudo-labels can either be one-best transcripts or decipherment lattices. Unlike conventional SST, we have no seed model for the target language, since there is no equivalence between the outputs of the phone recogniser and the target-language graphemes. We therefore choose to train a model with flat-start LF-MMI , initialising the lower layers of the model with the universal phone recogniser, but using a randomly-initialised output layer.
3 Experiments
To demonstrate that decipherment can be used for cross-lingual transfer, we train a multilingual acoustic model on English (ENG), French, German, Spanish, Russian and Polish, and perform decipherment experiments on Czech (CES), Portuguese (POR) and Swedish (SWE).
Our experiments were performed using the GlobalPhone corpus . This corpus provides data for a variety of languages, lexicons that can be mapped to X-SAMPA so that phonemes can be pooled across languages, and pretrained language models built from crawled web data. All these properties make GlobalPhone an ideal test-bed for the evaluation of zero-resource cross-lingual transfer with decipherment.
To train the universal phone recogniser as a multilingual model with a shared phone set, we pooled 20 hours of English LibriSpeech  with the training data from GlobalPhone German, French, Spanish, Russian and Polish . The multilingual model was a small time-delay neural network with 18 hidden layers, each with a hidden dimension of 798 and a bottleneck dimension of 90. In total the model had 7.2M parameters, used 40-dimensional normalised MFCC features as inputs, and was trained with lattice-free MMI (LF-MMI) . We used a phone bigram estimated on the training data for the phone decoding reported in Table 2. The topline monolingual models in Table 3 were trained with the same procedure using only the monolingual training data, and were decoded using the provided GlobalPhone n-gram language models.
The decipherment model was trained on the 200 shortest utterances from the development set of each language. We increase the power of the character-level language model over successive epochs, from a bigram up to a 5-gram, performing 10 iterations of full-batch training with each model. These grapheme language models were trained on the GlobalPhone training data. To speed up training with the grapheme models, after training with the bigram grapheme language model we pruned the lexical model to retain probabilities only for the top 10 phones for each grapheme. After this stage we smoothed the lexical model and continued training with the provided GlobalPhone word language model. Finally, we smoothed the lexical model again and deciphered the GlobalPhone training data. During the course of our experiments we found that the transcripts of the GlobalPhone training data are contained in the provided language models, which would bias the results of semi-supervised training. To prevent this, we always used a language model trained on CommonCrawl data when deciphering and later re-decoding the training data. The CommonCrawl language models were trained on approximately 100M tokens and pruned to contain only the 100k most frequent words.
After obtaining pseudo-labels with decipherment, we performed semi-supervised training in two iterations. In the first iteration we used the pseudo-labels for flat-start LF-MMI training . Instead of training the model from scratch, we replaced the output layer of the multilingual model with a new layer producing pseudo-likelihoods for mono-graphemes. In the second iteration we used the mono-grapheme model to re-decode the training data. To prevent overfitting to the training data we did not continue training the mono-grapheme model; instead, we again replaced the output layer of the pretrained multilingual model and used that model for training. This time we used bi-grapheme targets, since the more reliable pseudo-labels now allowed us to estimate the state clustering tree. Since the decipherment tends to produce many deletion errors, we used a deletion penalty during decoding to allow the model to learn to fix them .
Table 1 (excerpt): decipherment WER (%)
                            ENG    CES    POR    SWE
reference w. silence        21.9   11.7   18.1   10.5
reference w/o silence       61.8   13.2   23.5   14.4

Table 3 (excerpt): WER (%) after semi-supervised training
                            CES    POR    SWE
+ first iteration of SST    21.1   33.1   59.3
+ second iteration of SST   16.7   25.3   42.5
In the first set of experiments, summarised in Table 1, we studied how decipherment works when deciphering a ground-truth phoneme sequence with and without silence; when deciphering phones decoded with a mono-lingual model for the target language; and when deciphering a cross-lingual phone output from the multilingual model. From the results it can be seen that the presence of silence in the phone sequence leads to better results, and also that as the amount of noise due to phone decoding errors increases, as shown in Table 2, the decipherment performance degrades. The English results in Table 1 are notably the weakest, especially on the inputs without silence. We attribute this to the fact that English has a more irregular orthography, which the model may not be able to learn from only 200 utterances. We therefore subsequently trained the English decipherment model on 400 utterances, which reduced the WERs from 21.9, 61.8 and 44.0 to 20.9, 35.2 and 40.7, respectively. This shows that a better decipherment model can be learned with more data.
In the second set of experiments we used the decipherment model trained on the cross-lingual decode to produce pseudo-labels for the GlobalPhone training data, for use in SST. Table 4 contains WER and CER statistics for these pseudo-labels. In Table 3, we show the results following SST, which we compare with a “topline” performance obtained from conventional supervised training on the same data. The results demonstrate that it is possible to successfully train an acoustic model using the deciphered pseudo-labels, with the relative difference between the topline models and the SST models being 38% for Czech, 58% for Portuguese and 134% for Swedish, and the absolute differences being 4.6%, 9.3% and 24.4% respectively. We believe that the differences could be further reduced by using a better universal phone recogniser , a better pre-trained model for initialisation  and by leveraging more crawled data for SST [38, 32]. Our results are in line with our previous work on conventional low-resource SST, where we showed that it is possible to perform SST even with a poor seed acoustic model provided that we have a good language model .
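The WER and CER figures reported throughout are standard Levenshtein-based error rates; for reference, a minimal implementation (our own sketch) is:

```python
def error_rate(ref, hyp):
    """Levenshtein-based error rate: (substitutions + deletions + insertions)
    divided by the reference length. Pass word lists for WER, or character
    lists for CER."""
    R, H = len(ref), len(hyp)
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        d[i][0] = i          # delete all i reference tokens
    for j in range(H + 1):
        d[0][j] = j          # insert all j hypothesis tokens
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            d[i][j] = min(d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # sub
                          d[i - 1][j] + 1,                               # del
                          d[i][j - 1] + 1)                               # ins
    return d[R][H] / max(R, 1)
```

Note that this metric weights deletions and insertions equally; the deletion penalty mentioned above instead biases the decoder itself towards emitting more tokens.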
4 Conclusions and Future Work
We presented a method for zero-resource cross-lingual transfer of ASR models based on decipherment, which allows the training of ASR models using only untranscribed speech, text corpora and a universal phone recogniser. Across the three test languages our method was able to produce a working acoustic model, which could be further improved by using more untranscribed data for SST. We concede that the languages we investigated are geographically close; in future we plan to apply decipherment to more challenging languages, though we believe that this may require a more robust universal phone recogniser that works reliably across a wider range of languages. We also intend to improve our decipherment algorithm to enable the selection of utterances that can be reliably deciphered. Finally, we hope to replace the universal phone recogniser with automatic unit discovery, to create a pronunciation-lexicon-free alternative for unsupervised speech recognition .
-  C. Liu, Q. Zhang, X. Zhang, K. Singh, Y. Saraf, and G. Zweig, “Multilingual graphemic hybrid asr with massive data augmentation,” in SLTU-CCURL, 2020.
-  W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in ICASSP, 2016.
-  L. Lamel, J.-L. Gauvain, and G. Adda, “Lightly supervised and unsupervised acoustic model training,” Computer Speech and Language, vol. 16, 2002.
-  L. Lamel, J.-L. Gauvain, and G. Adda, “Unsupervised acoustic model training,” in ICASSP, 2002.
-  J. Glass, “Towards unsupervised speech processing,” in ISSPA, 2012.
-  H. Kamper, A. Jansen, and S. Goldwater, “A segmental framework for fully-unsupervised large-vocabulary speech recognition,” Computer Speech and Language, vol. 46, 2017.
-  E. Hermann, H. Kamper, and S. Goldwater, “Multilingual and unsupervised subword modeling for zero-resource languages,” Computer Speech and Language, vol. 65, 2021.
-  N. T. Vu, F. Kraus, and T. Schultz, “Cross-language bootstrapping based on completely unsupervised training using multilingual a-stabil,” in ICASSP, 2011.
-  J. L. Lee, L. F. Ashby, M. E. Garza, Y. Lee-Sikka, S. Miller, A. Wong, A. D. McCarthy, and K. Gorman, “Massively multilingual pronunciation modeling with WikiPron,” in LREC, 2020.
-  M. Wiesner, O. Adams, D. Yarowsky, J. Trmal, and S. Khudanpur, “Zero-shot pronunciation lexicons for cross-language acoustic model transfer,” in ASRU, 2019.
-  X. Li, J. Li, F. Metze, and A. W. Black, “Hierarchical phone recognition with compositional phonetics,” Interspeech, 2021.
-  V. H. Do, X. Xiao, E. S. Chng, and H. Li, “Cross-lingual phone mapping for large vocabulary speech recognition of under-resourced languages,” IEICE Transactions on Information and Systems, vol. 97, no. 2, pp. 285–295, 2014.
-  T. Shinozaki, S. Watanabe, D. Mochihashi, and G. Neubig, “Semi-supervised learning of a pronunciation dictionary from disjoint phonemic transcripts and text,” in Interspeech, 2017.
-  X. Li, S. Dalmia, J. Li, M. Lee, P. Littell, J. Yao, A. Anastasopoulos, D. R. Mortensen, G. Neubig, A. W. Black, et al., “Universal phone recognition with a multilingual allophone system,” in ICASSP, 2020.
-  S. Feng, P. Żelasko, L. Moro-Velázquez, A. Abavisani, M. Hasegawa-Johnson, O. Scharenborg, and N. Dehak, “How phonotactics affect multilingual and zero-shot asr performance,” in ICASSP, 2021.
-  H. Gao, J. Ni, Y. Zhang, K. Qian, S. Chang, and M. Hasegawa-Johnson, “Zero-shot cross-lingual phonetic recognition with external language embedding,” Interspeech, 2021.
-  S. Khare, A. Mittal, A. Diwan, S. Sarawagi, P. Jyothi, and S. Bharadwaj, “Low resource ASR: The surprising effectiveness of high resource transliteration,” in Interspeech, 2021.
-  A. Baevski, W.-N. Hsu, A. Conneau, and M. Auli, “Unsupervised speech recognition,” arXiv preprint arXiv:2105.11084, 2021.
-  A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” in NeurIPS, 2020.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in NeurIPS, 2014.
-  S. Ravi and K. Knight, “Deciphering foreign language,” in ACL-HLT, 2011.
-  M. Nuhn, “Unsupervised training with applications in natural language processing,” Ph.D. thesis, RWTH Aachen University, Germany, 2019.
-  K. Knight, “Decoding complexity in word-replacement translation models,” Computational linguistics, vol. 25, no. 4, pp. 607–615, 1999.
-  L. E. Baum, T. Petrie, G. Soules, and N. Weiss, “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains,” The Annals of Mathematical Statistics, vol. 41, no. 1, pp. 164–171, 1970.
-  S. Ravi and K. Knight, “Learning phoneme mappings for transliteration without parallel data,” in NAACL-HLT, 2009.
-  M. Nuhn and H. Ney, “EM decipherment for large vocabularies,” in ACL, 2014.
-  C. Chu, S. Fang, and K. Knight, “Learning to pronounce Chinese without a pronunciation dictionary,” in EMNLP, 2020.
-  C. Allauzen and M. Mohri, “3-way composition of weighted finite-state transducers,” in CIAA, 2008.
-  C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, and M. Mohri, “OpenFst: A general and efficient weighted finite-state transducer library,” in CIAA, 2007.
-  B. Roark, R. Sproat, C. Allauzen, M. Riley, J. Sorensen, and T. Tai, “The OpenGrm open-source finite-state grammar software libraries,” in ACL, 2012.
-  V. Manohar, H. Hadian, D. Povey, and S. Khudanpur, “Semi-supervised training of acoustic models using lattice-free MMI,” in ICASSP, 2018.
-  E. Wallington, B. Kershenbaum, O. Klejch, and P. Bell, “On the learning dynamics of semi-supervised training for ASR,” Interspeech, 2021.
-  H. Hadian, H. Sameti, D. Povey, and S. Khudanpur, “Flat-start single-stage discriminatively trained HMM-based models for ASR,” IEEE/ACM TASLP, vol. 26, no. 11, pp. 1949–1961, 2018.
-  T. Schultz, N. T. Vu, and T. Schlippe, “GlobalPhone: A multilingual text & speech database in 20 languages,” in ICASSP, 2013.
-  V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Librispeech: an ASR corpus based on public domain audio books,” in ICASSP, 2015.
-  D. Povey, G. Cheng, Y. Wang, K. Li, H. Xu, M. Yarmohammadi, and S. Khudanpur, “Semi-orthogonal low-rank matrix factorization for deep neural networks,” in Interspeech, 2018.
-  D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, V. Manohar, X. Na, Y. Wang, and S. Khudanpur, “Purely sequence-trained neural networks for ASR based on lattice-free MMI,” in Interspeech, 2016.
-  A. Carmantini, P. Bell, and S. Renals, “Untranscribed web audio for low resource speech recognition,” in Interspeech, 2019.