Simple and Effective Zero-shot Cross-lingual Phoneme Recognition

09/23/2021
by Qiantong Xu, et al.

Recent progress in self-training, self-supervised pretraining and unsupervised learning has enabled well-performing speech recognition systems without any labeled data. However, in many cases labeled data is available for related languages, which these methods do not utilize. This paper extends previous work on zero-shot cross-lingual transfer learning by fine-tuning a multilingually pretrained wav2vec 2.0 model to transcribe unseen languages. This is done by mapping phonemes of the training languages to the target language using articulatory features. Experiments show that this simple method significantly outperforms prior work, which introduced task-specific architectures and used only part of a monolingually pretrained model.
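
The phoneme mapping step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The feature table, the feature set, the Hamming distance, and the function names `feature_distance` and `map_to_training_inventory` are all assumptions made for illustration, and the choice to look up the nearest training phoneme for each target phoneme is one plausible reading of the abstract.

```python
# Toy articulatory feature table. The values are hypothetical and only
# meant to illustrate the idea; a real system would use a full
# phonological feature inventory.
# Feature order: (voiced, nasal, labial, coronal, dorsal, continuant)
FEATURES = {
    "p": (0, 0, 1, 0, 0, 0),
    "b": (1, 0, 1, 0, 0, 0),
    "t": (0, 0, 0, 1, 0, 0),
    "d": (1, 0, 0, 1, 0, 0),
    "k": (0, 0, 0, 0, 1, 0),
    "g": (1, 0, 0, 0, 1, 0),
    "m": (1, 1, 1, 0, 0, 0),
    "n": (1, 1, 0, 1, 0, 0),
    "s": (0, 0, 0, 1, 0, 1),
    "z": (1, 0, 0, 1, 0, 1),
}


def feature_distance(a, b):
    """Count the articulatory features on which two phonemes differ."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))


def map_to_training_inventory(target_phonemes, training_inventory):
    """Map each target phoneme to its nearest phoneme in the training set.

    Ties are broken by the order of `training_inventory`, because `min`
    returns the first minimal element it encounters.
    """
    return {
        target: min(
            training_inventory,
            key=lambda source: feature_distance(target, source),
        )
        for target in target_phonemes
    }


# Assumed example: the training languages cover only voiceless stops,
# one nasal and one fricative; the target language adds voiced sounds.
training_inventory = ["p", "t", "k", "m", "s"]
mapping = map_to_training_inventory(["z", "g", "d"], training_inventory)
print(mapping)
```

In this toy setup each voiced target phoneme is assigned to a training phoneme that differs from it in as few features as possible, so a model that only ever emitted training phonemes could still be scored against a target-language transcript.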

