Multilingual Zero-Resource Speech Recognition Based on Self-Supervised Pre-Trained Acoustic Models

10/13/2022
by Haoyu Wang, et al.

Labeled audio data is insufficient to build satisfying speech recognition systems for most of the world's languages. Some zero-resource methods attempt phoneme- or word-level speech recognition without labeled audio data in the target language, but their error rates are usually too high for real-world use. Recently, the representation ability of self-supervised pre-trained models has been found to be extremely beneficial for zero-resource phoneme recognition. To the best of our knowledge, this paper is the first attempt to extend the use of pre-trained models to word-level zero-resource speech recognition. This is done by fine-tuning the pre-trained models on IPA phoneme transcriptions and decoding with a language model trained on extra texts. Experiments on Wav2vec 2.0 and HuBERT models show that this method can achieve less than 20% word error rate on some languages, and the average word error rate on 8 languages is 33.77%.
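The decoding pipeline the abstract describes — a model fine-tuned with CTC on IPA phoneme targets, followed by word-level decoding against a lexicon and a language model trained on extra text — can be sketched as below. This is a minimal illustration, not the paper's implementation: the frame posteriors, lexicon, and unigram LM (standing in for the paper's n-gram LM) are all toy assumptions.

```python
import math

# Hypothetical CTC blank symbol; any consistent label works.
BLANK = "<blk>"

def ctc_greedy_collapse(frames):
    """Greedy CTC decode: take the best label per frame,
    merge consecutive repeats, and drop blanks."""
    out, prev = [], None
    for posterior in frames:
        best = max(posterior, key=posterior.get)
        if best != prev and best != BLANK:
            out.append(best)
        prev = best
    return out

def lexicon_lm_decode(phones, lexicon, lm):
    """Segment a phoneme sequence into words via a pronunciation
    lexicon, scoring each segmentation with a unigram LM
    (a toy stand-in for the language model trained on extra texts)."""
    n = len(phones)
    # best[i] = (log-prob, word sequence) for the best parse of phones[:i]
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        score_i, words_i = best[i]
        if score_i == -math.inf:
            continue
        for word, pron in lexicon.items():
            j = i + len(pron)
            if j <= n and phones[i:j] == pron:
                cand = score_i + math.log(lm[word])
                if cand > best[j][0]:
                    best[j] = (cand, words_i + [word])
    return best[n][1]

# Toy demo: five frames of phoneme posteriors for the word "cat".
frames = [
    {"k": 0.9, BLANK: 0.1},
    {"k": 0.8, BLANK: 0.2},
    {BLANK: 0.9, "a": 0.1},
    {"a": 0.7, BLANK: 0.3},
    {"t": 0.6, BLANK: 0.4},
]
phones = ctc_greedy_collapse(frames)            # → ["k", "a", "t"]
lexicon = {"cat": ["k", "a", "t"], "at": ["a", "t"]}
lm = {"cat": 0.6, "at": 0.4}
words = lexicon_lm_decode(phones, lexicon, lm)  # → ["cat"]
```

In practice the paper fine-tunes Wav2vec 2.0 and HuBERT to emit the phoneme posteriors and uses beam search with an n-gram LM rather than this exhaustive unigram segmentation, but the division of labor is the same: the acoustic model handles IPA phonemes, and the text-only LM supplies word-level knowledge.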


