
Applying wav2vec2.0 to Speech Recognition in various low-resource languages

by   Cheng Yi, et al.

Several domains have widely used pre-trained feature extractors, such as ResNet, BERT, and GPT-x. These models are pre-trained on large amounts of unlabelled data by self-supervision and can be effectively applied to downstream tasks. In the speech domain, wav2vec2.0 has begun to show its powerful representation ability and the feasibility of ultra-low-resource speech recognition on the Librispeech corpus. However, this model has not been tested on real spoken scenarios or on languages other than English. To verify its universality across languages, we apply the released pre-trained models to low-resource speech recognition tasks in various spoken languages. We achieve more than 20% relative improvement in six languages compared with previous works; for English, the relative improvement reaches 52.4%. Moreover, coarse-grained modeling units, such as subwords and characters, achieve better results than letters.




Voice Conversion Can Improve ASR in Very Low-Resource Settings

Voice conversion (VC) has been proposed to improve speech recognition sy...

Learning Robust and Multilingual Speech Representations

Unsupervised speech representation learning has shown remarkable success...

Leveraging neural representations for facilitating access to untranscribed speech from endangered languages

For languages with insufficient resources to train speech recognition sy...

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition

We propose a quantum kernel learning (QKL) framework to address the inhe...

Domain Robust Feature Extraction for Rapid Low Resource ASR Development

Developing a practical speech recognizer for a low resource language is ...

Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

With the rise of deep learning and intelligent vehicles, the smart assis...