Applying wav2vec2.0 to Speech Recognition in Various Low-resource Languages

12/22/2020
by Cheng Yi, et al.

Several domains have their own widely used feature extractors, such as ResNet, BERT, and GPT-x. These models are pre-trained on large amounts of unlabelled data with self-supervision and can be applied effectively to downstream tasks. In the speech domain, wav2vec2.0 has begun to show strong representation ability and the feasibility of ultra-low-resource speech recognition on the Librispeech corpus. However, this model has not been tested in real spoken scenarios or on languages other than English. To verify its universality across languages, we apply the released pre-trained models to low-resource speech recognition tasks in various spoken languages. We achieve relative improvements of more than 20% in six languages compared with previous work. Among these languages, English improves by up to 52.4%. Moreover, using coarse-grained modeling units, such as subwords and characters, achieves better results than using letters.
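
As a reading aid for the "relative improvement" figures above, here is a minimal sketch of how such a number is usually computed from two error rates. It assumes the metric is an error rate (lower is better) and that the improvement is measured against the earlier system's error. The function name and the example values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a percentage of the baseline error.

    Assumes both arguments are error rates on the same scale
    (for example, word error rate in percent), where lower is better.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates, chosen only to illustrate the arithmetic.
print(f"{relative_improvement(25.0, 20.0):.1f}%")  # prints "20.0%"
```

Under this definition, dropping from an error rate of 25.0 to 20.0 is a 5-point absolute gain but a 20% relative gain, which is why relative figures can look large when the baseline error is already low.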


