K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of Graphemes and Syllables

10/11/2021
by   Jounghee Kim, et al.
0

Wav2vec 2.0 is an end-to-end framework of self-supervised learning for speech representation that is successful in automatic speech recognition (ASR), but most of the work on the topic has been developed with a single language: English. Therefore, it is unclear whether the self-supervised framework is effective in recognizing other languages with different writing systems, such as Korean which uses the Hangul having a unique writing system. In this paper, we present K-Wav2Vec 2.0, which is a modified version of Wav2vec 2.0 designed for Korean automatic speech recognition by exploring and optimizing various factors of the original Wav2vec 2.0. In fine-tuning, we propose a multi-task hierarchical architecture to reflect the Korean writing structure. Moreover, a joint decoder is applied to alleviate the problem of words existing outside of the vocabulary. In pre-training, we attempted the cross-lingual transfer of the pre-trained model by further pre-training the English Wav2vec 2.0 on a Korean dataset, considering limited resources. Our experimental results demonstrate that the proposed method yields the best performance on both Korean ASR datasets: Ksponspeech (a large-scale Korean speech corpus) and Clovacall (a call-based dialog corpus). Further pre-training is also effective in language adaptation, leading to large improvements without additional data.

READ FULL TEXT
research
11/29/2022

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

In this paper, we propose a novel multi-modal multi-task encoder-decoder...
research
10/09/2021

Wav2vec-S: Semi-Supervised Pre-Training for Speech Recognition

Self-supervised pre-training has dramatically improved the performance o...
research
10/26/2022

UFO2: A unified pre-training framework for online and offline speech recognition

In this paper, we propose a Unified pre-training Framework for Online an...
research
05/17/2022

Deploying self-supervised learning in the wild for hybrid automatic speech recognition

Self-supervised learning (SSL) methods have proven to be very successful...
research
05/11/2020

Incremental Learning for End-to-End Automatic Speech Recognition

We propose an incremental learning for end-to-end Automatic Speech Recog...
research
06/29/2022

The THUEE System Description for the IARPA OpenASR21 Challenge

This paper describes the THUEE team's speech recognition system for the ...
research
06/25/2023

Addressing Cold Start Problem for End-to-end Automatic Speech Scoring

Integrating automatic speech scoring/assessment systems has become a cri...

Please sign up or login with your details

Forgot password? Click here to reset