BEA-Base: A Benchmark for ASR of Spontaneous Hungarian

02/01/2022
by P. Mihajlik, et al.

Hungarian is spoken by 15 million people, yet easily accessible Automatic Speech Recognition (ASR) benchmark datasets, especially for spontaneous speech, have been practically unavailable. In this paper, we introduce BEA-Base, a subset of the BEA spoken Hungarian database comprising mostly spontaneous speech from 140 speakers. It is built specifically to assess ASR, primarily for conversational AI applications. After defining the speech recognition subsets and task, we develop several baselines using open-source toolkits, including classic HMM-DNN hybrid and end-to-end approaches augmented by cross-language transfer learning. The best results are obtained with multilingual self-supervised pretraining, achieving a 45% recognition error rate reduction compared to the classical approach, without the application of an external language model or additional supervised data. The results show the feasibility of using BEA-Base for training and evaluation of Hungarian speech recognition systems.
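
ASR baselines such as those described above are commonly compared by word error rate (WER), and gains are often quoted as a relative reduction against a reference system. The following is a minimal sketch of both calculations, assuming WER is the metric of interest; the function names and the example numbers are hypothetical illustrations and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error rate removed by the improved system."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Illustrative values only (not results from the paper).
    print(word_error_rate("a b c d", "a x c"))
    print(relative_error_reduction(0.20, 0.11))
```

Under this definition, a relative reduction is measured against the baseline's error rate rather than as a difference in absolute percentage points, so the same relative figure can correspond to very different absolute gains.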

research
07/24/2023

Adaptation of Whisper models to child speech recognition

Automatic Speech Recognition (ASR) systems often struggle with transcrib...
research
10/21/2022

Deep LSTM Spoken Term Detection using Wav2Vec 2.0 Recognizer

In recent years, the standard hybrid DNN-HMM speech recognizers are outp...
research
06/15/2022

Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project

Czech is a very specific language due to its large differences between t...
research
05/14/2022

Pretraining Approaches for Spoken Language Recognition: TalTech Submission to the OLR 2021 Challenge

This paper investigates different pretraining approaches to spoken langu...
research
09/21/2020

End-to-End Bengali Speech Recognition

Bengali is a prominent language of the Indian subcontinent. However, whi...
research
10/15/2021

Multilingual Speech Recognition using Knowledge Transfer across Learning Processes

Multilingual end-to-end(E2E) models have shown a great potential in the ...
research
07/20/2022

Towards Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription

Automatic speech recognition (ASR) has progressed significantly in recen...
