A baseline model for computationally inexpensive speech recognition for Kazakh using the Coqui STT framework

07/19/2021
by Ilnar Salimzianov, et al.

Mobile devices are transforming the way people interact with computers, and speech interfaces to applications are ever more important. Recently published Automatic Speech Recognition (ASR) systems are very accurate, but often require powerful hardware (specialised Graphics Processing Units) for inference, which makes them impractical to run on commodity devices, especially in streaming mode. Impressed by the accuracy of the baseline Kazakh ASR model of Khassanov et al. (2021), but dissatisfied with its inference times when not using a GPU, we trained a new baseline acoustic model (on the same dataset as the aforementioned paper) and three language models for use with the Coqui STT framework. Results look promising, but further epochs of training and parameter sweeping or, alternatively, limiting the vocabulary that the ASR system must support, are needed to reach production-level accuracy.

