Heterogeneous Reservoir Computing Models for Persian Speech Recognition

05/25/2022
by   Zohreh Ansari, et al.

Over the last decade, deep-learning methods have been gradually incorporated into conventional automatic speech recognition (ASR) frameworks to create acoustic, pronunciation, and language models. Although this has led to significant improvements in ASR recognition accuracy, the hard constraints these methods impose on hardware (e.g., computing power and memory usage) make it unclear whether they are the most computationally efficient and energy-efficient options for embedded ASR applications. Reservoir computing (RC) models (e.g., echo state networks (ESNs) and liquid state machines (LSMs)), on the other hand, have been shown to be inexpensive to train, have vastly fewer parameters, and are compatible with emergent hardware technologies. However, their performance in speech processing tasks is inferior to that of deep-learning-based models. To enhance the accuracy of RC in ASR applications, we propose heterogeneous single-layer and multi-layer ESNs that create non-linear transformations of the inputs and capture temporal context at different scales. To test our models, we performed a speech recognition task on the Farsdat Persian dataset. Since, to the best of our knowledge, standard RC has not yet been applied to any Persian ASR task, we also trained conventional single-layer and deep ESNs to provide baselines for comparison. In addition, we compared the RC performance with that of a standard long short-term memory (LSTM) model. The heterogeneous RC models (1) outperform the standard RC models, (2) perform on par with the LSTM in terms of recognition accuracy, and (3) reduce the training time considerably.
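
To make the idea of capturing temporal context at different scales concrete, the following is a minimal sketch of a single-layer leaky-integrator ESN in NumPy. It is not the authors' implementation: the abstract does not say how heterogeneity is realized, so the sketch assumes it comes from giving groups of reservoir neurons different leak rates. The function names, the 13-dimensional input frames, the reservoir size, the ridge penalty, and the random toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(n_res, n_in))
    w = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(inputs, w_in, w, leak):
    """Collect reservoir states for inputs of shape (T, n_in).

    leak is a scalar or a vector of shape (n_res,). A small leak rate
    gives a slowly changing neuron (long memory); a rate of 1.0 gives a
    neuron that reacts only to the current step.
    """
    x = np.zeros(w.shape[0])
    states = np.zeros((inputs.shape[0], w.shape[0]))
    for t, u in enumerate(inputs):
        x = (1.0 - leak) * x + leak * np.tanh(w_in @ u + w @ x)
        states[t] = x
    return states

def fit_readout(states, targets, ridge=1e-2):
    """Train only the linear readout, using closed-form ridge regression."""
    gram = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ targets)

# Toy data standing in for acoustic feature frames and frame labels.
rng = np.random.default_rng(1)
frames = rng.standard_normal((200, 13))
labels = rng.integers(0, 3, size=200)
targets = np.eye(3)[labels]  # one-hot targets

w_in, w = make_reservoir(n_in=13, n_res=60)
# Assumed form of heterogeneity: three groups of 20 neurons, each group
# with its own leak rate (slow, medium, fast).
leak = np.repeat([0.1, 0.5, 1.0], 20)
states = run_reservoir(frames, w_in, w, leak)
w_out = fit_readout(states, targets)
predictions = np.argmax(states @ w_out, axis=1)
```

Only `w_out` is learned, and it is obtained from a single linear solve rather than by backpropagation through time, which is the usual reason RC models are described as inexpensive to train. A conventional homogeneous ESN corresponds to passing one scalar `leak` for all neurons.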


