A Mixture of Expert Based Deep Neural Network for Improved ASR

12/02/2021
by   Vishwanath Pratap Singh, et al.
0

This paper presents a novel deep learning architecture for acoustic model in the context of Automatic Speech Recognition (ASR), termed as MixNet. Besides the conventional layers, such as fully connected layers in DNN-HMM and memory cells in LSTM-HMM, the model uses two additional layers based on Mixture of Experts (MoE). The first MoE layer operating at the input is based on pre-defined broad phonetic classes and the second layer operating at the penultimate layer is based on automatically learned acoustic classes. In natural speech, overlap in distribution across different acoustic classes is inevitable, which leads to inter-class mis-classification. The ASR accuracy is expected to improve if the conventional architecture of acoustic model is modified to make them more suitable to account for such overlaps. MixNet is developed keeping this in mind. Analysis conducted by means of scatter diagram verifies that MoE indeed improves the separation between classes that translates to better ASR accuracy. Experiments are conducted on a large vocabulary ASR task which show that the proposed architecture provides 13.6 and 10.0 models, namely, DNN and LSTM respectively, trained using sMBR criteria. In comparison to an existing method developed for phone-classification (by Eigen et al), our proposed method yields a significant improvement.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/02/2020

DNN-Based Semantic Model for Rescoring N-best Speech Recognition List

The word error rate (WER) of an automatic speech recognition (ASR) syste...
research
12/02/2021

A higher order Minkowski loss for improved prediction ability of acoustic model in ASR

Conventional automatic speech recognition (ASR) system uses second-order...
research
05/09/2017

Phone-aware Neural Language Identification

Pure acoustic neural models, particularly the LSTM-RNN model, have shown...
research
06/16/2021

Topic Classification on Spoken Documents Using Deep Acoustic and Linguistic Features

Topic classification systems on spoken documents usually consist of two ...
research
08/29/2017

Information Theoretic Analysis of DNN-HMM Acoustic Modeling

We propose an information theoretic framework for quantitative assessmen...
research
05/25/2022

Heterogeneous Reservoir Computing Models for Persian Speech Recognition

Over the last decade, deep-learning methods have been gradually incorpor...
research
07/19/2017

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition

Layer normalization is a recently introduced technique for normalizing t...

Please sign up or login with your details

Forgot password? Click here to reset