Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task

06/19/2018
by Jan Vanek, et al.

In this paper, we investigate recurrent deep neural networks (DNNs) in combination with regularization techniques such as dropout, zoneout, and a regularization post-layer. As a benchmark, we chose the TIMIT phone recognition task because of its popularity and broad availability in the community. It also simulates a low-resource scenario, which is helpful for minor languages. We also prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large-vocabulary continuous speech recognition task. In recent years, recurrent DNNs have pushed error rates in automatic speech recognition down, but no clear winner has emerged among the proposed architectures. Dropout was used as the regularization technique in most cases, but its combination with other regularization techniques and with model ensembles was omitted. In our experiments, however, an ensemble of recurrent DNNs performed best and achieved an average phone error rate (PER) over 10 experiments of 14.84 % (minimum 14.69 %), which is, to our knowledge, the best published PER to date. Finally, in contrast to most papers, we have published open-source scripts to make the results easy to replicate and to help continue the development.
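
The phone error rate reported above is conventionally the edit distance between the recognized and the reference phone sequences (substitutions, deletions, and insertions) divided by the number of reference phones. The following is only a minimal sketch of that definition, not the authors' scoring script; the function names and the toy phone labels are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of labels)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference phone
                            curr[j - 1] + 1,      # insertion of a hypothesis phone
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """PER in percent over paired lists of reference and hypothesis sequences."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference contains no phones")
    return 100.0 * errors / total


# Toy example (hypothetical labels): one substitution and one deletion
# against five reference phones.
refs = [["sil", "k", "ae", "t", "sil"]]
hyps = [["sil", "k", "ah", "t"]]
print(phone_error_rate(refs, hyps))
```

In this toy case two errors against five reference phones give a PER of 40 %. An average or minimum over several training runs, as quoted in the abstract, would be taken over the per-run values of such a score.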

research
06/19/2018

A Survey of Recent DNN Architectures on the TIMIT Phone Recognition Task

In this survey paper, we have evaluated several recent deep neural netwo...
research
07/12/2018

A Comparison of Adaptation Techniques and Recurrent Neural Network Architectures

Recently, recurrent neural networks have become state-of-the-art in acou...
research
06/19/2016

Graph based manifold regularized deep neural networks for automatic speech recognition

Deep neural networks (DNNs) have been successfully applied to a wide var...
research
07/02/2018

Exploring End-to-End Techniques for Low-Resource Speech Recognition

In this work we present simple grapheme-based system for low-resource sp...
research
12/11/2019

Leveraging End-to-End Speech Recognition with Neural Architecture Search

Deep neural networks (DNNs) have been demonstrated to outperform many tr...
research
06/21/2023

Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

We compare using a PHOIBLE-based phone mapping method and using phonolog...
research
06/14/2016

Calibration of Phone Likelihoods in Automatic Speech Recognition

In this paper we study the probabilistic properties of the posteriors in...
