Towards Structured Deep Neural Network for Automatic Speech Recognition

11/08/2015
by Yi-Hsiu Liao, et al.

In this paper we propose the Structured Deep Neural Network (structured DNN) as a structured and deep learning framework. This approach learns to find the best structured object (such as a label sequence) given a structured input (such as a vector sequence) by globally considering the mapping relationships between the two structures rather than item by item. When automatic speech recognition (ASR) is viewed as a special case of such a structured learning problem, with the acoustic vector sequence as the input and the phoneme label sequence as the output, it becomes possible to learn comprehensively, utterance by utterance as a whole, rather than frame by frame. Structured Support Vector Machine (structured SVM) was previously proposed to perform ASR with structured learning, but it is limited by the linear nature of SVM. Here we propose structured DNN, which uses nonlinear transformations across multiple layers, as a structured and deep learning approach. In preliminary experiments on TIMIT, this approach outperformed structured SVM.
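To make "choosing a whole label sequence at once" concrete, here is a minimal Python sketch of generic structured prediction. It is not the authors' structured DNN. It assumes a score F(x, y) that decomposes into per-frame label scores from a small tanh network plus label-transition scores. Under that assumption the global argmax over all label sequences can be found exactly with Viterbi decoding, and learning can use a structured hinge loss with Hamming cost, as in structured SVM. The functions `frame_scores`, `sequence_score`, `viterbi`, and `structured_hinge`, the matrices `W1`, `b1`, `W2`, `b2`, `T`, and the toy data are all hypothetical.

```python
import math
import random

def frame_scores(x, W1, b1, W2, b2):
    """Per-frame label scores from a one-hidden-layer tanh network (hypothetical scorer).

    x is a list of acoustic vectors; returns one list of label scores per frame.
    """
    scores = []
    for v in x:
        hidden = [math.tanh(sum(w * vi for w, vi in zip(row, v)) + b)
                  for row, b in zip(W1, b1)]
        scores.append([sum(w * h for w, h in zip(row, hidden)) + b
                       for row, b in zip(W2, b2)])
    return scores

def sequence_score(S, T, y):
    """F(x, y): frame scores of labels y plus transition scores T[prev][next]."""
    total = sum(S[t][label] for t, label in enumerate(y))
    total += sum(T[y[t - 1]][y[t]] for t in range(1, len(y)))
    return total

def viterbi(S, T):
    """Exact argmax over all label sequences y of sequence_score(S, T, y)."""
    n_labels = len(S[0])
    delta = list(S[0])  # best score of any prefix ending in each label
    backpointers = []
    for t in range(1, len(S)):
        new_delta, ptr = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: delta[i] + T[i][j])
            new_delta.append(delta[best_i] + T[best_i][j] + S[t][j])
            ptr.append(best_i)
        delta = new_delta
        backpointers.append(ptr)
    y = [max(range(n_labels), key=lambda j: delta[j])]
    for ptr in reversed(backpointers):  # walk back from the last frame
        y.append(ptr[y[-1]])
    return y[::-1]

def structured_hinge(S, T, y_true):
    """Structured hinge loss with Hamming cost, via loss-augmented Viterbi.

    Returns (loss, most violating sequence). The loss is zero only when y_true
    beats every other sequence by a margin of at least their Hamming distance.
    """
    S_aug = [[s + (0.0 if j == y_true[t] else 1.0) for j, s in enumerate(row)]
             for t, row in enumerate(S)]
    y_hat = viterbi(S_aug, T)
    violation = sequence_score(S_aug, T, y_hat) - sequence_score(S, T, y_true)
    return max(0.0, violation), y_hat

def rand_matrix(rows, cols):
    """Random toy parameters or inputs in [-1, 1]."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

if __name__ == "__main__":
    random.seed(0)
    dim, hidden, n_labels, n_frames = 3, 4, 2, 5
    W1, b1 = rand_matrix(hidden, dim), [0.0] * hidden
    W2, b2 = rand_matrix(n_labels, hidden), [0.0] * n_labels
    T = rand_matrix(n_labels, n_labels)
    x = rand_matrix(n_frames, dim)  # toy stand-in for an acoustic vector sequence
    y_true = [0, 0, 1, 1, 1]        # toy stand-in for a phoneme label sequence
    S = frame_scores(x, W1, b1, W2, b2)
    loss, y_hat = structured_hinge(S, T, y_true)
    print("decoded:", viterbi(S, T), "hinge loss:", loss)
```

Replacing the tanh network with a single linear layer makes F linear in its parameters. That is closer in spirit to the linear structured SVM the abstract compares against. Gradient-based training of the network under this loss is not shown.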

03/29/2022

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing

We introduce two techniques, length perturbation and n-best based label ...
05/21/2020

Large scale evaluation of importance maps in automatic speech recognition

In this paper, we propose a metric that we call the structured saliency ...
01/27/2020

Submodular Rank Aggregation on Score-based Permutations for Distributed Automatic Speech Recognition

Distributed automatic speech recognition (ASR) requires to aggregate out...
07/27/2022

Subword Dictionary Learning and Segmentation Techniques for Automatic Speech Recognition in Tamil and Kannada

We present automatic speech recognition (ASR) systems for Tamil and Kann...
01/10/2017

Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic Speech Recognition

Multi-task learning (MTL) involves the simultaneous training of two or m...
04/13/2018

Language Recognition using Time Delay Deep Neural Network

This work explores the use of a monolingual Deep Neural Network (DNN) mo...
02/16/2021

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

Multi-source localization is an important and challenging technique for ...
