Estimating Phoneme Class Conditional Probabilities from Raw Speech Signal using Convolutional Neural Networks

04/03/2013
by Dimitri Palaz, et al.

In hybrid hidden Markov model/artificial neural network (HMM/ANN) automatic speech recognition (ASR) systems, phoneme class conditional probabilities are estimated by first extracting acoustic features from the speech signal based on prior knowledge, such as speech perception and/or speech production knowledge, and then modeling those features with an ANN. Recent advances in machine learning, more specifically in image processing and text processing, have shown that such a divide-and-conquer strategy (i.e., separating the feature extraction and modeling steps) may not be necessary. Motivated by these studies, this paper investigates, in the framework of convolutional neural networks (CNNs), a novel approach in which the input to the ANN is the raw speech signal and the output is phoneme class conditional probability estimates. On the TIMIT phoneme recognition task, we study different ANN architectures to show the benefit of CNNs, and we compare the proposed approach against the conventional approach, in which spectral-based MFCC features are extracted and modeled by a multilayer perceptron. Our studies show that the proposed approach can yield comparable or better phoneme recognition performance than the conventional approach. This indicates that CNNs can automatically learn features relevant for phoneme classification from the raw speech signal.
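To make the "raw signal in, class probabilities out" idea concrete, here is a minimal NumPy sketch of a forward pass: a window of raw waveform samples passes through convolution, tanh, and max-pooling layers, and a softmax output layer produces one probability estimate per phoneme class. All layer sizes, the 4000-sample input window (250 ms at 16 kHz), the 39-class output, and every function name here are illustrative assumptions, not the architecture reported in the paper. The weights are random, so the outputs only become meaningful once the network is trained.

```python
import numpy as np

# Illustrative hyperparameters (assumptions, not the paper's settings).
N_SAMPLES = 4000          # e.g. 250 ms of 16 kHz audio around the frame to classify
N_CLASSES = 39            # e.g. the common 39-phoneme TIMIT folding
K1, S1, C1 = 30, 10, 20   # first conv layer: kernel length, stride, number of filters
K2, C2 = 7, 20            # second conv layer: kernel length, number of filters (stride 1)
POOL = 3                  # non-overlapping max-pool width

def conv1d(x, w, b, stride=1):
    """'Valid' 1-D cross-correlation. x: (in_ch, T), w: (out_ch, in_ch, k)."""
    out_ch, _, k = w.shape
    t_out = (x.shape[1] - k) // stride + 1
    y = np.empty((out_ch, t_out))
    for t in range(t_out):
        seg = x[:, t * stride : t * stride + k]               # (in_ch, k)
        y[:, t] = np.tensordot(w, seg, axes=([1, 2], [0, 1])) + b
    return y

def max_pool(x, size):
    """Max over non-overlapping windows along time; a trailing remainder is dropped."""
    t_out = x.shape[1] // size
    return x[:, : t_out * size].reshape(x.shape[0], t_out, size).max(axis=2)

def softmax(z):
    e = np.exp(z - z.max())   # shift by the max for numerical stability
    return e / e.sum()

def init_params(rng, scale=0.1):
    t1 = ((N_SAMPLES - K1) // S1 + 1) // POOL   # time steps after conv1 + pool
    t2 = (t1 - K2 + 1) // POOL                  # time steps after conv2 + pool
    return {
        "w1": scale * rng.standard_normal((C1, 1, K1)), "b1": np.zeros(C1),
        "w2": scale * rng.standard_normal((C2, C1, K2)), "b2": np.zeros(C2),
        "w_out": scale * rng.standard_normal((N_CLASSES, C2 * t2)),
        "b_out": np.zeros(N_CLASSES),
    }

def phoneme_posteriors(wave, p):
    """Raw samples in, estimates of P(class | input window) out."""
    h = wave[np.newaxis, :]                                    # (1, N_SAMPLES): one input channel
    h = max_pool(np.tanh(conv1d(h, p["w1"], p["b1"], S1)), POOL)
    h = max_pool(np.tanh(conv1d(h, p["w2"], p["b2"])), POOL)
    return softmax(p["w_out"] @ h.ravel() + p["b_out"])

rng = np.random.default_rng(0)
params = init_params(rng)
wave = 0.1 * rng.standard_normal(N_SAMPLES)   # stand-in for a real speech segment
post = phoneme_posteriors(wave, params)
print(post.shape, round(float(post.sum()), 6))   # prints (39,) 1.0
```

The point of the sketch is that no MFCC or other hand-crafted feature extraction step appears: the first convolution layer operates directly on the samples, and whatever filters it uses would have to be learned jointly with the classifier during training.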


research
12/22/2014

Learning linearly separable features for speech recognition using convolutional neural networks

Automatic speech recognition systems usually rely on spectral-based feat...
research
01/10/2017

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are effective models for reducing s...
research
02/11/2020

CGCNN: Complex Gabor Convolutional Neural Network on raw speech

Convolutional Neural Networks (CNN) have been used in Automatic Speech R...
research
11/19/2019

Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition

Artificial neural networks (ANN) have become the mainstream acoustic mod...
research
05/02/2023

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

Computational models of syntax are predominantly text-based. Here we pro...
research
12/17/2018

Persian Vowel recognition with MFCC and ANN on PCVC speech dataset

In this paper a new method for recognition of consonant-vowel phonemes c...
research
01/02/2020

Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends

Research on speech processing has traditionally considered the task of d...
