Learning linearly separable features for speech recognition using convolutional neural networks

12/22/2014
by   Dimitri Palaz, et al.
0

Automatic speech recognition systems usually rely on spectral-based features, such as MFCC of PLP. These features are extracted based on prior knowledge such as, speech perception or/and speech production. Recently, convolutional neural networks have been shown to be able to estimate phoneme conditional probabilities in a completely data-driven manner, i.e. using directly temporal raw speech signal as input. This system was shown to yield similar or better performance than HMM/ANN based system on phoneme recognition task and on large scale continuous speech recognition task, using less parameters. Motivated by these studies, we investigate the use of simple linear classifier in the CNN-based framework. Thus, the network learns linearly separable features from raw speech. We show that such system yields similar or better performance than MLP based system using cepstral-based features as input.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/03/2013

Estimating Phoneme Class Conditional Probabilities from Raw Speech Signal using Convolutional Neural Networks

In hybrid hidden Markov model/artificial neural networks (HMM/ANN) autom...
research
06/17/2019

Speech Recognition With No Speech Or With Noisy Speech Beyond English

In this paper we demonstrate continuous noisy speech recognition using c...
research
04/14/2022

Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features

For a better understanding of the mechanisms underlying speech perceptio...
research
07/06/2018

Tone Recognition Using Lifters and CTC

In this paper, we present a new method for recognizing tones in continuo...
research
04/06/2021

Learning to Rank Microphones for Distant Speech Recognition

Fully exploiting ad-hoc microphone networks for distant speech recogniti...
research
11/05/2019

Spatial Attention for Far-field Speech Recognition with Deep Beamforming Neural Networks

In this paper, we introduce spatial attention for refining the informati...
research
06/17/2019

Real to H-space Encoder for Speech Recognition

Deep neural networks (DNNs) and more precisely recurrent neural networks...

Please sign up or login with your details

Forgot password? Click here to reset