End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks

12/07/2013
by Dimitri Palaz, et al.

Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers fed with highly tuned features, such as MFCC or PLP features. Recent advances in "deep learning" approaches have called such systems into question, but while some attempts were made with simpler features such as spectrograms, state-of-the-art systems still rely on MFCCs. This might be viewed as a kind of failure of deep learning approaches, which are often claimed to be able to train on raw signals, alleviating the need for hand-crafted features. In this paper, we investigate a convolutional neural network approach for raw speech signals. While convolutional architectures have had tremendous success in computer vision and text processing, they seem to have been neglected in recent years in the speech processing field. We show that it is possible to learn an end-to-end phoneme sequence classifier system directly from the raw signal, with performance on the TIMIT and WSJ datasets similar to that of existing MFCC-based systems, questioning the need for complex hand-crafted features on large datasets.

research
09/17/2020

End-to-End Neural Event Coreference Resolution

Traditional event coreference systems usually rely on pipeline framework...
research
06/09/2020

End-to-end User Recognition using Touchscreen Biometrics

We study the touchscreen data as behavioural biometrics. The goal was to...
research
10/04/2021

WaveBeat: End-to-end beat and downbeat tracking in the time domain

Deep learning approaches for beat and downbeat tracking have brought adv...
research
07/23/2015

Deep Fishing: Gradient Features from Deep Nets

Convolutional Networks (ConvNets) have recently improved image recogniti...
research
05/03/2021

An End-to-End and Accurate PPG-based Respiratory Rate Estimation Approach Using Cycle Generative Adversarial Networks

Respiratory rate (RR) is a clinical sign representing ventilation. An ab...
research
10/24/2018

Automatic Identification of Indicators of Compromise using Neural-Based Sequence Labelling

Indicators of Compromise (IOCs) are artifacts observed on a network or i...
research
09/11/2023

Energy Preservation and Stability of Random Filterbanks

What makes waveform-based deep learning so hard? Despite numerous attemp...
