Blind phoneme segmentation with temporal prediction errors

08/01/2016
by Paul Michel, et al.

Phonemic segmentation of speech is a critical step in speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists of analyzing the error profile of a model trained to predict speech features frame by frame. Specifically, we learn the dynamics of speech in the MFCC space and hypothesize phoneme boundaries at local maxima of the prediction error. We evaluate our system on the TIMIT dataset and report improvements over similar methods.
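
The idea of hypothesizing boundaries from local maxima in the prediction error can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a ridge-regularized linear next-frame predictor for the Markov chain or recurrent network, uses synthetic 13-dimensional features as a stand-in for MFCCs, and picks peaks with a simple mean-plus-two-standard-deviations threshold. All function names and parameters below are hypothetical, chosen for illustration only.

```python
import numpy as np


def synthetic_features(n_segments=3, seg_len=30, dim=13, noise=0.01, seed=0):
    """Toy stand-in for MFCC frames: each segment is a constant vector plus noise."""
    rng = np.random.default_rng(seed)
    means = rng.normal(size=(n_segments, dim))
    frames = np.repeat(means, seg_len, axis=0)
    return frames + noise * rng.normal(size=frames.shape)


def prediction_errors(features, ridge=1e-3):
    """Fit a linear next-frame predictor (assumption: replaces the paper's
    sequence model) and return the squared error for each predicted frame.

    errors[t] is the error made when predicting frame t + 1 from frame t.
    """
    x = features[:-1]
    y = features[1:]
    xb = np.hstack([x, np.ones((len(x), 1))])  # append a bias column
    gram = xb.T @ xb + ridge * np.eye(xb.shape[1])
    weights = np.linalg.solve(gram, xb.T @ y)
    residual = y - xb @ weights
    return np.sum(residual ** 2, axis=1)


def local_maxima(errors, threshold):
    """Indices of interior local maxima whose value exceeds the threshold."""
    return [
        t
        for t in range(1, len(errors) - 1)
        if errors[t] > errors[t - 1]
        and errors[t] >= errors[t + 1]
        and errors[t] > threshold
    ]


def hypothesize_boundaries(features, threshold=None):
    """Return frame indices where the prediction error peaks."""
    errors = prediction_errors(features)
    if threshold is None:
        # Hypothetical heuristic: keep only peaks well above the typical error.
        threshold = errors.mean() + 2.0 * errors.std()
    # Shift by one because errors[t] refers to the prediction of frame t + 1.
    return [t + 1 for t in local_maxima(errors, threshold)]


if __name__ == "__main__":
    print(hypothesize_boundaries(synthetic_features()))
```

On real speech, the features would come from an MFCC front end and the predictor would typically be trained on separate data; the thresholding rule here is only one possible way to suppress small, spurious peaks.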

research
08/07/2020

Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition

Automatic speech recognition in reverberant conditions is a challenging ...
research
12/15/2021

Speech frame implementation for speech analysis and recognition

Distinctive features of the created speech frame are: the ability to tak...
research
10/25/2016

Sequence Segmentation Using Joint RNN and Structured Prediction Models

We describe and analyze a simple and effective algorithm for sequence se...
research
12/10/2019

A Novel Topology for End-to-end Temporal Classification and Segmentation with Recurrent Neural Network

Connectionist temporal classification (CTC) has matured as an alignment ...
research
07/04/2022

Minimizing Sequential Confusion Error in Speech Command Recognition

Speech command recognition (SCR) has been commonly used on resource cons...
research
09/21/2021

On the Difficulty of Segmenting Words with Attention

Word segmentation, the problem of finding word boundaries in speech, is ...
research
12/23/2018

Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching

We consider the problem of training speech recognition systems without u...
