On Structured Sparsity of Phonological Posteriors for Linguistic Parsing

01/21/2016
by Milos Cernak, et al.

The speech signal conveys information on different time scales, from the short (segmental) time scale, associated with phonological and phonetic information, to the long (supra-segmental) time scale, associated with syllabic and prosodic information. Linguistic and neurocognitive studies recognize the phonological classes at the segmental level as the essential and invariant representations used in speech temporal organization. In the context of speech processing, a deep neural network (DNN) is an effective computational method to infer the probability of individual phonological classes from a short segment of the speech signal. A vector of all phonological class probabilities is referred to as a phonological posterior. Only a few classes are active in any short-term speech segment; hence, the phonological posterior is a sparse vector. Although phonological posteriors are estimated at the segmental level, we claim that they convey supra-segmental information. Specifically, we demonstrate that phonological posteriors are indicative of syllabic and prosodic events. Building on converging linguistic evidence on the gestural model of Articulatory Phonology as well as on the neural basis of speech perception, we hypothesize that phonological posteriors convey properties of linguistic classes at multiple time scales, and that this information is embedded in their support (the indices of their active coefficients). To verify this hypothesis, we obtain a binary representation of phonological posteriors at the segmental level, which is referred to as the first-order sparsity structure; the high-order structures are obtained by concatenating first-order binary vectors. We then confirm that the classification of supra-segmental linguistic events, a problem known as linguistic parsing, can be achieved with high accuracy using simple binary pattern matching of first-order or high-order structures.
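The pipeline described in the abstract (binarize posteriors, concatenate binary vectors, match binary patterns) can be illustrated with a minimal sketch. Several details here are assumptions and are not stated in the abstract: the activity threshold of 0.5, concatenation over a sliding window of consecutive frames, nearest-pattern matching by Hamming distance, and the codebook with its class labels are all hypothetical choices made for illustration.

```python
import numpy as np


def first_order_structure(posteriors, threshold=0.5):
    """Binarize frame-level posteriors: 1 marks an active phonological class.

    The threshold value is an assumption, not taken from the paper.
    """
    return (np.asarray(posteriors) >= threshold).astype(np.uint8)


def high_order_structure(binary_frames, order):
    """Concatenate `order` consecutive first-order vectors into one pattern.

    The sliding-window scheme is an assumed way of forming the concatenation.
    """
    binary_frames = np.asarray(binary_frames)
    n_frames = binary_frames.shape[0]
    if order < 1 or order > n_frames:
        raise ValueError("order must be between 1 and the number of frames")
    windows = [
        binary_frames[i:i + order].reshape(-1)
        for i in range(n_frames - order + 1)
    ]
    return np.stack(windows)


def match_pattern(pattern, codebook):
    """Return (label, distance) of the codebook entry closest in Hamming distance.

    `codebook` maps a hypothetical class label to a 2-D array of binary patterns.
    """
    best_label, best_dist = None, None
    for label, patterns in codebook.items():
        dists = np.count_nonzero(np.asarray(patterns) != pattern, axis=1)
        dist = int(dists.min())
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Toy data: 3 frames, 4 phonological classes (values are made up).
posteriors = [
    [0.9, 0.1, 0.8, 0.0],
    [0.2, 0.7, 0.6, 0.1],
    [0.1, 0.9, 0.0, 0.3],
]
binary = first_order_structure(posteriors)
second_order = high_order_structure(binary, order=2)

# Hypothetical codebook of second-order patterns per supra-segmental class.
codebook = {
    "stressed": np.array([[1, 0, 1, 0, 0, 1, 1, 0]]),
    "unstressed": np.array([[0, 1, 0, 0, 0, 1, 0, 1]]),
}
label, distance = match_pattern(second_order[0], codebook)
```

In this sketch only the support of each posterior vector survives binarization, which mirrors the abstract's claim that the supra-segmental information lies in which coefficients are active rather than in their exact values.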

research
12/05/1998

A High Quality Text-To-Speech System Composed of Multiple Neural Networks

While neural networks have been employed to handle several different tex...
research
04/12/2021

End-to-End Mandarin Tone Classification with Short Term Context Information

In this paper, we propose an end-to-end Mandarin tone classification met...
research
09/29/2021

Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? – A computational investigation

Decades of research has studied how language learning infants learn to d...
research
11/16/2022

A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

This paper focuses on leveraging deep representation learning (DRL) for ...
research
02/16/2017

Addressing the Data Sparsity Issue in Neural AMR Parsing

Neural attention models have achieved great success in different NLP tas...
research
02/08/2016

LSTM Deep Neural Networks Postfiltering for Improving the Quality of Synthetic Voices

Recent developments in speech synthesis have produced systems capable of...
research
02/27/2018

Deep factorization for speech signal

Various informative factors mixed in speech signals, leading to great di...
