Exploring TTS without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020)

05/11/2020
by Takashi Morita, et al.

In this study, we report our exploration of Text-To-Speech without Text (TTS without T) in the ZeroSpeech Challenge 2020, in which participants propose an end-to-end, unsupervised system that learns speech recognition and TTS together. We address the challenge using biologically/psychologically motivated modules of Artificial Neural Networks (ANN), with a particular interest in unsupervised learning of human language as a biological/psychological problem. The system first processes Mel Frequency Cepstral Coefficient (MFCC) frames with an Echo-State Network (ESN), thereby simulating computations in cortical microcircuits. The ESN output is then discretized by our original Variational Autoencoder (VAE), which implements the Dirichlet-based Bayesian clustering widely accepted in computational linguistics and cognitive science. Finally, the discretized signal is converted back into a sound waveform via a neural-network implementation of the source-filter model of speech production.
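
To make the first stage of the pipeline more concrete, here is a minimal sketch of a leaky echo-state reservoir run over MFCC-like frames. It is not the authors' implementation: the function names, reservoir size, spectral radius, leak rate, and the random stand-in for MFCC input are all assumptions chosen for illustration. Only the general ESN idea (a fixed random recurrent network whose states are read out without training the recurrent weights) comes from the abstract.

```python
import numpy as np


def make_reservoir(n_inputs, n_units, spectral_radius=0.9, seed=0):
    """Build fixed random input and recurrent weights (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(n_units, n_inputs))
    w = rng.normal(size=(n_units, n_units))
    # Rescale so the largest absolute eigenvalue equals spectral_radius,
    # a common heuristic for keeping the reservoir dynamics stable.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(frames, w_in, w, leak=0.5):
    """Return the reservoir state for every input frame (hypothetical helper)."""
    state = np.zeros(w.shape[0])
    states = np.empty((frames.shape[0], w.shape[0]))
    for t, x in enumerate(frames):
        # Leaky integration of the tanh-squashed input and recurrent drive.
        pre = np.tanh(w_in @ x + w @ state)
        state = (1.0 - leak) * state + leak * pre
        states[t] = state
    return states


# Random stand-in for 50 frames of 13-dimensional MFCCs (assumed sizes).
frames = np.random.default_rng(1).normal(size=(50, 13))
w_in, w = make_reservoir(n_inputs=13, n_units=100)
states = run_reservoir(frames, w_in, w)
print(states.shape)
```

In a pipeline like the one described, the per-frame reservoir states would be passed on to the discretizing VAE; that stage and the source-filter decoder are not sketched here.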

research
04/14/2022

Learning and controlling the source-filter representation of speech with a variational autoencoder

Understanding and controlling latent representations in deep generative ...
research
09/23/2022

An artificial neural network-based system for detecting machine failures using tiny sound data: A case study

In an effort to advocate the research for a deep learning-based machine ...
research
10/04/2021

Towards efficient end-to-end speech recognition with biologically-inspired neural networks

Automatic speech recognition (ASR) is a capability which enables a progr...
research
05/09/2022

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

Text to speech (TTS) has made rapid progress in both academia and indust...
research
10/20/2019

Neuro-SERKET: Development of Integrative Cognitive System through the Composition of Deep Probabilistic Generative Models

This paper describes a framework for the development of an integrative c...
research
06/24/2022

End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue

The recent text-to-speech (TTS) has achieved quality comparable to that ...
research
12/17/2018

Persian Vowel recognition with MFCC and ANN on PCVC speech dataset

In this paper a new method for recognition of consonant-vowel phonemes c...