Deep Speech Synthesis from Articulatory Representations

09/13/2022
by Peter Wu, et al.

In the articulatory synthesis task, speech is synthesized from input features containing information about the physical behavior of the human vocal tract. This task provides a promising direction for speech synthesis research, as the articulatory space is compact, smooth, and interpretable. Current works have highlighted the potential for deep learning models to perform articulatory synthesis. However, it remains unclear whether these models can achieve the efficiency and fidelity of the human speech production system. To help bridge this gap, we propose a time-domain articulatory synthesis methodology and demonstrate its efficacy with both electromagnetic articulography (EMA) and synthetic articulatory feature inputs. Our model is computationally efficient and achieves a transcription word error rate (WER) of 18.5% on the EMA-to-speech task, yielding an improvement of 11.6% over prior work. Through interpolation experiments, we also highlight the generalizability and interpretability of our approach.
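
To make the EMA-to-speech setup concrete, the sketch below maps a sequence of articulatory features directly to a time-domain waveform with a small convolutional upsampling network. This is only an illustration of the general task interface: the feature dimensionality (12 EMA channels), frame rate (200 Hz), audio rate (16 kHz), and the architecture itself are assumptions for the example, not the model proposed in the paper.

    # Illustrative sketch: map EMA articulatory features to a waveform.
    # Assumed (hypothetical) setup: 12 EMA channels at 200 Hz in, 16 kHz audio out.
    import torch
    import torch.nn as nn

    class EMAToWaveform(nn.Module):
        def __init__(self, ema_dim=12, hidden=256, upsample=80):
            super().__init__()
            # Encode local context over the articulatory trajectories.
            self.encoder = nn.Sequential(
                nn.Conv1d(ema_dim, hidden, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
                nn.ReLU(),
            )
            # Upsample from the EMA frame rate (200 Hz) to audio rate (16 kHz).
            self.upsampler = nn.ConvTranspose1d(
                hidden, 1,
                kernel_size=upsample * 2, stride=upsample, padding=upsample // 2,
            )

        def forward(self, ema):
            # ema: (batch, frames, ema_dim) -> waveform: (batch, samples)
            h = self.encoder(ema.transpose(1, 2))
            return torch.tanh(self.upsampler(h)).squeeze(1)

    model = EMAToWaveform()
    ema = torch.randn(1, 200, 12)   # 1 second of EMA features at 200 Hz
    wav = model(ema)                # -> torch.Size([1, 16000])

Because the input lives in articulatory space, interpolation experiments of the kind mentioned in the abstract amount to synthesizing from a frame-wise mixture of two EMA sequences, e.g. model(0.5 * ema_a + 0.5 * ema_b) for sequences of equal length.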

Related research

07/23/2021 - Using Deep Learning Techniques and Inferential Speech Statistics for AI Synthesised Speech Recognition
The recent developments in technology have rewarded us with amazing aud...

10/22/2020 - How Similar or Different Is Rakugo Speech Synthesizer to Professional Performers?
We have been working on speech synthesis for rakugo (a traditional Japan...

02/26/2018 - Deep Feed-forward Sequential Memory Networks for Speech Synthesis
The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is amon...

10/21/2020 - Grapheme or phoneme? An Analysis of Tacotron's Embedded Representations
End-to-end models, particularly Tacotron-based ones, are currently a pop...

01/25/2021 - High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion
This Ph.D. thesis focuses on developing a system for high-quality speech...

10/18/2022 - Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses
We propose a training method for spontaneous speech synthesis models tha...

11/23/2020 - Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational Layer
Automatic classification of speech commands has revolutionized human com...
