Deep Feed-forward Sequential Memory Networks for Speech Synthesis

02/26/2018
by Mengxiao Bi, et al.

The Bidirectional LSTM (BLSTM) RNN-based speech synthesis system is among the best parametric Text-to-Speech (TTS) systems in terms of the naturalness of generated speech, especially its prosody. However, the model complexity and inference cost of BLSTM prevent its use in many runtime applications. Meanwhile, Deep Feed-forward Sequential Memory Networks (DFSMN) have consistently outperformed BLSTM in speech recognition tasks, in both word error rate (WER) and runtime computation cost. Since speech synthesis, like speech recognition, requires modeling long-term dependencies, in this paper we investigate the Deep-FSMN (DFSMN) for speech synthesis. Both objective and subjective experiments show that, compared with the BLSTM TTS method, the DFSMN system generates synthesized speech of comparable quality while drastically reducing model complexity and speech generation time.

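The abstract contrasts BLSTM recurrence with the DFSMN's feed-forward memory blocks. As a rough illustration of the idea, the sketch below implements a single DFSMN-style layer in PyTorch: a feed-forward hidden layer and low-rank projection whose output is augmented by a learnable memory over a fixed window of past and future frames, with a skip connection from the previous memory block. The class name, layer sizes, feature dimensions, and look-back/look-ahead orders are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DFSMNLayer(nn.Module):
    """One DFSMN-style layer (illustrative sketch): feed-forward hidden layer,
    low-rank projection, and a memory block that adds a learnable weighted sum
    of neighboring frames, plus a skip connection between memory blocks."""

    def __init__(self, in_dim, hidden_dim=2048, proj_dim=512,
                 lookback=10, lookahead=1):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.proj = nn.Linear(hidden_dim, proj_dim, bias=False)
        # Depthwise 1-D convolution over time gives each projection dimension
        # its own memory weights across the (lookback + lookahead + 1) window.
        self.memory = nn.Conv1d(proj_dim, proj_dim,
                                kernel_size=lookback + lookahead + 1,
                                groups=proj_dim, bias=False)
        self.lookback, self.lookahead = lookback, lookahead

    def forward(self, x, skip=None):
        # x: (batch, time, in_dim); skip: memory output of the previous layer.
        p = self.proj(self.hidden(x))                    # (B, T, proj_dim)
        pt = F.pad(p.transpose(1, 2),                    # (B, proj_dim, T)
                   (self.lookback, self.lookahead))
        mem = p + self.memory(pt).transpose(1, 2)        # memory block output
        if skip is not None:
            mem = mem + skip                             # inter-block skip
        return mem


# Minimal usage: stack two layers over a batch of (hypothetical) feature frames.
frames = torch.randn(4, 200, 345)                        # (batch, time, feat)
layer1 = DFSMNLayer(345, proj_dim=512)
layer2 = DFSMNLayer(512, proj_dim=512)
m1 = layer1(frames)
out = layer2(m1, skip=m1)
print(out.shape)                                         # torch.Size([4, 200, 512])
```

Because each layer sees only a bounded context window through feed-forward operations, inference cost per frame stays fixed and parallelizable, which is the source of the runtime advantage over BLSTM that the abstract describes.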