Simon King

research

∙ 06/02/2023

Differentiable Grey-box Modelling of Phaser Effects using Frame-based Spectral Processing

Machine learning approaches to modelling analog audio effects have seen ...

Alistair Carson, et al.

research

∙ 05/17/2023

Using a Large Language Model to Control Speaking Style for Expressive TTS

Appropriate prosody is critical for successful spoken communication. Con...

Atli Thor Sigurgeirsson, et al.

research

∙ 03/07/2023

Do Prosody Transfer Models Transfer Prosody?

Some recent models for Text-to-Speech synthesis aim to transfer the pros...

Atli Thor Sigurgeirsson, et al.

research

∙ 11/13/2022

Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing

Most state-of-the-art Text-to-Speech systems use the mel-spectrogram as ...

Jacob J Webber, et al.

research

∙ 06/15/2021

Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

Text does not fully specify the spoken form, so text-to-speech models mu...

Devang S Ram Mohan, et al.

research

∙ 12/07/2020

Using previous acoustic context to improve Text-to-Speech synthesis

Many speech synthesis datasets, especially those derived from audiobooks...

Pilar Oplustil-Gallegos, et al.

research

∙ 08/09/2020

An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning

Speaker identity is one of the important characteristics of human speech...

Berrak Sisman, et al.

research

∙ 03/14/2020

Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0

In English, prosody adds a broad range of information to segment sequenc...

Zack Hodari, et al.

research

∙ 02/28/2020

Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis

We aim to characterize how different speakers contribute to the perceive...

Jennifer Williams, et al.

research

∙ 06/10/2019

Using generative modelling to produce varied intonation for speech synthesis

Unlike human speakers, typical text-to-speech (TTS) systems are unable t...

Zack Hodari, et al.

research

∙ 10/31/2018

Attentive Filtering Networks for Audio Replay Attack Detection

An attacker may use a variety of techniques to fool an automatic speaker...

Cheng-I Lai, et al.

research

∙ 07/28/2018

Analysing Shortcomings of Statistical Parametric Speech Synthesis

Output from statistical parametric speech synthesis (SPSS) remains notic...

Gustav Eje Henter, et al.

research

∙ 03/23/2018

Exploring the robustness of features and enhancement on speech recognition systems in highly-reverberant real environments

This paper evaluates the robustness of a DNN-HMM-based speech recognitio...

José Novoa, et al.

research

∙ 08/22/2016

Median-Based Generation of Synthetic Speech Durations using a Non-Parametric Approach

This paper proposes a new approach to duration modelling for statistical...

Srikanth Ronanki, et al.

research

∙ 08/18/2016

DNN-based Speech Synthesis for Indian Languages from ASCII text

Text-to-Speech synthesis in Indian languages has seen a lot of progress ...

Srikanth Ronanki, et al.

research

∙ 02/22/2016

Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Generation Error Training

We propose two novel techniques --- stacking bottleneck features and min...

Zhizheng Wu, et al.

research

∙ 01/11/2016

Investigating gated recurrent neural networks for speech synthesis

Recently, recurrent neural networks (RNNs) as powerful sequence models h...

Zhizheng Wu, et al.
